AttentionX / InstructBLIP_PEFT

Apache License 2.0

Issue with 'BFloat16' #3

Open diav79 opened 8 months ago

diav79 commented 8 months ago

Hi,

I am facing the following error: RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'

Any idea why and how to resolve this? Thanks!

diav79 commented 8 months ago

I was able to solve it by following this comment: https://github.com/salesforce/LAVIS/issues/91#issuecomment-1645486525

After that, I ran out of memory with batch size 16 on a 24 GB device, so I reduced the batch size to 4 and training started.

However, I don't know why bfloat16 doesn't work. My best hypothesis is a mismatch in the installed requirements.
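One possible explanation, offered as an assumption rather than a confirmed diagnosis: the operation named in the error is the one `torch.cuda.amp.GradScaler` uses to unscale gradients and check them for inf/NaN, so the error would appear if the scaler is applied to bfloat16 gradients. Loss scaling exists to protect float16 gradients from underflow, and bfloat16 has the same exponent range as float32, so the scaler is generally not needed there. The sketch below is not taken from this repository or from the linked comment; `make_grad_scaler` and `train_step` are hypothetical helpers that enable the scaler only for float16.

```python
import torch


def make_grad_scaler(amp_dtype: torch.dtype) -> torch.cuda.amp.GradScaler:
    """Hypothetical helper: return a GradScaler that is active only for float16."""
    # Loss scaling protects float16 gradients from underflow. bfloat16 shares
    # float32's exponent range, so the scaler is left disabled for it.
    use_scaler = torch.cuda.is_available() and amp_dtype == torch.float16
    return torch.cuda.amp.GradScaler(enabled=use_scaler)


def train_step(model, optimizer, scaler, inputs, targets, amp_dtype):
    """Hypothetical training step: autocast forward pass, optionally scaled backward."""
    device_type = "cuda" if torch.cuda.is_available() else "cpu"
    optimizer.zero_grad()
    with torch.autocast(device_type=device_type, dtype=amp_dtype):
        outputs = model(inputs)
    # Compute the loss in float32, outside the autocast region.
    loss = torch.nn.functional.mse_loss(outputs.float(), targets)
    # With a disabled scaler, scale() returns the loss unchanged and
    # step() simply calls optimizer.step().
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

With a helper like this, the same loop can run with either `torch.float16` or `torch.bfloat16` without touching the unscale path in the bfloat16 case.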

For more context, I am using an NVIDIA A10G card. Here are the packages and versions that were installed by following requirements.txt:

Successfully installed accelerate-0.27.2 altair-5.2.0 annotated-types-0.6.0 antlr4-python3-runtime-4.9.3 blis-0.7.11 braceexpand-0.1.7 cachetools-5.3.2 catalogue-2.0.10 cfgv-3.4.0 cloudpathlib-0.16.0 confection-0.1.4 contexttimer-0.3.3 cymem-2.0.8 decord-0.6.0 diffusers-0.16.0 distlib-0.3.8 einops-0.7.0 fairscale-0.4.4 ftfy-6.1.3 huggingface-hub-0.20.3 identify-2.5.35 iopath-0.1.10 kaggle-1.6.6 langcodes-3.3.0 murmurhash-1.0.10 nodeenv-1.8.0 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.3.101 nvidia-nvtx-cu12-12.1.105 omegaconf-2.3.0 opencv-python-4.5.5.64 opendatasets-0.1.22 peft-0.8.2 portalocker-2.8.2 pre-commit-3.6.2 preshed-3.0.9 pycocoevalcap-1.2 pycocotools-2.0.7 pydantic-2.6.1 pydantic-core-2.16.2 pydeck-0.8.1b0 python-magic-0.4.27 safetensors-0.4.2 sentencepiece-0.2.0 smart-open-6.4.0 spacy-3.7.4 spacy-legacy-3.0.12 spacy-loggers-1.0.5 srsly-2.4.8 streamlit-1.31.1 thinc-8.2.3 timm-0.4.12 tokenizers-0.13.3 torch-2.2.0 torchvision-0.17.0 transformers-4.31.0 triton-2.2.0 typer-0.9.0 tzlocal-5.2 validators-0.22.0 virtualenv-20.25.1 wasabi-1.1.2 wcwidth-0.2.13 weasel-0.3.4 webdataset-0.2.86
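To test the mismatch hypothesis, the pip output above could be parsed and compared against whatever versions the project expects. This is only a rough sketch: `parse_installed` and `find_mismatches` are hypothetical helpers, and the `expected` pins in the usage lines are made-up placeholders, not the contents of this repository's requirements.txt.

```python
def parse_installed(line: str) -> dict[str, str]:
    """Parse a pip 'Successfully installed name-version ...' line into a dict."""
    prefix = "Successfully installed "
    if line.startswith(prefix):
        line = line[len(prefix):]
    versions = {}
    for token in line.split():
        # The version is whatever follows the last hyphen in each token.
        name, _, version = token.rpartition("-")
        if name:
            versions[name] = version
    return versions


def find_mismatches(installed: dict[str, str], expected: dict[str, str]) -> dict:
    """Map each package whose installed version differs to (expected, installed)."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


# Usage with a short excerpt of the log above.
installed = parse_installed(
    "Successfully installed peft-0.8.2 torch-2.2.0 transformers-4.31.0"
)
# Hypothetical pins for illustration only.
expected = {"torch": "2.0.1", "transformers": "4.31.0"}
mismatches = find_mismatches(installed, expected)
print(mismatches)
```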