BLIP captioning throws an error under specific circumstances

What happened?

BLIP captioning throws an error under specific circumstances.

I'm working on a project that relies on this one. When I run make_captions.py I encounter an error. This doesn't happen in Colab, which makes me believe it must be related to my installation and its dependencies. Any help would be very much appreciated.

Details

./finetune/make_captions.py /opt/ml/input/data/train --batch_size 4 --caption_extension .caption --max_data_loader_n_workers 2 --debug

load images from /opt/ml/input/data/train
found 6 images.
loading GIT: microsoft/git-large-textcaps
Downloading (…)rocessor_config.json: 100%|██████████████████████████████████████████████████████████████████| 503/503 [00:00<00:00, 135kB/s]
Downloading (…)okenizer_config.json: 100%|██████████████████████████████████████████████████████████████████| 453/453 [00:00<00:00, 138kB/s]
Downloading (…)solve/main/vocab.txt: 100%|███████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 75.3MB/s]
Downloading (…)/main/tokenizer.json: 100%|████████████████████████████████████████████████████████████████| 711k/711k [00:00<00:00, 200MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████| 125/125 [00:00<00:00, 96.6kB/s]
Downloading (…)lve/main/config.json: 100%|██████████████████████████████████████████████████████████████| 2.82k/2.82k [00:00<00:00, 836kB/s]
Downloading (…)"pytorch_model.bin";: 100%|██████████████████████████████████████████████████████████████| 1.58G/1.58G [00:03<00:00, 414MB/s]
Downloading (…)neration_config.json: 100%|█████████████████████████████████████████████████████████████████| 141/141 [00:00<00:00, 42.4kB/s]
GIT loaded
  0%|                                                                                                                 | 0/2 [00:00<?, ?it/s]
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   23C    P8    23W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Python 3.9, CUDA 11.6, torch 1.13. Dependencies below:

    - toml
    - opencv-python
    - prettytable
    - https://download.pytorch.org/whl/cu116/torch-1.13.1%2Bcu116-cp39-cp39-linux_x86_64.whl
    - https://download.pytorch.org/whl/cu116/torchvision-0.14.1%2Bcu116-cp39-cp39-linux_x86_64.whl
    - https://github.com/camenduru/stable-diffusion-webui-colab/releases/download/0.0.17/xformers-0.0.17+b6be33a.d20230315-cp39-cp39-linux_x86_64.whl
    - triton==2.0.0.dev20221120
    - wandb
    - pillow==9.1.0
    - accelerate==0.15.0
    - transformers==4.26.0
    - ftfy==6.1.1
    - albumentations==1.3.0
    - opencv-python==4.7.0.68
    - einops==0.6.0
    - diffusers[torch]==0.10.2
    - pytorch-lightning==1.9.0
    - bitsandbytes==0.35.0
    - tensorboard==2.10.1
    - safetensors==0.2.6
    - tensorflow==2.10.1
    - requests==2.28.2
    - huggingface-hub==0.12.0
    - timm==0.6.12
    - fairscale==0.4.13
    - lion-pytorch==0.0.6
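
Since the failure shows up locally but not in Colab, one way to narrow it down is to dump the installed versions in both environments and compare them. The following is only a sketch of that comparison, not a fix: the package names come from the list above, while `installed_versions` and `diff_versions` are hypothetical helper names made up for this example. It uses only the standard library (`importlib.metadata`, Python 3.8+).

```python
from importlib import metadata

# Package names taken from the dependency list in this report.
PACKAGES = [
    "torch", "torchvision", "xformers", "transformers", "accelerate",
    "diffusers", "huggingface-hub", "timm", "fairscale", "pillow",
    "safetensors",
]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(a, b):
    """Return {package: (version_in_a, version_in_b)} where the two differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Run this in both environments and compare the printed lines.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running the script once locally and once in Colab gives two listings. Passing the two resulting dictionaries to `diff_versions` (for example after saving them as JSON) leaves only the packages whose versions differ, which are the first candidates to check.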

Expected behavior

Captioning should work as it does in Colab.

Linaqruf / kohya-trainer

BLIP captioning throws an error under specific circumstances #161
