Linaqruf / kohya-trainer

Adapted from https://note.com/kohya_ss/n/nbf7ce8d80f29 for easier cloning
Apache License 2.0

BLIP captioning throws an error under specific circumstances #161

Closed oscarnevarezleal closed 1 year ago

oscarnevarezleal commented 1 year ago

What happened?

BLIP captioning throws an error under specific circumstances.

I'm working on a project that relies on this one, and when I run make_captions.py I encounter an error. This doesn't happen in Colab, which makes me believe the problem is related to my installation and its dependencies. Any help would be very much appreciated.

Details

./finetune/make_captions.py /opt/ml/input/data/train --batch_size 4 --caption_extension .caption --max_data_loader_n_workers 2 --debug 
load images from /opt/ml/input/data/train
found 6 images.
loading GIT: microsoft/git-large-textcaps
Downloading (…)rocessor_config.json: 100%|██████████████████████████████████████████████████████████████████| 503/503 [00:00<00:00, 135kB/s]
Downloading (…)okenizer_config.json: 100%|██████████████████████████████████████████████████████████████████| 453/453 [00:00<00:00, 138kB/s]
Downloading (…)solve/main/vocab.txt: 100%|███████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 75.3MB/s]
Downloading (…)/main/tokenizer.json: 100%|████████████████████████████████████████████████████████████████| 711k/711k [00:00<00:00, 200MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████| 125/125 [00:00<00:00, 96.6kB/s]
Downloading (…)lve/main/config.json: 100%|██████████████████████████████████████████████████████████████| 2.82k/2.82k [00:00<00:00, 836kB/s]
Downloading (…)"pytorch_model.bin";: 100%|██████████████████████████████████████████████████████████████| 1.58G/1.58G [00:03<00:00, 414MB/s]
Downloading (…)neration_config.json: 100%|█████████████████████████████████████████████████████████████████| 141/141 [00:00<00:00, 42.4kB/s]
GIT loaded
  0%|                                                                                                                 | 0/2 [00:00<?, ?it/s]
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   23C    P8    23W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Python 3.9, CUDA 11.6, torch 1.13. Dependencies below:

    - toml
    - opencv-python
    - prettytable
    - https://download.pytorch.org/whl/cu116/torch-1.13.1%2Bcu116-cp39-cp39-linux_x86_64.whl
    - https://download.pytorch.org/whl/cu116/torchvision-0.14.1%2Bcu116-cp39-cp39-linux_x86_64.whl
    - https://github.com/camenduru/stable-diffusion-webui-colab/releases/download/0.0.17/xformers-0.0.17+b6be33a.d20230315-cp39-cp39-linux_x86_64.whl
    - triton==2.0.0.dev20221120
    - wandb
    - pillow==9.1.0
    - accelerate==0.15.0
    - transformers==4.26.0
    - ftfy==6.1.1
    - albumentations==1.3.0
    - opencv-python==4.7.0.68
    - einops==0.6.0
    - diffusers[torch]==0.10.2
    - pytorch-lightning==1.9.0
    - bitsandbytes==0.35.0
    - tensorboard==2.10.1
    - safetensors==0.2.6
    - tensorflow==2.10.1
    - requests==2.28.2
    - huggingface-hub==0.12.0
    - timm==0.6.12
    - fairscale==0.4.13
    - lion-pytorch==0.0.6
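
Since the error appears locally but not in Colab, one way to narrow it down is to print the versions actually installed in each environment and compare them. The following is a minimal sketch using only the standard library; the `installed_versions` helper and the choice of packages to check are assumptions for illustration, not part of kohya-trainer.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages picked from the dependency list above (illustrative selection).
PACKAGES = [
    "torch",
    "torchvision",
    "transformers",
    "accelerate",
    "huggingface-hub",
    "timm",
    "fairscale",
]


def installed_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running the same script in Colab and on the local machine and diffing the two outputs should show which packages differ.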

Expected behavior

Captioning works as in Colab

Linaqruf commented 1 year ago

Hi, you are probably using GIT by accident, not BLIP. Your log shows:

load images from /opt/ml/input/data/train
found 6 images.
loading GIT: microsoft/git-large-textcaps
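
The check described in this reply, reading the log for the line that names the loaded model, can be sketched as a small script. This is a hypothetical helper, not part of kohya-trainer. It assumes the loader prints a line of the form `loading <NAME>: <model id>`, which is confirmed here only for the GIT line quoted above; whether a BLIP run prints a matching line is not verified.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "loading GIT: microsoft/git-large-textcaps".
# The format is taken from the log in this issue; other captioners may differ.
LOADING_RE = re.compile(r"^loading\s+(\w+):\s*(\S+)", re.IGNORECASE)


def detect_captioner(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (captioner name, model id) from the first matching log line."""
    for line in log_text.splitlines():
        match = LOADING_RE.match(line.strip())
        if match:
            return match.group(1).upper(), match.group(2)
    return None


sample_log = """load images from /opt/ml/input/data/train
found 6 images.
loading GIT: microsoft/git-large-textcaps
GIT loaded"""

print(detect_captioner(sample_log))
```

If the function reports GIT when BLIP was intended, the script being invoked is worth checking first.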