bmaltais / kohya_ss

ERROR: Trying to train LoRA or preprocess images using BLIP. #1075

Dynamicaaa closed this issue 8 months ago

Dynamicaaa commented 1 year ago

Error Reasoning

Whenever I try to train a LoRA or preprocess images using BLIP in kohya_ss, I get an error like AttributeError: module 'tensorflow' has no attribute 'io'. I have tried to fix it, but nothing has worked. The installation itself completes without problems. One unexpected thing is that it uses a CPU-only Torch, even though the code states that AMD ROCm is supported (I have an AMD GPU). If you could help me out and tell me what's wrong, that would mean a lot. Thank you so much!

Traceback

19:18:03-250916 INFO     Version: None
19:18:03-278112 INFO     Using CPU-only Torch
19:18:05-944069 INFO     Torch 2.0.1+cpu
19:18:05-946069 WARNING  Torch reports CUDA not available
19:18:05-947067 INFO     Verifying modules instalation status from requirements_windows_torch2.txt...
19:18:05-949066 WARNING  Package wrong version: xformers 0.0.14.dev0 required 0.0.20
19:18:05-951065 INFO     Installing package: xformers==0.0.20 bitsandbytes==0.35.0
19:18:14-625152 WARNING  Package wrong version: accelerate 0.15.0 required 0.19.0
19:18:14-627149 INFO     Installing package: accelerate==0.19.0 tensorboard==2.12.3 tensorflow==2.12.0
19:20:34-566654 INFO     Verifying modules instalation status from requirements.txt...
19:20:38-861524 INFO     headless: False
19:20:38-870334 INFO     Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
19:20:44-346649 INFO     Captioning files in C:/Users/Canary/Downloads/images...
19:20:44-352184 INFO     ./venv/Scripts/python.exe "finetune/make_captions.py" --batch_size="1" --num_beams="1"
                         --top_p="0.9" --max_length="75" --min_length="5" --beam_search --caption_extension=".txt"
                         "C:/Users/Canary/Downloads/images"
                         --caption_weights="https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/mode
                         l_large_caption.pth"
C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: [WinError 127] The specified procedure could not be found
  warn(f"Failed to load image Python extension: {e}")
Traceback (most recent call last):
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\finetune\make_captions.py", line 16, in <module>
    from blip.blip import blip_decoder
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\finetune\blip\blip.py", line 14, in <module>
    from blip.med import BertConfig, BertModel, BertLMHeadModel
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\finetune\blip\med.py", line 39, in <module>
    from transformers.modeling_utils import (
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\transformers\modeling_utils.py", line 83, in <module>
    from accelerate import __version__ as accelerate_version
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\accelerate\__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 39, in <module>
    from .tracking import LOGGER_TYPE_TO_CLASS, GeneralTracker, filter_trackers
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\accelerate\tracking.py", line 42, in <module>
    from torch.utils import tensorboard
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\torch\utils\tensorboard\__init__.py", line 12, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa: F401
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\torch\utils\tensorboard\writer.py", line 16, in <module>
    from ._embedding import (
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\torch\utils\tensorboard\_embedding.py", line 9, in <module>
    _HAS_GFILE_JOIN = hasattr(tf.io.gfile, "join")
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\tensorboard\lazy.py", line 65, in __getattr__
    return getattr(load_once(self), attr_name)
AttributeError: module 'tensorflow' has no attribute 'io'
19:20:50-772013 INFO     ...captioning done

Computer Specifications

Operating System: Windows 11 22H2 (x64) CPU: Intel Core i7 4790 PC Memory: 16GB DDR3 GPU: AMD Radeon RX 580 2048SP GPU Memory: 8GB VRAM
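
One way to narrow down the AttributeError above is to look at what Python actually gets when it imports tensorflow. The sketch below is a hypothetical diagnostic and not part of kohya_ss: the names describe_module, looks_incomplete, and check_tensorflow are made up for illustration, and the idea that a missing io attribute points to a partial install is an assumption that this thread does not confirm.

```python
import importlib
import types
from typing import Any, Dict, Optional


def describe_module(mod: types.ModuleType) -> Dict[str, Any]:
    """Collect the attributes that matter for the AttributeError in the traceback."""
    return {
        "file": getattr(mod, "__file__", None),        # may be None or missing for leftover stubs
        "version": getattr(mod, "__version__", None),
        "has_io": hasattr(mod, "io"),                  # the attribute the traceback reports missing
    }


def looks_incomplete(mod: types.ModuleType) -> bool:
    """Heuristic (assumption): no file on disk or no `io` attribute suggests a partial install."""
    info = describe_module(mod)
    return info["file"] is None or not info["has_io"]


def check_tensorflow() -> Optional[Dict[str, Any]]:
    """Import tensorflow from the active environment and report what was found."""
    try:
        tf = importlib.import_module("tensorflow")
    except Exception as exc:  # the import can fail in several ways, e.g. the protobuf TypeError
        print(f"tensorflow could not be imported: {exc!r}")
        return None
    info = describe_module(tf)
    info["looks_incomplete"] = looks_incomplete(tf)
    print(info)
    return info
```

Calling check_tensorflow() with the same interpreter that appears in the log (./venv/Scripts/python.exe) would show which tensorflow the captioning script sees.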

bmaltais commented 1 year ago

This is strange. A few people have weird issues with BLIP but I can't reproduce the issue myself... I hope to find the reason sooner than later.

Dynamicaaa commented 1 year ago

It can usually be reproduced by installing kohya_ss as described: clone the git repo, run setup.bat, and select fp16, answering no/all to every other option. Then launch kohya_ss and the error appears.

ruiboy1919 commented 1 year ago

Based on the logs, there may be a problem with the installation or execution of Torch and related packages.

Try selecting Torch 1 and reinstalling.
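
To see which Torch build is actually installed, the version strings in the logs can be read directly: the part after the "+" is the build tag. The sketch below is only an illustration; torch_build_tag and describe_build are hypothetical helper names, and the two sample strings are the ones printed in this issue's logs.

```python
from typing import Optional


def torch_build_tag(version: str) -> Optional[str]:
    """Return the local build tag after '+' in a torch version string, if any."""
    _, sep, tag = version.partition("+")
    return tag if sep else None


def describe_build(version: str) -> str:
    """Map a torch version string to a rough description of the build."""
    tag = torch_build_tag(version)
    if tag is None:
        return "no build tag"
    if tag == "cpu":
        return "CPU-only build"
    if tag.startswith("cu"):
        return "CUDA build"
    if tag.startswith("rocm"):
        return "ROCm build"
    return f"unknown build tag: {tag}"


# Version strings copied from the logs in this issue.
for logged_version in ("2.0.1+cpu", "1.12.1+cu116"):
    print(logged_version, "->", describe_build(logged_version))
```

With Torch installed, torch.__version__ can be passed in instead of the sample strings. A CUDA build targets NVIDIA GPUs, so "Torch reports CUDA not available" is expected on an AMD card even when a +cu116 build is installed.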

bmaltais commented 1 year ago

I tried deleting the whole kohya_ss folder and reinstalling, and BLIP captioning ran without a hitch. Maybe one of the files in your cache has been corrupted and is the cause?

Dynamicaaa commented 1 year ago

Intro

Here are some logs from both the installation and a startup where I tried to use BLIP captioning (the same happens with anything else), along with an explanation of what I've tried.

How To Replicate Error

1. Install kohya_ss with the options shown in the Installation Traceback logs.
2. Start up kohya_ss and try to do anything at all.
3. The error should pop up after about 3 seconds.

Tried Fixes

I've tried downgrading the protobuf package, along with the other packages that had to be downgraded with it. That led to the same error I had when first using kohya_ss, which you can see in the BLIP Captioning traceback logs. It's strange that not even the creator can replicate this, and the output itself is strange.
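
Before and after a downgrade, it can help to confirm which protobuf version is really installed in the venv. The sketch below is a hypothetical check, not something from kohya_ss: the (3, 20) limit is taken from the protobuf error text later in this comment ("Downgrade the protobuf package to 3.20.x or lower"), and the helper names are invented for illustration.

```python
from importlib import metadata
from typing import Optional, Tuple

# Limit quoted in the protobuf error message: "3.20.x or lower".
PROTOBUF_LIMIT: Tuple[int, int] = (3, 20)


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.20.3'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def within_limit(version: str, limit: Tuple[int, int] = PROTOBUF_LIMIT) -> bool:
    """True if the version is at or below the limit from the error message."""
    return major_minor(version) <= limit


def installed_protobuf() -> Optional[str]:
    """Return the protobuf version installed in the active environment, if any."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


found = installed_protobuf()
if found is None:
    print("protobuf is not installed in this environment")
else:
    print(f"protobuf {found}, within 3.20.x limit: {within_limit(found)}")
```

This only reports the version; it does not say whether the pinned tensorflow release is compatible with it.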

Installation Traceback

Kohya_ss GUI setup menu:

1. Install kohya_ss gui
2. (Optional) Install cudann files
3. (Optional) Install bitsandbytes-windows
4. (Optional) Manually configure accelerate
5. (Optional) Start Kohya_ss GUI in browser
6. Quit

Enter your choice: 1

1. Torch 1
2. Torch 2
3. Cancel

Enter your choice: 1

18:10:55-385290 INFO     Version: None
18:10:55-390287 INFO     Python 3.10.9 on Windows
18:10:55-412274 INFO     Using CPU-only Torch
18:10:55-415272 INFO     Installing modules from requirements_windows_torch1.txt...
18:10:55-423275 INFO     Installing package: torch==1.12.1+cu116 torchvision==0.13.1+cu116 --index-url
                         https://download.pytorch.org/whl/cu116
18:17:05-065579 INFO     Installing package:
                         https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.de
                         v0-cp310-cp310-win_amd64.whl -U -I
18:17:40-385696 INFO     Installing package: bitsandbytes==0.35.0
18:17:43-262010 INFO     Installing package: accelerate==0.15.0 tensorboard==2.10.1 tensorflow==2.10.1
18:19:29-181381 INFO     Installing modules from requirements.txt...
18:19:29-190375 INFO     Installing package: albumentations==1.3.0
18:20:07-318828 INFO     Installing package: altair==4.2.2
18:20:32-522916 INFO     Installing package: dadaptation==3.1
18:20:34-569715 INFO     Installing package: diffusers[torch]==0.10.2
18:20:42-030295 INFO     Installing package: easygui==0.98.3
18:20:44-236989 INFO     Installing package: einops==0.6.0
18:20:46-290774 INFO     Installing package: fairscale==0.4.13
18:20:49-569835 INFO     Installing package: ftfy==6.1.1
18:20:51-898459 INFO     Installing package: gradio==3.33.1
18:21:15-780338 INFO     Installing package: huggingface-hub>=0.13.3
18:21:18-617659 INFO     Installing package: lion-pytorch==0.0.6
18:21:20-787377 INFO     Installing package: lycoris_lora==0.1.6
18:22:05-628863 INFO     Installing package: opencv-python==4.7.0.68
18:22:09-616525 INFO     Installing package: prodigyopt==1.0
18:22:12-345891 INFO     Installing package: pytorch-lightning==1.9.0
18:22:21-059739 INFO     Installing package: rich==13.4.1
18:22:25-006406 INFO     Installing package: safetensors==0.3.1
18:22:27-796756 INFO     Installing package: timm==0.6.12
18:22:33-013691 INFO     Installing package: tk==0.1.0
18:22:35-799024 INFO     Installing package: toml==0.10.2
18:22:38-648340 INFO     Installing package: transformers==4.26.0
18:23:05-464492 INFO     Installing package: voluptuous==0.13.1
18:23:08-378755 INFO     Installing package: wandb==0.15.0
18:23:22-006694 INFO     Installing package: -e .
18:23:31-077329 INFO     Copying bitsandbytes files...
18:23:31-167276 INFO     Configuring accelerate...
------------------------------------------------------------------------------------------------------------------------
In which compute environment are you running?
This machine
------------------------------------------------------------------------------------------------------------------------
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU is available)? [yes/NO]:no
Do you wish to optimize your script with torch dynamo?[yes/NO]:no
Do you want to use DeepSpeed? [yes/NO]: no
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:all
------------------------------------------------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
fp16
accelerate configuration saved at C:\Users\Canary/.cache\huggingface\accelerate\default_config.yaml

Kohya_ss GUI setup menu:

1. Install kohya_ss gui
2. (Optional) Install cudann files
3. (Optional) Install bitsandbytes-windows
4. (Optional) Manually configure accelerate
5. (Optional) Start Kohya_ss GUI in browser
6. Quit

Enter your choice: 5

Kohya_ss GUI setup menu:

1. Install kohya_ss gui
2. (Optional) Install cudann files
3. (Optional) Install bitsandbytes-windows
4. (Optional) Manually configure accelerate
5. (Optional) Start Kohya_ss GUI in browser
6. Quit

Enter your choice:

BLIP Captioning Traceback (When protobuf Is Not Downgraded)

18:47:49-453136 INFO     Version: None
18:47:49-477207 INFO     Using CPU-only Torch
18:47:54-861845 INFO     Torch 1.12.1+cu116
18:47:54-863830 WARNING  Torch reports CUDA not available
18:47:54-864844 INFO     Verifying modules instalation status from requirements_windows_torch2.txt...
18:47:54-871414 WARNING  Package wrong version: xformers 0.0.14.dev0 required 0.0.20
18:47:54-873426 INFO     Installing package: xformers==0.0.20 bitsandbytes==0.35.0
18:49:26-788521 ERROR    Error running pip: install --upgrade xformers==0.0.20 bitsandbytes==0.35.0
18:49:26-790519 WARNING  Package wrong version: accelerate 0.15.0 required 0.19.0
18:49:26-792519 INFO     Installing package: accelerate==0.19.0 tensorboard==2.12.3 tensorflow==2.12.0
18:50:18-142513 ERROR    Error running pip: install --upgrade accelerate==0.19.0 tensorboard==2.12.3 tensorflow==2.12.0
18:50:18-144511 INFO     Verifying modules instalation status from requirements.txt...
18:50:24-545707 INFO     headless: False
18:50:24-554721 INFO     Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
18:51:02-967209 INFO     Captioning files in C:/Users/Canary/Downloads/images...
18:51:02-969208 INFO     ./venv/Scripts/python.exe "finetune/make_captions.py" --batch_size="1" --num_beams="1"
                         --top_p="0.9" --max_length="75" --min_length="5" --beam_search --caption_extension=".txt"
                         "C:/Users/Canary/Downloads/images"
                         --caption_weights="https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/mode
                         l_large_caption.pth"
C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: [WinError 127] The specified procedure could not be found
  warn(f"Failed to load image Python extension: {e}")
2023-06-26 18:51:08.365673: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-06-26 18:51:08.367111: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "C:\Users\Canary\Downloads\kohya_ss\finetune\make_captions.py", line 16, in <module>
    from blip.blip import blip_decoder
  File "C:\Users\Canary\Downloads\kohya_ss\finetune\blip\blip.py", line 14, in <module>
    from blip.med import BertConfig, BertModel, BertLMHeadModel
  File "C:\Users\Canary\Downloads\kohya_ss\finetune\blip\med.py", line 24, in <module>
    from transformers.activations import ACT2FN
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\transformers\__init__.py", line 30, in <module>
    from . import dependency_versions_check
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\transformers\dependency_versions_check.py", line 17, in <module>
    from .utils.versions import require_version, require_version_core
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\transformers\utils\__init__.py", line 34, in <module>
    from .generic import (
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\transformers\utils\generic.py", line 33, in <module>
    import tensorflow as tf
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\tensorflow\__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\tensorflow\python\__init__.py", line 37, in <module>
    from tensorflow.python.eager import context
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\tensorflow\python\eager\context.py", line 29, in <module>
    from tensorflow.core.framework import function_pb2
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\tensorflow\core\framework\function_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\tensorflow\core\framework\attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\tensorflow\core\framework\tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\tensorflow\core\framework\resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\tensorflow\core\framework\tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "C:\Users\Canary\Downloads\kohya_ss\venv\lib\site-packages\google\protobuf\descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
18:51:09-067618 INFO     ...captioning done
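
Workaround 2 from the error message above can be sketched in Python as follows. This is a minimal illustration, not a confirmed fix for this issue: the variable has to be set before protobuf or tensorflow is first imported, and the error text itself warns that pure-Python parsing is much slower.

```python
import os

# Must be set before the first import of google.protobuf or tensorflow;
# changing it afterwards has no effect on modules that are already loaded.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Child processes started from here inherit the variable by default.
print(os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"])
```

Since the log shows captioning running as a separate ./venv/Scripts/python.exe process, the variable would need to be present in the environment that launches the GUI, or be set at the very top of the script being run.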

BLIP Captioning Traceback (When protobuf Is Downgraded)

19:18:03-250916 INFO     Version: None
19:18:03-278112 INFO     Using CPU-only Torch
19:18:05-944069 INFO     Torch 2.0.1+cpu
19:18:05-946069 WARNING  Torch reports CUDA not available
19:18:05-947067 INFO     Verifying modules instalation status from requirements_windows_torch2.txt...
19:18:05-949066 WARNING  Package wrong version: xformers 0.0.14.dev0 required 0.0.20
19:18:05-951065 INFO     Installing package: xformers==0.0.20 bitsandbytes==0.35.0
19:18:14-625152 WARNING  Package wrong version: accelerate 0.15.0 required 0.19.0
19:18:14-627149 INFO     Installing package: accelerate==0.19.0 tensorboard==2.12.3 tensorflow==2.12.0
19:20:34-566654 INFO     Verifying modules instalation status from requirements.txt...
19:20:38-861524 INFO     headless: False
19:20:38-870334 INFO     Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
19:20:44-346649 INFO     Captioning files in C:/Users/Canary/Downloads/images...
19:20:44-352184 INFO     ./venv/Scripts/python.exe "finetune/make_captions.py" --batch_size="1" --num_beams="1"
                         --top_p="0.9" --max_length="75" --min_length="5" --beam_search --caption_extension=".txt"
                         "C:/Users/Canary/Downloads/images"
                         --caption_weights="https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/mode
                         l_large_caption.pth"
C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: [WinError 127] The specified procedure could not be found
  warn(f"Failed to load image Python extension: {e}")
Traceback (most recent call last):
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\finetune\make_captions.py", line 16, in <module>
    from blip.blip import blip_decoder
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\finetune\blip\blip.py", line 14, in <module>
    from blip.med import BertConfig, BertModel, BertLMHeadModel
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\finetune\blip\med.py", line 39, in <module>
    from transformers.modeling_utils import (
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\transformers\modeling_utils.py", line 83, in <module>
    from accelerate import __version__ as accelerate_version
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\accelerate\__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 39, in <module>
    from .tracking import LOGGER_TYPE_TO_CLASS, GeneralTracker, filter_trackers
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\accelerate\tracking.py", line 42, in <module>
    from torch.utils import tensorboard
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\torch\utils\tensorboard\__init__.py", line 12, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa: F401
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\torch\utils\tensorboard\writer.py", line 16, in <module>
    from ._embedding import (
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\torch\utils\tensorboard\_embedding.py", line 9, in <module>
    _HAS_GFILE_JOIN = hasattr(tf.io.gfile, "join")
  File "C:\Users\Canary\Documents\SD 2.0\kohya_ss\venv\lib\site-packages\tensorboard\lazy.py", line 65, in __getattr__
    return getattr(load_once(self), attr_name)
AttributeError: module 'tensorflow' has no attribute 'io'
19:20:50-772013 INFO     ...captioning done

Computer Specifications

Operating System: Windows 11 22H2 (x64) CPU: Intel Core i7 4790 PC Memory: 16GB DDR3 GPU: AMD Radeon RX 580 2048SP GPU Memory: 8GB VRAM
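
The logs in this comment show the installer reading requirements_windows_torch1.txt, while the GUI start-up verifies against requirements_windows_torch2.txt, which is where the "Package wrong version" warnings come from. The sketch below compares the two sets of pins as they appear in the logs; parse_pins and mismatches are hypothetical helper names, and the pin lists are copied from the log lines rather than from the actual requirements files.

```python
from typing import Dict, List, Tuple


def parse_pins(lines: List[str]) -> Dict[str, str]:
    """Turn 'name==version' tokens into a {name: version} mapping."""
    pins: Dict[str, str] = {}
    for line in lines:
        for token in line.split():
            name, sep, version = token.partition("==")
            if sep and name and version:
                pins[name.lower()] = version
    return pins


def mismatches(installed: Dict[str, str], required: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List (name, installed, required) for packages whose versions differ."""
    return [
        (name, installed[name], want)
        for name, want in sorted(required.items())
        if name in installed and installed[name] != want
    ]


# Pins as logged during the Torch 1 install and during the later start-up check.
torch1_pins = parse_pins(["accelerate==0.15.0 tensorboard==2.10.1 tensorflow==2.10.1"])
torch2_pins = parse_pins(["accelerate==0.19.0 tensorboard==2.12.3 tensorflow==2.12.0"])

for name, have, want in mismatches(torch1_pins, torch2_pins):
    print(f"{name}: installed {have}, start-up check wants {want}")
```

A mismatch like this means every start-up tries to upgrade packages in place, which may be worth ruling out as a source of the half-installed state.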

QuessParaya commented 1 year ago

I'm experiencing what @glowberri experienced as well, albeit on Python 3.10.11.