bmaltais / kohya_ss

Apache License 2.0

Can't start LoRA training - Traceback error and BinaryIO is not found. #643

Closed MangoDragonHub closed 8 months ago

MangoDragonHub commented 1 year ago

Hello. This is my first time trying to create a LoRA model. When I start training, I get a traceback that ends with a NameError saying BinaryIO is not defined. I'm running this on Ubuntu with a GTX 1660 Ti.

2023-04-20 22:41:52.639080: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-20 22:41:52.774118: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-04-20 22:41:53.196010: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-04-20 22:41:53.196054: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-04-20 22:41:53.196061: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2023-04-20 22:41:54.966153: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-20 22:41:55.113468: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-04-20 22:41:55.544393: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-04-20 22:41:55.544439: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-04-20 22:41:55.544447: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
  File "train_network.py", line 18, in <module>
    import library.train_util as train_util
  File "/home/rashad/Documents/kohya_ss/library/train_util.py", line 63, in <module>
    import library.huggingface_util as huggingface_util
  File "/home/rashad/Documents/kohya_ss/library/huggingface_util.py", line 25, in <module>
    src: Union[str, Path, bytes, BinaryIO],
NameError: name 'BinaryIO' is not defined
Traceback (most recent call last):
  File "/home/rashad/Documents/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/rashad/Documents/kohya_ss/venv/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/rashad/Documents/kohya_ss/venv/lib/python3.8/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/home/rashad/Documents/kohya_ss/venv/lib/python3.8/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/rashad/Documents/kohya_ss/venv/bin/python', 'train_network.py', '--pretrained_model_name_or_path=/home/rashad/stable-diffusion-webui/models/Stable-diffusion/dreamlikephotgraphy.safetensors', '--train_data_dir=/home/rashad/stable-diffusion-webui/training/gameArt/preprocessed/LORA/images', '--resolution=512,512', '--output_dir=/home/rashad/stable-diffusion-webui/training/gameArt/preprocessed/LORA/model', '--logging_dir=/home/rashad/stable-diffusion-webui/training/gameArt/preprocessed/LORA/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=CoomerDiffusionTest', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=1700', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--mem_eff_attn', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
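A note on reading the log above: it contains two tracebacks. The first (the NameError) comes from the training script itself. The second (the CalledProcessError) appears to be only the launcher reporting that its child process exited with a non-zero status. The sketch below reproduces that pattern in miniature. It is not accelerate's code: `run_child` is a hypothetical helper, and the one-line child program merely stands in for `train_network.py`.

```python
import subprocess
import sys


def run_child(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter and raise if it exits non-zero.

    Hypothetical helper for illustration; it only mimics the shape of a
    launcher that starts a training script as a subprocess.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # keep the child's stderr so we can inspect it
        text=True,
    )


try:
    # The child references an undefined name, much like the failing import.
    run_child("BinaryIO")
except subprocess.CalledProcessError as err:
    child_status = err.returncode
    child_stderr = err.stderr
    # The last line of the child's traceback names the real exception.
    root_cause = child_stderr.strip().splitlines()[-1]
    print(f"launcher saw exit status {child_status}; root cause: {root_cause}")
```

The practical point is to debug the first traceback: the launcher's error goes away once the child script imports cleanly.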

idrirap commented 1 year ago

I had the same error. The scripts need Python 3.10, but Ubuntu installs Python 3.8 by default. I fixed it by installing Python 3.10 and creating the venv manually.

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.10 python3.10-venv python3.10-tk

Then I removed the venv and recreated it using:

python3.10 -m venv ./venv
source venv/bin/activate

Then I reran the install script. Hope this helps.
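Before rerunning the install script, it may be worth confirming that the activated venv really uses the newer interpreter. The following is a minimal sketch, assuming the Python 3.10 requirement stated in this thread. `interpreter_problems` and `REQUIRED` are hypothetical names, not part of kohya_ss.

```python
import sys

# Assumption taken from this thread: the scripts need Python 3.10 or newer.
REQUIRED = (3, 10)


def interpreter_problems(version=sys.version_info,
                         prefix=sys.prefix,
                         base_prefix=sys.base_prefix):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(version[:2]) < REQUIRED:
        problems.append(
            f"Python {version[0]}.{version[1]} found, "
            f"need {REQUIRED[0]}.{REQUIRED[1]} or newer"
        )
    # Inside a venv, sys.prefix points at the venv and differs from
    # sys.base_prefix, which points at the base installation.
    if prefix == base_prefix:
        problems.append("not running inside a virtual environment")
    return problems


if __name__ == "__main__":
    found = interpreter_problems()
    for problem in found:
        print("problem:", problem)
    if not found:
        print("interpreter looks fine:", sys.executable)
```

Run it with the venv activated, so that `python` resolves to `venv/bin/python`. If it still reports 3.8, the venv was probably created with the old interpreter and needs to be removed and recreated.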