Closed DivinoAG closed 1 year ago
I suspect this is an issue with the model you are using as the base for the training. If you use one of the quicksetting models like SD1.5, does it work?
That is exactly the one I was trying it with. I also tried my local copy of the SD 1.5 .ckpt, the one I use with Easy Diffusion and the A1111 WebUI, and it seemed to get a little further, but then it gave me a different error: CUDA detection failed. Below is the relevant log:
CUDA SETUP: TODO: compile library for specific version: libbitsandbytes_cuda116.dll
CUDA SETUP: Defaulting to libbitsandbytes.so...
CUDA SETUP: CUDA detection failed. Either CUDA driver not installed, CUDA not installed, or you have multiple conflicting CUDA libraries!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
Traceback (most recent call last):
File "C:\Kohya\kohya_ss\train_db.py", line 346, in <module>
train(args)
File "C:\Kohya\kohya_ss\train_db.py", line 122, in train
import bitsandbytes as bnb
File "C:\Kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
from .autograd._functions import (
File "C:\Kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
import bitsandbytes.functional as F
File "C:\Kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "C:\Kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 43, in <module>
lib = CUDALibrary_Singleton.get_instance().lib
File "C:\Kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 39, in get_instance
cls._instance.initialize()
File "C:\Kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 27, in initialize
raise Exception('CUDA SETUP: Setup Failed!')
Exception: CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "C:\Users\andre\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\andre\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Kohya\\kohya_ss\\venv\\Scripts\\python.exe', 'train_db.py', '--pretrained_model_name_or_path=C:\\sd-shared-files\\models\\sd-v1-5-pruned-emaonly.ckpt', '--train_data_dir=C:/Temp/omgcsply lora/img', '--resolution=512,512', '--output_dir=C:/Temp/omgcsply lora/model', '--logging_dir=C:/Temp/omgcsply lora/log', '--save_model_as=safetensors', '--max_data_loader_n_workers=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=6800', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--mem_eff_attn', '--gradient_checkpointing', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.
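The log above shows bitsandbytes looking for libbitsandbytes_cuda116.dll and then falling back to libbitsandbytes.so, which hints that the Windows binary may be missing from the venv. That is only a guess from the log, not a confirmed diagnosis. A minimal sketch for checking which native library files are actually present, without importing the package (which is what raises the exception), could look like this. The function name `find_native_libs` and the glob pattern are illustrative assumptions, not part of kohya_ss or bitsandbytes:

```python
import importlib.util
from pathlib import Path


def find_native_libs(package="bitsandbytes", pattern="libbitsandbytes*"):
    """Return the names of files matching `pattern` inside an installed package.

    Uses find_spec, so the package is located but never imported.
    Both default arguments are assumptions taken from the log output.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # package not installed in this interpreter
    names = set()
    for location in spec.submodule_search_locations:
        for path in Path(location).glob(pattern):
            names.add(path.name)
    return sorted(names)


if __name__ == "__main__":
    libs = find_native_libs()
    if libs:
        print("Native library files found:")
        for name in libs:
            print("  " + name)
    else:
        print("No matching library files found (or package not installed).")
```

Run it with the venv's interpreter (C:\Kohya\kohya_ss\venv\Scripts\python.exe in the log) so it inspects the same site-packages folder that the traceback points to.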
Sounds like a bad setup if it can't find the CUDA drivers. Make sure you don't already have local pip modules installed outside the venv.
Could you be so kind as to describe a bit more what you are talking about? I'm not sure I understand exactly what you mean. I installed Kohya following the official instructions and did nothing beyond that. I have UIs for SD installed, like WebUI and Easy Diffusion, but I didn't do any tinkering beyond what their respective installations require, and they can clearly run CUDA for their respective render processes. So I don't know what "local pip modules" could be affecting this, or how I would go about troubleshooting it.
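One way to look for "local pip modules" outside the venv is to ask the interpreter itself where it is running from and whether anything outside the venv is on its import path. The following is a rough sketch using only the standard library; the function names are made up for illustration, and it only reports what it finds rather than fixing anything:

```python
import os
import site
import sys
from pathlib import Path


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def user_site_packages():
    """List entries in the per-user site-packages folder, if it is enabled.

    Packages installed with `pip install --user` live here and can shadow
    the venv's copies when the user site is active.
    """
    if not site.ENABLE_USER_SITE:
        return []
    user_dir = Path(site.getusersitepackages())
    if not user_dir.is_dir():
        return []
    return sorted(entry.name for entry in user_dir.iterdir())


def environment_report():
    """Collect the facts most relevant to 'packages leaking into the venv'."""
    return {
        "executable": sys.executable,
        "in_virtualenv": in_virtualenv(),
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "user_site_entries": user_site_packages(),
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run with the venv's python.exe, `in_virtualenv` should be True and, for a venv created with default options, the user site is normally disabled. A non-empty PYTHONPATH or an enabled user site with torch or bitsandbytes in it would be worth a closer look.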
It is hard to tell. I only created the GUI that allows you to use the kohya Python code. This error should really be brought to kohya in his main repo, as he is the one writing the code. I do my best to help, but those types of errors are outside my expertise, I am afraid.
I guess I didn't realize they were separate things. I'll take a look at their repo then. Thanks.
Hello, I'm getting a number of different errors when attempting to run a LoRA training session here, and I can't really pinpoint the cause. I hope someone here has some insight. I'm running this on a mobile 3060.
Below is my error log.