Closed davidfunk13 closed 8 months ago
Can confirm, exact same issue. I downloaded v22.2.1 as a completely new setup today because I wanted to test Lion, which was not working on my current setup (due to an older bitsandbytes version). I can't confirm whether this affects only the GUI, and I don't have the time to check, since I don't use the non-GUI version. Maybe we should cross-post this issue to the main repo (https://github.com/kohya-ss/sd-scripts).
07:29:40-427896 INFO Start Finetuning...
07:29:40-522670 INFO image_num = 16680
07:29:40-523640 INFO repeats = 16680
07:29:40-525635 INFO max_train_steps = 333600
07:29:40-526661 INFO lr_warmup_steps = 0
07:29:40-527629 INFO Saving training config to C:/Users/user/stable-diffusion
/Train/checkpoint/model\checkpoint_20231117-072940.json...
07:29:40-529624 INFO accelerate launch --num_cpu_threads_per_process=1 "./fine_tune.py" --train_text_encoder
--learning_rate_te="1e-05"
--pretrained_model_name_or_path="C:/Users/user/stable-diffusion-webui/models/Stable-diffusion
/#New folder/checkpoint-000018.safetensors" --in_json="C:/Users/user/stable-diffusion
/Train/checkpoint/config/meta_lat.json" --train_data_dir="C:/Users/user/stable-diffusion
/Train/checkpoint/img/checkpoint" --output_dir="C:/Users/user/stable-diffusion
/Train/checkpoint/model" --logging_dir="C:/Users/user/stable-diffusion /Train/checkpoint/log"
--dataset_repeats=1 --enable_bucket --resolution="512,768" --min_bucket_reso=256
--max_bucket_reso=1024 --save_model_as=safetensors --output_name="checkpoint"
--max_token_length=225 --learning_rate="1e-05" --lr_scheduler="cosine" --train_batch_size="1"
--max_train_steps="333600" --save_every_n_epochs="1" --mixed_precision="bf16"
--save_precision="bf16" --caption_extension=".txt" --cache_latents --cache_latents_to_disk
--optimizer_type="Lion8bit" --max_data_loader_n_workers="0" --max_token_length=225
--bucket_reso_steps=8 --min_timestep=500 --max_timestep=650 --xformers --noise_offset=0.0
prepare tokenizer
update token length: 225
loading existing metadata: C:/Users/user/stable-diffusion /Train/checkpoint/config/meta_lat.json
using bucket info in metadata / メタデータ内のbucket情報を使います
[Dataset 0]
batch_size: 1
resolution: (512, 768)
enable_bucket: True
min_bucket_reso: None
max_bucket_reso: None
bucket_reso_steps: None
bucket_no_upscale: None
[Subset 0 of Dataset 0]
image_dir: "C:/Users/user/stable-diffusion /Train/checkpoint/img/1_name"
image_count: 16680
num_repeats: 1
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
metadata_file: C:/Users/user/stable-diffusion /Train/checkpoint/config/meta_lat.json
[Dataset 0]
loading image sizes.
100%|███████████████████████████████████████████████████████████████████████| 16680/16680 [00:00<00:00, 4156674.63it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (256, 1024), count: 4
bucket 1: resolution (320, 1024), count: 43
bucket 2: resolution (384, 896), count: 205
bucket 3: resolution (384, 960), count: 56
bucket 4: resolution (384, 1024), count: 31
bucket 5: resolution (448, 832), count: 1649
bucket 6: resolution (512, 704), count: 5403
bucket 7: resolution (512, 768), count: 3744
bucket 8: resolution (576, 576), count: 1235
bucket 9: resolution (576, 640), count: 1628
bucket 10: resolution (640, 576), count: 684
bucket 11: resolution (704, 512), count: 844
bucket 12: resolution (768, 512), count: 391
bucket 13: resolution (832, 448), count: 760
bucket 14: resolution (896, 384), count: 3
mean ar error (without repeats): 0.0
prepare accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/user/stable-diffusion-webui/models/Stable-diffusion/#New folder/checkpoint-000018.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Disable Diffusers' xformers
Enable xformers for U-Net
[Dataset 0]
caching latents.
checking cache validity...
100%|███████████████████████████████████████████████████████████████████████| 16680/16680 [00:00<00:00, 2388480.79it/s]
caching latents...
0it [00:00, ?it/s]
enable text encoder training
prepare optimizer, data loader etc.
Traceback (most recent call last):
File "C:\Users\user\Kohya 2\library\train_util.py", line 3433, in get_optimizer
import bitsandbytes as bnb
File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
from . import cuda_setup, utils, research
File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
from . import nn
File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
from .modules import LinearFP8Mixed, LinearFP8Global
File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
from bitsandbytes.optim import GlobalOptimManager
File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
from .cuda_setup.main import evaluate_cuda_setup
File "C:\Users\user\Kohya 2\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\user\Kohya 2\fine_tune.py", line 499, in <module>
train(args)
File "C:\Users\user\Kohya 2\fine_tune.py", line 212, in train
_, _, optimizer = train_util.get_optimizer(args, trainable_params=trainable_params)
File "C:\Users\user\Kohya 2\library\train_util.py", line 3435, in get_optimizer
raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\user\Kohya 2\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\user\Kohya 2\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "C:\Users\user\Kohya 2\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
simple_launcher(args)
File "C:\Users\user\Kohya 2\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\user\\Kohya 2\\venv\\Scripts\\python.exe', './fine_tune.py', '--train_text_encoder', '--learning_rate_te=1e-05', '--pretrained_model_name_or_path=C:/Users/user/stable-diffusion-webui/models/Stable-diffusion/#New folder/checkpoint-000018.safetensors', '--in_json=C:/Users/user/stable-diffusion /Train/checkpoint/config/meta_lat.json', '--train_data_dir=C:/Users/user/stable-diffusion /Train/checkpoint/img/1_name', '--output_dir=C:/Users/user/stable-diffusion /Train/checkpoint/model', '--logging_dir=C:/Users/user/stable-diffusion /Train/checkpoint/log', '--dataset_repeats=1', '--enable_bucket', '--resolution=512,768', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--save_model_as=safetensors', '--output_name=checkpoint', '--max_token_length=225', '--learning_rate=1e-05', '--lr_scheduler=cosine', '--train_batch_size=1', '--max_train_steps=333600', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=Lion8bit', '--max_data_loader_n_workers=0', '--max_token_length=225', '--bucket_reso_steps=8', '--min_timestep=500', '--max_timestep=650', '--xformers', '--noise_offset=0.0']' returned non-zero exit status 1.
I have the same issue too.
02:33:37-199614 INFO Start training LoRA Standard ...
02:33:37-201621 INFO Checking for duplicate image filenames in training data directory...
02:33:37-204632 INFO Valid image folder names found in: D:/ai/train/images
02:33:37-205634 INFO Folder 15_train: 34 images found
02:33:37-207641 INFO Folder 15_train: 510 steps
02:33:37-208644 INFO Total steps: 510
02:33:37-210651 INFO Train batch size: 1
02:33:37-211660 INFO Gradient accumulation steps: 1
02:33:37-213662 INFO Epoch: 10
02:33:37-215667 INFO Regulatization factor: 1
02:33:37-216681 INFO max_train_steps (510 / 1 / 1 * 10 * 1) = 5100
02:33:37-218677 INFO stop_text_encoder_training = 0
02:33:37-219680 INFO lr_warmup_steps = 510
02:33:37-220684 INFO Saving training config to D:/ai/train/model\trained_20231122-023337.json...
02:33:37-223695 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --pretrained_model_name_or_path="C:/Users/TheAnay/Downloads/v1-5-pruned.safetensors" --train_data_dir="D:/ai/train/images" --resolution="768,768" --output_dir="D:/ai/train/model" --logging_dir="D:/ai/train/log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=8 --output_name="trained" --lr_scheduler_num_cycles="10" --no_half_vae --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="510" --train_batch_size="1" --max_train_steps="5100" --save_every_n_epochs="2" --mixed_precision="fp16" --save_precision="fp16" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory D:\ai\train\images\15_train contains 34 image files
No caption file found for 34 images. Training will continue without captions for these images. If class token exists, it will be used. / 34枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
D:\ai\train\images\15_train\train (1).jpg
D:\ai\train\images\15_train\train (10).jpg
D:\ai\train\images\15_train\train (11).jpg
D:\ai\train\images\15_train\train (12).jpg
D:\ai\train\images\15_train\train (13).jpg
D:\ai\train\images\15_train\train (14).jpg... and 29 more
510 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (768, 768)
enable_bucket: False
[Subset 0 of Dataset 0]
image_dir: "D:\ai\train\images\15_train"
image_count: 34
num_repeats: 15
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: train
caption_extension: .caption
[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 34/34 [00:00<00:00, 2823.89it/s]
prepare dataset
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/TheAnay/Downloads/v1-5-pruned.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\ai\kohya_ss\train_network.py", line 1012, in
Please help if you find a solution.
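For reference, the step counts in the log above follow from simple arithmetic: 34 images with 15 repeats give 510 steps, and the logged expression `(510 / 1 / 1 * 10 * 1)` gives 5100. A minimal sketch of that calculation follows; the function names are made up for illustration, and rounding up is an assumption, since the log only shows a case that divides evenly.

```python
import math


def folder_steps(image_count, repeats):
    # "Folder 15_train: 34 images found" with 15 repeats -> 510 steps
    return image_count * repeats


def max_train_steps(total_steps, batch_size, grad_accum, epochs, reg_factor):
    # Mirrors the logged expression: (510 / 1 / 1 * 10 * 1) = 5100.
    # Rounding up is an assumption; the log does not show a fractional case.
    return int(math.ceil(total_steps / batch_size / grad_accum * epochs * reg_factor))


if __name__ == "__main__":
    steps = folder_steps(34, 15)
    print(steps, max_train_steps(steps, 1, 1, 10, 1))
```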
Try installing https://github.com/jllllll/bitsandbytes-windows-webui
Never mind what I said before, if you saw it; I'm just dumb. But how is that any different from letting the script install it?
Not 100% sure this is the solution, but there are two versions of bitsandbytes floating around. One is the Tim Dettmers one, which doesn't work with a lot of installations. The jllllll one is a newer build and has solved bitsandbytes problems for many people (in other applications).
Kohya has already been installing the jllllll version for a few months now. I saw the same issue posted on Reddit by someone who had installed Kohya for the first time, so it seems to be an issue with v22.x.x when installed fresh rather than when upgrading an existing instance. I haven't tried to experiment, since my 21.8.8 is working just fine and I don't have a lot of free time.
Thank you - I didn't realize that. I think that some application or the other might interfere with an install (this happens if we've got a number of AI applications being tested or running). If I get time today, I'll try and check out the latest Kohya and see if I can replicate this.
@Aamir3d @TeKett I tried installing bitsandbytes from the GitHub repo you mentioned. It did not work; I'm encountering the exact same error as before. I don't know if I'm installing it properly, though. Here's what I did: I opened a cmd window in the kohya_ss folder and ran the install command provided in the repo you linked above (jllllll/bitsandbytes-windows-webui).
One of the errors in the command prompt when trying to train is ImportError: No bitsandbytes / bitsandbytes
Is this what is causing the issue? I'm not very experienced with this, so I don't really understand what the issue is.
@TeKett @TheAnay Someone commented elsewhere that running Setup can resolve the bitsandbytes issues.
@TheAnay - to install bitsandbytes, you'll first need to activate the venv.
- Go to your kohya_ss folder
- Go to the venv folder (on Windows, activate.bat is in its Scripts subfolder)
- Run activate.bat to activate the venv
- Run pip uninstall bitsandbytes
- Install bitsandbytes from the jllllll repo
- Run kohya_ss using the gui-user.bat file
- (You might want to run setup.bat first to confirm all requirements are OK)
@TeKett - as an update, I tried running Kohya with the latest updates today and it worked flawlessly for me.
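The steps above can be sketched as a pair of pip commands run with the venv's own interpreter, which has the same effect as activating the venv first (the traceback earlier in the thread shows the interpreter at `venv\Scripts\python.exe`). This is only a sketch: `venv_python` and `reinstall_commands` are hypothetical helper names, and `install_args` is a placeholder for whatever install arguments the jllllll repo's README gives, which are not reproduced here.

```python
from pathlib import Path


def venv_python(kohya_dir, windows=True):
    # Interpreter inside the venv; calling it directly targets the venv's
    # site-packages just as an activated shell would.
    sub = ("Scripts", "python.exe") if windows else ("bin", "python")
    return Path(kohya_dir, "venv", *sub)


def reinstall_commands(kohya_dir, install_args, windows=True):
    # install_args is a placeholder: copy the arguments from the
    # jllllll/bitsandbytes-windows-webui README.
    py = str(venv_python(kohya_dir, windows))
    return [
        [py, "-m", "pip", "uninstall", "-y", "bitsandbytes"],
        [py, "-m", "pip", "install", *install_args],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in reinstall_commands("kohya_ss", ["bitsandbytes"]):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the install arguments have been filled in.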
Thanks a lot bro @Aamir3d that worked
Just made an account to express how grateful I am to you, 4 days of mess, more than 10 reinstallations, thank you very much.
HOLY SHIT I FOUND THE ISSUE AND ITS STUPID AS ALL HELL.
bitsandbytes-windows-webui is missing the "paths" module, which exists in other versions of bitsandbytes. Before v22.x.x, Kohya installed bitsandbytes 0.35 and then overwrote it by installing bitsandbytes-windows-webui. This meant that the paths module, and likely other files, remained. In v22, though, Kohya no longer installs bitsandbytes 0.35, so the "paths" module doesn't exist where it should, and the import throws the error.
Turns out I'm the stupid one, me dumdum, shame on me, think before you speak. Apparently it goes deeper.
Kohya versions before v22 installed bitsandbytes 0.35, which doesn't have this issue. Since v22, Kohya installs bitsandbytes 0.41.1, which causes the issue that the "paths" file is missing.
Simply copying over the paths file from an older version does not fix it and throws another error.
On a side note, Kohya doesn't install bitsandbytes-windows-webui by default, but does when you choose "install specific bitsandbytes version". It's still the same issue, though, on all versions other than 0.35.
I dug some more and found that main.py in both bitsandbytes-windows-webui and bitsandbytes looks completely different from the main.py I get when I install the different bitsandbytes versions via Kohya's installer or pip. The rest of the files seem fine: 0.35 only has cuda118 and 0.41.1 has cuda122, so it is getting some, if not all, of the other files correct.
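A quick way to check for this kind of half-installed package is to look at the files on disk without importing it, since the import itself is what fails. The sketch below is a rough diagnostic under that assumption; `installed_version` and `missing_submodules` are hypothetical helper names, and the submodule names checked come from the traceback earlier in the thread.

```python
import importlib.util
from importlib import metadata
from pathlib import Path


def installed_version(dist_name):
    # Version recorded in the installed metadata, or None if there is no record.
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def missing_submodules(package, dotted_names):
    # Looking up a top-level package does not execute its __init__.py,
    # so this works even when "import bitsandbytes" raises.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return list(dotted_names)
    roots = [Path(p) for p in spec.submodule_search_locations]
    missing = []
    for name in dotted_names:
        parts = name.split(".")
        found = any(
            root.joinpath(*parts).with_suffix(".py").is_file()
            or root.joinpath(*parts, "__init__.py").is_file()
            for root in roots
        )
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("bitsandbytes version:", installed_version("bitsandbytes"))
    print("missing:", missing_submodules("bitsandbytes", ["cuda_setup.main", "cuda_setup.paths"]))
```

Run it with the venv's Python so it inspects the same site-packages folder that the training script uses.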
I installed it manually instead, by extracting the wheel into the venv, and now it works. So the question is: why are a handful of people, whether upgrading or installing for the first time, unable to install bitsandbytes correctly using pip?
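A wheel is a zip archive, so the manual extraction described above can be sketched in a few lines. This is a workaround sketch, not a replacement for pip: it only unpacks files and skips everything else pip does when installing. `extract_wheel` is a hypothetical helper name, and both paths are supplied by the caller.

```python
import zipfile


def extract_wheel(wheel_path, site_packages):
    # Unpack every file in the wheel into site-packages and
    # return the archive's member names for inspection.
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
        wheel.extractall(site_packages)
    return names
```

For a Kohya install on Windows, `site_packages` would be the `venv\lib\site-packages` folder seen in the traceback.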
This has been posted further down the chain but has no traction or responses.
Have been getting a lot of this attempting to train.
I can get it to run with adafactor with a 0 LR warmup, but can't really get it going in any other configuration