bmaltais / kohya_ss

Apache License 2.0

Struggling to figure out the structure to train SDXL Loras - Directory without Repeats problem, no images being seen #2148

Closed MaestroFury closed 7 months ago

MaestroFury commented 7 months ago

Sorry for the flood of tickets lately...

So now my final problem is that I can't get Kohya to actually see the images to train on. I followed all kinds of YouTube tutorials and read the EDG guide on training LoRAs for SDXL. The naming structure in the EDG tutorial was simple:

MoroccanDress -> IMG -> 5_MorroccanDress

image

Where 5 is the number of times to train an image. I'm trying to create a LoRA with 20 steps, so I imagine mine would be 20_MorroccanDress.

So I created that structure, but Kohya doesn't see it.

I found another thread here which says I should use __ before the folder name, so I tried that too, and still nothing.

So I'm completely lost. What folder naming does Kohya currently expect in order to see my images, both training and regularization?

b-fission commented 7 months ago

Which of those folders did you set as "Image folder" in the gui?

Suppose you had a directory structure like /somewhere/dresslora/IMG/20_MorroccanDress, with the images inside 20_MorroccanDress. Given that directory structure, the path to the IMG folder is the one you'd set as the Image folder.

And what file formats are your images in? Are they png, jpg/jpeg, or webp?
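
The layout described above can be checked with a short script before launching training. This is a minimal sketch, not kohya_ss code: the function name `summarize_image_folder`, the regular expression, and the extension set (taken from the formats asked about here) are assumptions for illustration, and the trainer itself may accept more.

```python
import re
from pathlib import Path

# Extensions mentioned in this thread; an assumption, not the trainer's full list.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}

# Matches "<repeats>_<name>", e.g. "20_MorroccanDress".
SUBFOLDER_PATTERN = re.compile(r"^(\d+)_(.+)$")


def summarize_image_folder(image_folder):
    """Return (subfolder name, repeats, image count) for each '<repeats>_<name>' subfolder."""
    summary = []
    for sub in sorted(Path(image_folder).iterdir()):
        match = SUBFOLDER_PATTERN.match(sub.name)
        if not sub.is_dir() or match is None:
            # Skip plain files and folders without a numeric repeat prefix.
            continue
        repeats = int(match.group(1))
        images = [p for p in sub.iterdir() if p.suffix.lower() in IMAGE_EXTENSIONS]
        summary.append((sub.name, repeats, len(images)))
    return summary
```

Called on the IMG folder, for example `summarize_image_folder("/somewhere/dresslora/IMG")`, it would list `20_MorroccanDress` with a repeat count of 20. Multiplying repeats by image count gives the total per subfolder: 45 images with 20 repeats is 900, which matches the "900 train images with repeating" line in the log posted later in this thread.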

karen-pal commented 7 months ago

What do you mean by "Kohya doesn't see it"? Post a screenshot so we can see what you mean. For me, the GUI shows the paths gradually (and not all of the available paths in my directory tree), so I just copy and paste the correct absolute path into the GUI.

MaestroFury commented 7 months ago

Well, I finally managed to get Kohya to see the images, but now it won't see the caption files, which are named exactly the same as the image files but with a .txt extension.

Then it just dumps this at the end and never starts:

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\sd-scripts\sdxl_train.py", line 792, in <module>
    train(args)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\sd-scripts\sdxl_train.py", line 354, in train
    _, _, optimizer = train_util.get_optimizer(args, trainable_params=params_to_optimize)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\sd-scripts\library\train_util.py", line 3816, in get_optimizer
    optimizer = optimizer_class(trainable_params, lr=lr, **optimizer_kwargs)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\lib\site-packages\transformers\optimization.py", line 625, in __init__
    super().__init__(params, defaults)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\lib\site-packages\torch\optim\optimizer.py", line 261, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\Scripts\python.exe', 'C:\Users\Administrator\Desktop\Kohya\kohya_ss/sd-scripts/sdxl_train.py', '--bucket_no_upscale', '--bucket_reso_steps=64', '--cache_latents', '--cache_latents_to_disk', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--gradient_checkpointing', '--learning_rate=0.0', '--learning_rate_te1=1e-05', '--learning_rate_te2=1e-05', '--logging_dir=C:/Users/Administrator/Desktop/Facial_Sheet_Masks/log', '--lr_scheduler=constant', '--lr_scheduler_num_cycles=10', '--max_data_loader_n_workers=0', '--resolution=1024,1024', '--max_train_steps=9000', '--mixed_precision=bf16', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--optimizer_type=Adafactor', '--output_dir=C:/Users/Administrator/Desktop/Facial_Sheet_Masks/model', '--output_name=Facial_Sheet_Mask', '--pretrained_model_name_or_path=C:/Users/Administrator/Desktop/Webui - Forge/webui/models/Stable-diffusion/realvisxlV40_v40Bakedvae.safetensors', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=bf16', '--train_batch_size=1', '--train_data_dir=C:/Users/Administrator/Desktop/Facial_Sheet_Masks/img', '--xformers']' returned non-zero exit status 1.

What an incredibly frustrating experience this has been so far. Can someone PLEASE just share their setup JSON file with me? Do I have some setting on that I shouldn't? I'm not getting any clear errors, just a bunch of warnings and no clear signal of what I'm doing wrong.

MaestroFury commented 7 months ago

Here is the full dump, from start to finish:

`16:03:33-720094 INFO accelerate launch --num_cpu_threads_per_process=2 "C:\Users\Administrator\Desktop\Kohya\kohya_ss/sd-scripts/sdxl_train.py" --bucket_no_upscale --bucket_reso_steps=64 --cache_latents --cache_latents_to_disk --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --gradient_checkpointing --learning_rate="0.0" --learning_rate_te1="1e-05" --learning_rate_te2="1e-05" --logging_dir="C:/Users/Administrator/Desktop/Facial_Sheet_Masks/log" --lr_scheduler="constant" --lr_scheduler_num_cycles="10" --max_data_loader_n_workers="0" --resolution="1024,1024" --max_train_steps="9000" --mixed_precision="bf16" --optimizer_args scale_parameter=False relative_step=False warmup_init=False --optimizer_type="Adafactor" --output_dir="C:/Users/Administrator/Desktop/Facial_Sheet_Masks/model" --output_name="Facial_Sheet_Mask" --pretrained_model_name_or_path="C:/Users/Administrator/Desktop/Webui - Forge/webui/models/Stable-diffusion/realvisxlV40_v40Bakedvae.safetensors" --save_every_n_epochs="1" --save_model_as=safetensors --save_precision="bf16" --train_batch_size="1" --train_data_dir="C:/Users/Administrator/Desktop/Facial_Sheet_Masks/img" --xformers
2024-03-24 16:03:43 INFO prepare tokenizers  sdxl_train_util.py:135
INFO Using DreamBooth method.  sdxl_train.py:140
INFO prepare images.  train_util.py:1469
INFO found directory C:\Users\Administrator\Desktop\Facial_Sheet_Masks\img\20_Facial_Sheet_Masks contains 45 image files  train_util.py:1432
WARNING No caption file found for 45 images. Training will continue without captions for these images. If class token exists, it will be used. / 45枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。  train_util.py:1459
WARNING C:\Users\Administrator\Desktop\Facial_Sheet_Masks\img\20_Facial_Sheet_Masks\1.jpg  train_util.py:1466
WARNING C:\Users\Administrator\Desktop\Facial_Sheet_Masks\img\20_Facial_Sheet_Masks\10.jpg  train_util.py:1466
WARNING C:\Users\Administrator\Desktop\Facial_Sheet_Masks\img\20_Facial_Sheet_Masks\11.jpg  train_util.py:1466
WARNING C:\Users\Administrator\Desktop\Facial_Sheet_Masks\img\20_Facial_Sheet_Masks\12.jpg  train_util.py:1466
WARNING C:\Users\Administrator\Desktop\Facial_Sheet_Masks\img\20_Facial_Sheet_Masks\13.jpg  train_util.py:1466
WARNING C:\Users\Administrator\Desktop\Facial_Sheet_Masks\img\20_Facial_Sheet_Masks\14.jpg... and 40 more  train_util.py:1464
INFO 900 train images with repeating.  train_util.py:1508
INFO 0 reg images.  train_util.py:1511
WARNING no regularization images / 正則化画像が見つかりませんでした  train_util.py:1516
INFO [Dataset 0]  config_util.py:544
                             batch_size: 1
                             resolution: (1024, 1024)
                             enable_bucket: True
                             network_multiplier: 1.0
                             min_bucket_reso: 256
                             max_bucket_reso: 2048
                             bucket_reso_steps: 64
                             bucket_no_upscale: True

                           [Subset 0 of Dataset 0]
                             image_dir: "C:\Users\Administrator\Desktop\Facial_Sheet_Masks\img\20_Facial_Sheet_Masks"
                             image_count: 45
                             num_repeats: 20
                             shuffle_caption: False
                             keep_tokens: 0
                             keep_tokens_separator:
                             caption_dropout_rate: 0.0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1,
                             token_warmup_step: 0,
                             is_reg: False
                             class_tokens: Facial_Sheet_Masks
                             caption_extension: .caption

2024-03-24 16:03:44 INFO [Dataset 0]  config_util.py:550
INFO loading image sizes.  train_util.py:794
100%|████████████████████████████████████████████████████████████████████████████████| 45/45 [00:00<00:00, 4731.01it/s]
INFO make buckets  train_util.py:800
WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます  train_util.py:817
INFO number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)  train_util.py:846
INFO bucket 0: resolution (1024, 1024), count: 900  train_util.py:851
INFO mean ar error (without repeats): 0.0  train_util.py:856
INFO prepare accelerator  sdxl_train.py:197
accelerator device: cuda
INFO loading model for process 0/1  sdxl_train_util.py:31
INFO load StableDiffusion checkpoint: C:/Users/Administrator/Desktop/Webui - Forge/webui/models/Stable-diffusion/realvisxlV40_v40Bakedvae.safetensors  sdxl_train_util.py:71
INFO building U-Net  sdxl_model_util.py:192
INFO loading U-Net from checkpoint  sdxl_model_util.py:196

2024-03-24 16:03:49 INFO U-Net:  sdxl_model_util.py:202
INFO building text encoders  sdxl_model_util.py:205
INFO loading text encoders from checkpoint  sdxl_model_util.py:258
2024-03-24 16:03:50 INFO text encoder 1:  sdxl_model_util.py:272
2024-03-24 16:03:53 INFO text encoder 2:  sdxl_model_util.py:276
INFO building VAE  sdxl_model_util.py:279
INFO loading VAE from checkpoint  sdxl_model_util.py:284
INFO VAE:  sdxl_model_util.py:287
Disable Diffusers' xformers
INFO Enable xformers for U-Net  train_util.py:2529
INFO [Dataset 0]  train_util.py:1948
INFO caching latents.  train_util.py:915
INFO checking cache validity...  train_util.py:925
100%|████████████████████████████████████████████████████████████████████████████████| 45/45 [00:00<00:00, 2245.91it/s]
INFO caching latents...  train_util.py:962
0it [00:00, ?it/s]
train unet: False, text_encoder1: False, text_encoder2: False
number of models: 0
number of trainable parameters: 0
prepare optimizer, data loader etc.
2024-03-24 16:04:01 INFO use Adafactor optimizer | {'scale_parameter': False, 'relative_step': False, 'warmup_init': False}  train_util.py:3779
WARNING because max_grad_norm is set, clip_grad_norm is enabled. consider set to 0 / max_grad_normが設定されているためclip_grad_normが有効になります。0に設定して無効にしたほうがいいかもしれません  train_util.py:3807
WARNING constant_with_warmup will be good / スケジューラはconstant_with_warmupが良いかもしれません  train_util.py:3811
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\sd-scripts\sdxl_train.py", line 792, in <module>
    train(args)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\sd-scripts\sdxl_train.py", line 354, in train
    _, _, optimizer = train_util.get_optimizer(args, trainable_params=params_to_optimize)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\sd-scripts\library\train_util.py", line 3816, in get_optimizer
    optimizer = optimizer_class(trainable_params, lr=lr, **optimizer_kwargs)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\lib\site-packages\transformers\optimization.py", line 625, in __init__
    super().__init__(params, defaults)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\lib\site-packages\torch\optim\optimizer.py", line 261, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Administrator\Desktop\Kohya\kohya_ss\venv\Scripts\python.exe', 'C:\Users\Administrator\Desktop\Kohya\kohya_ss/sd-scripts/sdxl_train.py', '--bucket_no_upscale', '--bucket_reso_steps=64', '--cache_latents', '--cache_latents_to_disk', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--gradient_checkpointing', '--learning_rate=0.0', '--learning_rate_te1=1e-05', '--learning_rate_te2=1e-05', '--logging_dir=C:/Users/Administrator/Desktop/Facial_Sheet_Masks/log', '--lr_scheduler=constant', '--lr_scheduler_num_cycles=10', '--max_data_loader_n_workers=0', '--resolution=1024,1024', '--max_train_steps=9000', '--mixed_precision=bf16', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--optimizer_type=Adafactor', '--output_dir=C:/Users/Administrator/Desktop/Facial_Sheet_Masks/model', '--output_name=Facial_Sheet_Mask', '--pretrained_model_name_or_path=C:/Users/Administrator/Desktop/Webui - Forge/webui/models/Stable-diffusion/realvisxlV40_v40Bakedvae.safetensors', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=bf16', '--train_batch_size=1', '--train_data_dir=C:/Users/Administrator/Desktop/Facial_Sheet_Masks/img', '--xformers']' returned non-zero exit status 1.`
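
The dump reports `train unet: False, text_encoder1: False, text_encoder2: False` and `number of trainable parameters: 0` just before the optimizer raises "optimizer got an empty parameter list", and the launch command passes `--learning_rate=0.0`. Whether the zero learning rate is what disables U-Net training is an assumption here, not something this thread confirms. The sketch below is not part of kohya_ss; `zero_learning_rates` is a hypothetical helper that flags learning-rate arguments set to zero so they can be reviewed before launching.

```python
def zero_learning_rates(args):
    """Return the '--learning_rate*' flags in `args` whose value parses to 0."""
    zero = []
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or not name.startswith("--learning_rate"):
            # Not a '--learning_rate...=value' style argument.
            continue
        try:
            # The GUI log quotes values, e.g. --learning_rate="0.0".
            if float(value.strip('"')) == 0.0:
                zero.append(name)
        except ValueError:
            # Value is not a number; leave it for the trainer to reject.
            pass
    return zero
```

For the command in the dump, `zero_learning_rates(["--learning_rate=0.0", "--learning_rate_te1=1e-05", "--learning_rate_te2=1e-05"])` returns `["--learning_rate"]`.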

MaestroFury commented 7 months ago

All my settings, as instructed by some online tutorials:

image

MaestroFury commented 7 months ago

Please ignore me. For anyone who is struggling, please try to follow this tutorial TO THE LETTER:

https://techtactician.com/how-to-train-stable-diffusion-lora-models/#step-1

I am successfully training SD1.5 right now, and I hope now that I have everything working, the leap from 1.5 to SDXL will not be too far.

I am super excited to be on my way finally. And I hope to help other struggling people reach this point!

b-fission commented 7 months ago

> Well, I finally managed to get Kohya to see the images, but now it won't see the caption files, which are named exactly the same as the image files but with a .txt extension.

In case you haven't set that one yet, there's a setting called "Caption Extension" which defaults to .caption as the file extension. You'll want to change it to .txt
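
To confirm that the caption files line up with the images for a given extension, a quick check like the following can help. It is a sketch under stated assumptions, not the trainer's own lookup: `images_missing_captions` is a hypothetical helper, and the image extension set is limited to the formats mentioned in this thread.

```python
from pathlib import Path

# Extensions mentioned in this thread; an assumption, not the trainer's full list.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def images_missing_captions(folder, caption_extension=".txt"):
    """List image files in `folder` that have no sibling caption file."""
    missing = []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # "1.jpg" is expected to pair with "1.txt" (or "1.caption", etc.).
        if not image.with_suffix(caption_extension).exists():
            missing.append(image.name)
    return missing
```

Running it once with `".txt"` and once with `".caption"` on the image subfolder shows which extension the existing caption files actually match.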