bmaltais / kohya_ss

Apache License 2.0

Issue when training SDXL #2954

Open Kmilo-Rawrz opened 2 weeks ago

Kmilo-Rawrz commented 2 weeks ago

This is my first time using Kohya_SS, and I've followed two guides: one from YouTube (https://www.youtube.com/watch?v=ovuO8bT9Nzw) and another from Civitai (https://civitai.com/articles/6438/how-to-train-your-lora-sdxl-pony-with-3070rtx-8gb-of-vram-with-kohya-gui-v2415-july-2024). In both cases, when I start the training, I get the following error:

"09:55:45-587320 INFO Start training LoRA Standard ...
09:55:45-588319 INFO Validating lr scheduler arguments...
09:55:45-589319 INFO Validating optimizer arguments...
09:55:45-590319 INFO Validating D:/009_Kohya_ss/002_Momo/log existence and writability... SUCCESS
09:55:45-590319 INFO Validating D:/009_Kohya_ss/002_Momo/model existence and writability... SUCCESS
09:55:45-591319 INFO Validating stabilityai/stable-diffusion-xl-base-1.0 existence... SUCCESS
09:55:45-592319 INFO Validating D:/009_Kohya_ss/002_Momo/img existence... SUCCESS
09:55:45-593321 INFO Folder 5_yaoyorozu_momo Girl: 5 repeats found
09:55:45-593321 INFO Folder 5_yaoyorozu_momo Girl: 20 images found
09:55:45-595831 INFO Folder 5_yaoyorozu_momo Girl: 20 * 5 = 100 steps
09:55:45-596451 INFO Regulatization factor: 1
09:55:45-597461 INFO Total steps: 100
09:55:45-597461 INFO Train batch size: 2
09:55:45-598462 INFO Gradient accumulation steps: 1
09:55:45-599531 INFO Epoch: 5
09:55:45-599531 INFO max_train_steps (100 / 2 / 1 * 5 * 1) = 250
09:55:45-600698 INFO stop_text_encoder_training = 0
09:55:45-600698 INFO lr_warmup_steps = 0
09:55:45-603357 INFO Saving training config to D:/009_Kohya_ss/002_Momo/model\yaoyorozu_momo_20241106-095545.json...
09:55:45-604421 INFO Executing command: D:\009_Kohya_ss\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --mixed_precision fp16 --num_processes 2 --num_machines 1 --num_cpu_threads_per_process 8 D:/009_Kohya_ss/kohya_ss/sd-scripts/sdxl_train_network.py --config_file D:/009_Kohya_ss/002_Momo/model/config_lora-20241106-095545.toml
09:55:45-608463 INFO Command executed.
2024-11-06 09:55:52 INFO Loading settings from D:/009_Kohya_ss/002_Momo/model/config_lora-20241106-095545.toml... train_util.py:3744
INFO D:/009_Kohya_ss/002_Momo/model/config_lora-20241106-095545 train_util.py:3763
2024-11-06 09:55:52 INFO prepare tokenizers sdxl_train_util.py:138
2024-11-06 09:55:54 INFO update token length: 150 sdxl_train_util.py:163
INFO Using DreamBooth method. train_network.py:172
Traceback (most recent call last):
  File "D:\009_Kohya_ss\kohya_ss\sd-scripts\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "D:\009_Kohya_ss\kohya_ss\sd-scripts\train_network.py", line 198, in train
    train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
  File "D:\009_Kohya_ss\kohya_ss\sd-scripts\library\config_util.py", line 486, in generate_dataset_group_by_blueprint
    subsets = [subset_klass(**asdict(subset_blueprint.params)) for subset_blueprint in dataset_blueprint.subsets]
  File "D:\009_Kohya_ss\kohya_ss\sd-scripts\library\config_util.py", line 486, in <listcomp>
    subsets = [subset_klass(**asdict(subset_blueprint.params)) for subset_blueprint in dataset_blueprint.subsets]
TypeError: DreamBoothSubset.__init__() got an unexpected keyword argument 'alpha_mask'
Traceback (most recent call last):
  File "C:\Users\kmilo\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\kmilo\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\009_Kohya_ss\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
    sys.exit(main())
  File "D:\009_Kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "D:\009_Kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "D:\009_Kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\009_Kohya_ss\kohya_ss\venv\Scripts\python.exe', 'D:/009_Kohya_ss/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'D:/009_Kohya_ss/002_Momo/model/config_lora-20241106-095545.toml']' returned non-zero exit status 1.
09:55:55-911175 INFO Training has ended."
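For readers unfamiliar with this kind of TypeError: the traceback shows the subset class being called with a dictionary of parameters that contains a key (alpha_mask) its constructor does not accept. The sketch below reproduces that class of failure in isolation and lists the offending keys. SubsetParams, OldSubset and unexpected_keys are hypothetical stand-ins for illustration only, not code from sd-scripts, and the thread does not confirm why the extra key is present in this install.

```python
import inspect
from dataclasses import asdict, dataclass


@dataclass
class SubsetParams:
    # Hypothetical stand-in for the parameters built from the training config.
    image_dir: str
    num_repeats: int
    alpha_mask: bool = False  # option the constructor below does not know about


class OldSubset:
    # Hypothetical stand-in for a subset class that has no `alpha_mask` parameter.
    def __init__(self, image_dir: str, num_repeats: int) -> None:
        self.image_dir = image_dir
        self.num_repeats = num_repeats


def unexpected_keys(klass, params: dict) -> list:
    """Return the keys in `params` that klass.__init__ does not name explicitly."""
    accepted = inspect.signature(klass.__init__).parameters
    return [key for key in params if key not in accepted]


params = asdict(SubsetParams("img/5_yaoyorozu_momo Girl", 5))

try:
    # Unpacking the dict passes alpha_mask=False, which __init__ rejects.
    OldSubset(**params)
except TypeError as exc:
    message = str(exc)
    print(message)

print(unexpected_keys(OldSubset, params))
```

The helper only compares names, so it would also flag keys for a constructor that accepts arbitrary `**kwargs`; it is meant as a quick diagnostic, not a fix.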

I'm running it on an NVIDIA RTX 4060 (notebook) with an AMD Ryzen 7945HX CPU and 32 GB of DDR5 RAM.

Thanks in advance for the help :D

Daniel23stack commented 2 weeks ago

I can confirm this issue on a fresh install of Kohya SS with Python 3.10. The main problem seems to be that the loop in the caching process ends early because of an error:

\site-packages\PIL\ImageFile.py", line 310, in load
    raise _get_oserror(err_code, encoder=False)
OSError: unrecognized data stream contents when reading image file

This happens even though every image in my dataset is a PNG of roughly the same size, with buckets enabled. I also shortened the dataset and got the same error when training on ponyXL.

azamet90 commented 2 weeks ago

I think Kohya is no longer working.

Daniel23stack commented 2 weeks ago

I fixed the issue I was having by switching to the flux, 3.5 dev branch and verifying my dataset. I'm not sure what exactly in that image file caused the error, but I was able to resolve the issue.
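The comment above does not say how the dataset was verified. One possible way to look for a damaged PNG before training is sketched below, using only the Python standard library: it checks the PNG signature, the CRC of every chunk, and that the concatenated IDAT data decompresses. The function names check_png and find_bad_pngs are made up for this example, and passing this check does not guarantee that every image loader will accept the file.

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_png(path):
    """Return None if the PNG at `path` looks intact, else a short reason."""
    data = Path(path).read_bytes()
    if not data.startswith(PNG_SIGNATURE):
        return "missing PNG signature"
    pos = len(PNG_SIGNATURE)
    idat = bytearray()
    seen_iend = False
    # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            return "truncated chunk " + ctype.decode("latin-1")
        body = data[pos + 8:body_end]
        (stored_crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        # The CRC covers the chunk type and body, not the length field.
        if zlib.crc32(ctype + body) != stored_crc:
            return "CRC mismatch in chunk " + ctype.decode("latin-1")
        if ctype == b"IDAT":
            idat += body
        elif ctype == b"IEND":
            seen_iend = True
            break
        pos = body_end + 4
    if not seen_iend:
        return "no IEND chunk (file truncated?)"
    try:
        # All IDAT bodies together form a single zlib stream.
        zlib.decompress(bytes(idat))
    except zlib.error as exc:
        return "pixel data does not decompress: " + str(exc)
    return None


def find_bad_pngs(folder):
    """Map each suspicious .png under `folder` to the reason it was flagged."""
    report = {}
    for path in sorted(Path(folder).rglob("*.png")):
        reason = check_png(path)
        if reason is not None:
            report[path] = reason
    return report
```

Calling find_bad_pngs on the image folder returns a dictionary of flagged files and reasons; an empty dictionary means nothing was flagged. The "*.png" pattern is case-sensitive on some systems, so files named ".PNG" may need a second pass. If Pillow is available, opening each file with Image.open and then calling load() on it forces a full decode and is a closer match to what the trainer does.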