bmaltais / kohya_ss

Apache License 2.0

Got 2 errors when training SDXL including a non-zero exit status 1 #2850

Open skelescene opened 1 month ago

skelescene commented 1 month ago

Hi there! I'm trying to train an SDXL LoRA with additional parameters, and I keep getting 2 errors:

Traceback (most recent call last):
  File "C:\Users\clara\kohya_ss\sd-scripts\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "C:\Users\clara\kohya_ss\sd-scripts\train_network.py", line 198, in train
    train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
  File "C:\Users\clara\kohya_ss\sd-scripts\library\config_util.py", line 579, in generate_dataset_group_by_blueprint
    dataset.make_buckets()
  File "C:\Users\clara\kohya_ss\sd-scripts\library\train_util.py", line 941, in make_buckets
    image_info.bucket_reso, image_info.resized_size, ar_error = self.bucket_manager.select_bucket(
  File "C:\Users\clara\kohya_ss\sd-scripts\library\train_util.py", line 295, in select_bucket
    ar_error = (reso[0] / reso[1]) - aspect_ratio
ZeroDivisionError: division by zero

Traceback (most recent call last):
  File "C:\Users\clara\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\clara\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\clara\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "C:\Users\clara\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\clara\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\Users\clara\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\clara\kohya_ss\venv\Scripts\python.exe', 'C:/Users/clara/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'C:/Users/clara/Downloads/scoutLORA 1/config_lora-20240923-172336.toml', '--optimizer_args', 'weight_decay=0.01', 'd_coef=1', 'use_bias_correction=True', 'safeguard_warmup=False', 'betas=0.9,0.99']' returned non-zero exit status 1.

I'm very confused why this is happening, as I am using a config from a friend and it worked fine for them.

b-fission commented 1 month ago

Something about this error makes it seem like one of your images isn't loading correctly. How many images are you using, and what happens if you start training with only a few of them?

ar_error = (reso[0] / reso[1]) - aspect_ratio
ZeroDivisionError: division by zero
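
One way to narrow this down without retraining on subsets is to look at the image sizes directly. The sketch below is only an illustration: `aspect_ratio_error` repeats the arithmetic from the failing line to show that a resolution with a height of 0 raises this exact error, and `find_suspect_sizes` flags images with a very small or zero side. Both function names, the `min_side=64` threshold, and the example sizes are assumptions for demonstration, not part of sd-scripts. The real sizes would come from however you read your dataset (for example Pillow's `Image.open(path).size`).

```python
# Hypothetical helpers for illustration; not part of kohya_ss or sd-scripts.

def aspect_ratio_error(reso, aspect_ratio):
    """Same arithmetic as the failing line in the traceback."""
    return (reso[0] / reso[1]) - aspect_ratio


def find_suspect_sizes(sizes, min_side=64):
    """Return the names of images whose width or height is below min_side.

    min_side=64 is an assumed threshold, not a value read from the
    training config.
    """
    return [
        name
        for name, (width, height) in sizes.items()
        if width < min_side or height < min_side
    ]


# Made-up example data: file name -> (width, height) in pixels.
sizes = {
    "ok.png": (1024, 1024),
    "tiny.png": (512, 8),
    "empty.png": (0, 0),
}

# Images worth inspecting or removing before starting training again.
suspects = find_suspect_sizes(sizes)
print(suspects)

# A resolution whose second component is 0 reproduces the error.
try:
    aspect_ratio_error((1024, 0), 1.0)
except ZeroDivisionError as exc:
    print("reproduced:", exc)
```

Judging by the traceback, the second error is accelerate's launcher raising `CalledProcessError` because the training script exited with status 1, so it should disappear once the `ZeroDivisionError` is resolved.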