bmaltais / kohya_ss

Apache License 2.0

subprocess.CalledProcessError: Command '['C:\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/kohya_ss/sd-scripts/train_db.py', '--config_file', 'C:/loramake_test/model/config_dreambooth-20241011-175157.toml']' returned non-zero exit status 1. #2895

Open leesujung999 opened 1 week ago

leesujung999 commented 1 week ago

Hi guys, I tried to train a LoRA but got a lot of errors, and I don't know how to fix them. Thanks to anyone who can help. Here's the error:

17:51:57-890597 INFO Start training Dreambooth...
17:51:57-891589 INFO Validating lr scheduler arguments...
17:51:57-893589 INFO Validating optimizer arguments...
17:51:57-895710 INFO Validating C:/loramake_test/log existence and writability... SUCCESS
17:51:57-897720 INFO Validating C:/loramake_test/model existence and writability... SUCCESS
17:51:57-898717 INFO Validating runwayml/stable-diffusion-v1-5 existence... SKIPPING: huggingface.co model
17:51:57-900714 INFO Validating C:/loramake_test/image existence... SUCCESS
17:51:57-904702 INFO Folder 150_testwork: 150 repeats found
17:51:57-906312 INFO Folder 150_testwork: 10 images found
17:51:57-907346 INFO Folder 150_testwork: 10 * 150 = 1500 steps
17:51:57-909342 INFO Regulatization factor: 1
17:51:57-910340 INFO Total steps: 1500
17:51:57-911302 INFO Train batch size: 1
17:51:57-913680 INFO Gradient accumulation steps: 1
17:51:57-915378 INFO Epoch: 1
17:51:57-916378 INFO Max train steps: 1600
17:51:57-918373 INFO lr_warmup_steps = 160
17:51:57-923666 INFO Saving training config to C:/loramake_test/model\last_20241011-175157.json...
17:51:57-926168 INFO Executing command: C:\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --mixed_precision fp16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 C:/kohya_ss/sd-scripts/train_db.py --config_file C:/loramake_test/model/config_dreambooth-20241011-175157.toml
17:51:57-947784 INFO Command executed.
2024-10-11 17:52:30 INFO Loading settings from C:/loramake_test/model/config_dreambooth-20241011-175157.toml... train_util.py:4174
INFO C:/loramake_test/model/config_dreambooth-20241011-175157 train_util.py:4193
2024-10-11 17:52:30 INFO prepare tokenizer train_util.py:4665
2024-10-11 17:52:31 INFO update token length: 75 train_util.py:4682
2024-10-11 17:52:32 INFO prepare images. train_util.py:1815
INFO found directory C:\loramake_test\image\150_testwork contains 10 image files train_util.py:1762
INFO 1500 train images with repeating. train_util.py:1856
INFO 0 reg images. train_util.py:1859
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1864
INFO [Dataset 0] config_util.py:572
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  network_multiplier: 1.0
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

                           [Subset 0 of Dataset 0]
                             image_dir: "C:\loramake_test\image\150_testwork"
                             image_count: 10
                             num_repeats: 150
                             shuffle_caption: False
                             keep_tokens: 0
                             keep_tokens_separator:
                             caption_separator: ,
                             secondary_separator: None
                             enable_wildcard: False
                             caption_dropout_rate: 0.0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1,
                             token_warmup_step: 0,
                             alpha_mask: False,
                             is_reg: False
                             class_tokens: testwork
                             caption_extension: .txt

                INFO     [Dataset 0]                                                              config_util.py:578
                INFO     loading image sizes.                                                      train_util.py:911

100%|████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 1332.58it/s]
INFO make buckets train_util.py:917
WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます train_util.py:934
INFO number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) train_util.py:963
INFO bucket 0: resolution (512, 512), count: 1500 train_util.py:968
INFO mean ar error (without repeats): 0.0 train_util.py:973
INFO prepare accelerator train_db.py:106
accelerator device: cuda
INFO loading model for process 0/1 train_util.py:4823
INFO load Diffusers pretrained models: runwayml/stable-diffusion-v1-5 train_util.py:4785
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:01<00:00, 3.10it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
2024-10-11 17:52:35 INFO UNet2DConditionModel: 64, 8, 768, False, False original_unet.py:1387
2024-10-11 17:53:04 INFO U-Net converted to original U-Net train_util.py:4810
INFO Enable memory efficient attention for U-Net train_util.py:3037
2024-10-11 17:53:06 INFO [Dataset 0] train_util.py:2323
INFO caching latents. train_util.py:1095
INFO checking cache validity... train_util.py:1105
100%|██████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<?, ?it/s]
INFO caching latents... train_util.py:1144
0%| | 0/10 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "C:\kohya_ss\sd-scripts\train_db.py", line 529, in <module>
    train(args)
  File "C:\kohya_ss\sd-scripts\train_db.py", line 149, in train
    train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)
  File "C:\kohya_ss\sd-scripts\library\train_util.py", line 2324, in cache_latents
    dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process, file_suffix)
  File "C:\kohya_ss\sd-scripts\library\train_util.py", line 1146, in cache_latents
    cache_batch_latents(vae, cache_to_disk, batch, subset.flip_aug, subset.alpha_mask, subset.random_crop)
  File "C:\kohya_ss\sd-scripts\library\train_util.py", line 2772, in cache_batch_latents
    raise RuntimeError(f"NaN detected in latents: {info.absolute_path}")
RuntimeError: NaN detected in latents: C:\loramake_test\image\150_testwork\wgnb (1).jpg
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\kohya_ss\venv\Scripts\python.exe', 'C:/kohya_ss/sd-scripts/train_db.py', '--config_file', 'C:/loramake_test/model/config_dreambooth-20241011-175157.toml']' returned non-zero exit status 1.
17:53:14-964296 INFO Training has ended.
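
A note for anyone reading a log like this one: the `subprocess.CalledProcessError` at the bottom only reports that the launched training script exited with status 1. The earlier traceback, here `RuntimeError: NaN detected in latents`, is the failure that actually stopped the run. Below is a minimal Python sketch for pulling that first error out of a saved log. It assumes the log has one entry per line, as the console printed it. The names `find_exceptions` and `root_cause` are made up for illustration and are not part of kohya_ss or accelerate.

```python
import re

# Matches final traceback lines such as
# "RuntimeError: NaN detected in latents: C:\...\image.jpg".
EXCEPTION_LINE = re.compile(r"^\s*([A-Za-z_][\w.]*(?:Error|Exception)): (.*)$")

# The launcher raises this wrapper whenever the child script exits non-zero.
WRAPPER = "subprocess.CalledProcessError"


def find_exceptions(log_text):
    """Return (exception_type, message) pairs in the order they appear."""
    found = []
    for line in log_text.splitlines():
        match = EXCEPTION_LINE.match(line)
        if match:
            found.append((match.group(1), match.group(2).strip()))
    return found


def root_cause(log_text):
    """Return the first exception that is not the launcher wrapper, or None."""
    for exc_type, message in find_exceptions(log_text):
        if exc_type != WRAPPER:
            return exc_type, message
    return None
```

Calling `root_cause(open("train.log", encoding="utf-8").read())` on a saved copy of the console output (the file name is hypothetical) would return the exception type and message to search for, rather than the generic exit-status line.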

Pashahlis commented 1 week ago

I figured it out. It's because, with old config files from before the newest version, the "Flux" model version checkbox is automatically unchecked when the config is loaded. So just recheck it and it works again.

https://imgur.com/a/2IVGVi9

leesujung999 commented 1 week ago

Thank you so much. The program error has been resolved.

Have a good day.

Yours sincerely, Sujung Lee
