returned non-zero exit status 1.

Morgane-G43 commented 6 months ago

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████| 1806/1806 [00:00<00:00, 4957.94it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (512, 512), count: 3440
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/ofiri/Downloads/Stable Difusion/bukkake_20_training_images_2020_max_training_steps_woman_class_word.ckpt
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Enable memory efficient attention for U-Net
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|████████████████████████████████████████████████████████████████████████████| 1806/1806 [00:00<00:00, 1991.09it/s]
caching latents...
0it [00:00, ?it/s]
create LoRA network. base dim (rank): 128, alpha: 1.0
neuron dropout: p=0.3, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
Traceback (most recent call last):
  File "C:\STABLE_DIFFUSION\kohya_ss\train_network.py", line 1033, in <module>
    trainer.train(args)
  File "C:\STABLE_DIFFUSION\kohya_ss\train_network.py", line 323, in train
    info = network.load_weights(args.network_weights)
  File "C:\STABLE_DIFFUSION\kohya_ss\networks\lora.py", line 938, in load_weights
    info = self.load_state_dict(weights_sd, False)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LoRANetwork:
        size mismatch for lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_3_self_attn_q_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_3_self_attn_out_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_3_self_attn_out_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_3_mlp_fc1.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_3_mlp_fc1.lora_up.weight: copying a param with shape torch.Size([3072, 32]) from checkpoint, the shape in current model is torch.Size([3072, 128]).
        size mismatch for lora_te_text_model_encoder_layers_3_mlp_fc2.lora_down.weight: copying a param with shape torch.Size([32, 3072]) from checkpoint, the shape in current model is torch.Size([128, 3072]).
        size mismatch for lora_te_text_model_encoder_layers_3_mlp_fc2.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight: copying a param with shape torch.Size([10240, 32]) from checkpoint, the shape in current model is torch.Size([10240, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_ff_net_2.lora_down.weight: copying a param with shape torch.Size([32, 5120]) from checkpoint, the shape in current model is torch.Size([128, 5120]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_q.lora_down.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([128, 1280]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_k.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_v.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_down.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([128, 1280]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_proj_out.lora_down.weight: copying a param with shape torch.Size([32, 1280, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 1280, 1, 1]).
        size mismatch for lora_unet_down_blocks_2_attentions_0_proj_out.lora_up.weight: copying a param with shape torch.Size([1280, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 128, 1, 1]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_proj_in.lora_down.weight: copying a param with shape torch.Size([32, 1280, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 1280, 1, 1]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_proj_in.lora_up.weight: copying a param with shape torch.Size([1280, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 128, 1, 1]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_q.lora_down.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([128, 1280]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_k.lora_down.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([128, 1280]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_v.lora_down.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([128, 1280]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_down.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([128, 1280]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_down.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([128, 1280]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight: copying a param with shape torch.Size([10240, 32]) from checkpoint, the shape in current model is torch.Size([10240, 128]).
        size mismatch for lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_ff_net_2.lora_down.weight: copying a param with shape torch.Size([32, 5120]) from checkpoint, the shape in current model is torch.Size([128, 5120]).
        size mismatch for lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_v.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_down.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([128, 1280]).
        size mismatch for lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_mid_block_attentions_0_proj_out.lora_down.weight: copying a param with shape torch.Size([32, 1280, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 1280, 1, 1]).
        size mismatch for lora_unet_mid_block_attentions_0_proj_out.lora_up.weight: copying a param with shape torch.Size([1280, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 128, 1, 1]).
Traceback (most recent call last):
  File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\STABLE_DIFFUSION\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=C:/Users/ofiri/Downloads/Stable Difusion/bukkake_20_training_images_2020_max_training_steps_woman_class_word.ckpt', '--train_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\img', '--reg_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\reg', '--resolution=512,512', '--output_dir=C:/Users/ofiri/Downloads/Kohya_ss\\model', '--logging_dir=C:/Users/ofiri/Downloads/Kohya_ss\\log', '--network_alpha=1', '--training_comment=rentry.co/ProdiAgy', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_dim=128', '--network_weights=C:/Users/ofiri/Downloads/Kohya_ss/lora_split_32_v1.safetensors', '--gradient_accumulation_steps=3', '--output_name=Mary-Rachel_Brosnahan', '--lr_scheduler_num_cycles=10', '--scale_weight_norms=1', '--network_dropout=0.3', '--learning_rate=0.0003', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=11467', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=31337', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW8bit', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--clip_skip=2', '--keep_tokens=1', '--caption_dropout_rate=0.5', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--mem_eff_attn', '--shuffle_caption', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale', '--noise_offset=0.05', '--adaptive_noise_scale=0.005']' returned non-zero exit status 1.

I have reinstalled Git, Python, and Kohya SS multiple times, and I get this error every time. I have searched Google, Reddit, and YouTube and found no way to make it work.

Karner-D commented 6 months ago

Same problem here. I also tried a newer Python, with and without the Nvidia files, with Torch 1 and Torch 2, running Stable Diffusion, copying the venv folder, using custom models, using built-in models, and everything else I know to try, but I get the same error every time on 3 different systems.

lemoneder commented 6 months ago

Same. I was able to use it until yesterday, but couldn't use it today.

DKnight54 commented 6 months ago

@Morgane-G43, in your case, based on the arguments you are passing, I suspect that the LoRA you are trying to continue training from is only 32 dim (based on the file name, and a rough guess from a LoRA with a similar name found on Civitai). However, your arguments show that you are trying to train with 128 dims. This mismatch between the dims is likely what is causing the issue.

Try changing --network_dim=128 to --network_dim=32 and I think your issue should be solved.
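If you would rather confirm the rank than guess it from the file name, the header of a .safetensors file lists every tensor's shape, and the first dimension of each `lora_down.weight` is the rank. Below is a minimal sketch using only the standard library. The helper names `read_safetensors_header` and `lora_ranks` are made up for illustration and are not part of kohya_ss, and the small demo file is synthetic, written only so the snippet runs without a real LoRA.

```python
import json
import os
import struct
import tempfile
from collections import Counter


def read_safetensors_header(path):
    """Read only the JSON header of a .safetensors file (no tensor data is loaded)."""
    with open(path, "rb") as f:
        # The file starts with an 8-byte little-endian unsigned header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def lora_ranks(path):
    """Count the ranks in the file, taken from dim 0 of each lora_down weight."""
    ranks = Counter()
    for name, info in read_safetensors_header(path).items():
        if name.endswith("lora_down.weight"):
            ranks[info["shape"][0]] += 1
    return ranks


# Synthetic stand-in for a real LoRA: one fp16 tensor of shape [32, 768], all zeros.
key = "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"
n_bytes = 32 * 768 * 2  # 2 bytes per fp16 value
header = json.dumps(
    {key: {"dtype": "F16", "shape": [32, 768], "data_offsets": [0, n_bytes]}}
).encode("utf-8")
demo_path = os.path.join(tempfile.mkdtemp(), "demo_rank32.safetensors")
with open(demo_path, "wb") as f:
    f.write(struct.pack("<Q", len(header)) + header + bytes(n_bytes))

demo_ranks = lora_ranks(demo_path)
print(demo_ranks)  # Counter({32: 1})
```

Pointing `lora_ranks` at the file passed to `--network_weights` should show which value `--network_dim` has to match.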

Morgane-G43 commented 6 months ago

Thank you so much for taking the time to answer me. I'm sorry for replying this late; I was at work and hadn't booted my computer since.

I still got this error even with the change to 32.

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████| 1806/1806 [00:00<00:00, 4178.56it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (512, 512), count: 3440
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00, 10.10it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
UNet2DConditionModel: 64, 8, 768, False, False
U-Net converted to original U-Net
Enable memory efficient attention for U-Net
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|████████████████████████████████████████████████████████████████████████████| 1806/1806 [00:01<00:00, 1632.93it/s]
caching latents...
0it [00:00, ?it/s]
create LoRA network. base dim (rank): 128, alpha: 1.0
neuron dropout: p=0.3, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
Traceback (most recent call last):
  File "C:\STABLE_DIFFUSION\kohya_ss\train_network.py", line 1033, in <module>
    trainer.train(args)
  File "C:\STABLE_DIFFUSION\kohya_ss\train_network.py", line 323, in train
    info = network.load_weights(args.network_weights)
  File "C:\STABLE_DIFFUSION\kohya_ss\networks\lora.py", line 938, in load_weights
    info = self.load_state_dict(weights_sd, False)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LoRANetwork:
        size mismatch for lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_0_self_attn_v_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_0_self_attn_v_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_0_self_attn_q_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_0_self_attn_q_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_0_self_attn_out_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_0_self_attn_out_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_0_mlp_fc1.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_0_mlp_fc1.lora_up.weight: copying a param with shape torch.Size([3072, 32]) from checkpoint, the shape in current model is torch.Size([3072, 128]).
        size mismatch for lora_te_text_model_encoder_layers_0_mlp_fc2.lora_down.weight: copying a param with shape torch.Size([32, 3072]) from checkpoint, the shape in current model is torch.Size([128, 3072]).
        size mismatch for lora_te_text_model_encoder_layers_0_mlp_fc2.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_1_self_attn_k_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_1_self_attn_k_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_1_self_attn_v_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_1_self_attn_v_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_1_self_attn_q_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_1_self_attn_q_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_1_self_attn_out_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_1_self_attn_out_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_1_mlp_fc1.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_1_mlp_fc1.lora_up.weight: copying a param with shape torch.Size([3072, 32]) from checkpoint, the shape in current model is torch.Size([3072, 128]).
        size mismatch for lora_te_text_model_encoder_layers_1_mlp_fc2.lora_down.weight: copying a param with shape torch.Size([32, 3072]) from checkpoint, the shape in current model is torch.Size([128, 3072]).
        size mismatch for lora_te_text_model_encoder_layers_1_mlp_fc2.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_2_self_attn_k_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_2_self_attn_k_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_2_self_attn_v_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_2_self_attn_v_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_2_self_attn_q_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_te_text_model_encoder_layers_2_self_attn_q_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
        size mismatch for lora_te_text_model_encoder_layers_2_self_attn_out_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
        size mismatch for lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
        size mismatch for lora_unet_mid_block_attentions_0_proj_out.lora_down.weight: copying a param with shape torch.Size([32, 1280, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 1280, 1, 1]).
        size mismatch for lora_unet_mid_block_attentions_0_proj_out.lora_up.weight: copying a param with shape torch.Size([1280, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 128, 1, 1]).
Traceback (most recent call last):
  File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\STABLE_DIFFUSION\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\img', '--reg_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\reg', '--resolution=512,512', '--output_dir=C:/Users/ofiri/Downloads/Kohya_ss\\model', '--logging_dir=C:/Users/ofiri/Downloads/Kohya_ss\\log', '--network_alpha=1', '--training_comment=rentry.co/ProdiAgy', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_dim=128', '--network_weights=C:/Users/ofiri/Downloads/Kohya_ss/lora_split_32_v1.safetensors', '--gradient_accumulation_steps=3', '--output_name=Mary-Rachel_Brosnahan', '--lr_scheduler_num_cycles=10', '--scale_weight_norms=1', '--network_dropout=0.3', '--learning_rate=0.0003', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=11467', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=31337', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--clip_skip=2', '--keep_tokens=1', '--caption_dropout_rate=0.5', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--mem_eff_attn', '--shuffle_caption', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale', '--noise_offset=0.05', '--adaptive_noise_scale=0.005']' returned non-zero exit status 1.

DKnight54 commented 6 months ago

Hey, from the parameters, your network dim is still set to 128. Change the network rank/dimension to 32 in the field shown in the screenshot (taken from bmaltais' old YouTube video, so some elements may differ) and try again.
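A quick way to see which value actually reached the trainer is to search the argument list printed in the CalledProcessError. A minimal sketch follows; `get_flag` is a hypothetical helper written for this post, not part of kohya_ss or accelerate, and `argv` is a shortened excerpt of the command from the log above.

```python
from typing import List, Optional


def get_flag(argv: List[str], name: str) -> Optional[str]:
    """Return the value of --name=value or --name value, or None if the flag is absent."""
    prefix = f"--{name}="
    for i, arg in enumerate(argv):
        if arg.startswith(prefix):
            return arg[len(prefix):]
        if arg == f"--{name}" and i + 1 < len(argv):
            return argv[i + 1]
    return None


# Shortened excerpt of the failing command from the traceback.
argv = [
    "./train_network.py",
    "--network_alpha=1",
    "--network_module=networks.lora",
    "--network_dim=128",
    "--network_weights=C:/Users/ofiri/Downloads/Kohya_ss/lora_split_32_v1.safetensors",
]

print(get_flag(argv, "network_dim"))      # 128
print(get_flag(argv, "network_weights"))  # the LoRA being resumed from
```

If the printed `network_dim` is not the rank of the LoRA in `network_weights`, the change made in the GUI did not make it into the launched command.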

Morgane-G43 commented 6 months ago

Hello, I tried changing the Network Rank (Dimension) like you said. I'm so sorry for bothering you this much.


A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman contains 86 image files
found directory C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman contains 5000 image files
No caption file found for 5000 images. Training will continue without captions for these images. If class token exists, it will be used. / 5000枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学 習を続行します。class tokenが存在する場合はそれを使います。
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0001.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0002.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0003.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0004.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0005.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0006.jpg... and 4995 more
1720 train images with repeating.
5000 reg images.
some of reg images are not used / 正則化画像の数が多いので、一部使用されない正則化画像があります
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  network_multiplier: 1.0
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman"
    image_count: 86
    num_repeats: 20
    shuffle_caption: True
    keep_tokens: 1
    keep_tokens_separator:
    caption_dropout_rate: 0.5
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: Rachel_Brosnahan woman
    caption_extension: .txt

  [Subset 1 of Dataset 0]
    image_dir: "C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman"
    image_count: 5000
    num_repeats: 1
    shuffle_caption: True
    keep_tokens: 1
    keep_tokens_separator:
    caption_dropout_rate: 0.5
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: True
    class_tokens: woman
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████| 1806/1806 [00:01<00:00, 1079.45it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (512, 512), count: 3440
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00,  8.79it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
UNet2DConditionModel: 64, 8, 768, False, False
U-Net converted to original U-Net
Enable memory efficient attention for U-Net
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|█████████████████████████████████████████████████████████████████████████████| 1806/1806 [00:02<00:00, 690.01it/s]
caching latents...
0it [00:00, ?it/s]
create LoRA network. base dim (rank): 32, alpha: 1.0
neuron dropout: p=0.3, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
load network weights from C:/Users/ofiri/Downloads/Kohya_ss/lora_split_32_v1.safetensors: <All keys matched successfully>
CrossAttnDownBlock2D False -> True
CrossAttnDownBlock2D False -> True
CrossAttnDownBlock2D False -> True
DownBlock2D False -> True
UNetMidBlock2DCrossAttn False -> True
UpBlock2D False -> True
CrossAttnUpBlock2D False -> True
CrossAttnUpBlock2D False -> True
CrossAttnUpBlock2D False -> True
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "C:\STABLE_DIFFUSION\kohya_ss\library\train_util.py", line 3510, in get_optimizer
    import bitsandbytes as bnb
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
    from .cuda_setup.main import evaluate_cuda_setup
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
    from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\STABLE_DIFFUSION\kohya_ss\train_network.py", line 1033, in <module>
    trainer.train(args)
  File "C:\STABLE_DIFFUSION\kohya_ss\train_network.py", line 345, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "C:\STABLE_DIFFUSION\kohya_ss\library\train_util.py", line 3512, in get_optimizer
    raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
  File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\STABLE_DIFFUSION\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\img', '--reg_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\reg', '--resolution=512,512', '--output_dir=C:/Users/ofiri/Downloads/Kohya_ss\\model', '--logging_dir=C:/Users/ofiri/Downloads/Kohya_ss\\log', '--network_alpha=1', '--training_comment=rentry.co/ProdiAgy', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_dim=32', '--network_weights=C:/Users/ofiri/Downloads/Kohya_ss/lora_split_32_v1.safetensors', '--gradient_accumulation_steps=3', '--output_name=Mary-Rachel_Brosnahan', '--lr_scheduler_num_cycles=10', '--scale_weight_norms=1', '--network_dropout=0.3', '--learning_rate=0.0003', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=11467', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=31337', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW8bit', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--clip_skip=2', '--keep_tokens=1', '--caption_dropout_rate=0.5', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--mem_eff_attn', '--shuffle_caption', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale', '--noise_offset=0.05', '--adaptive_noise_scale=0.005']' returned non-zero exit status 1.

iammvc commented 6 months ago

It happened to me when I tried to use AdamW.

It also happened when I imported JSON settings. I'm not sure why, but if you enter the settings manually instead, the error goes away.

DKnight54 commented 6 months ago

The root cause is that 8-bit optimizers are probably a bit funky right now: they require bitsandbytes to be installed in order to run properly, but my understanding is that bitsandbytes no longer natively supports Windows and is only supposed to work properly on Linux. To get 8-bit optimizers like AdamW8bit to work, you'd need to install a specific ... That being said, I also ran into issues in Google Colab environments, so I'm not too sure that's the full story.

If you haven't yet, you can try installing bitsandbytes again via the setup.bat script: run it, then select option 3.
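To confirm whether bitsandbytes is usable before starting a long run, a small check like the one below can help. This is only a sketch: `resolve_optimizer_type` is a made-up helper, not part of kohya_ss, and the choice of Adafactor as the fallback is an assumption for illustration. Run it with the Python interpreter from the kohya_ss venv so it sees the same packages as the trainer.

```python
import importlib


def resolve_optimizer_type(requested, import_module=importlib.import_module):
    """Return `requested`, or a non-8-bit fallback if bitsandbytes cannot be imported.

    Hypothetical helper for illustration; kohya_ss does not ship this function.
    """
    # Only the 8-bit optimizers (e.g. AdamW8bit) depend on bitsandbytes.
    if not requested.lower().endswith("8bit"):
        return requested
    try:
        import_module("bitsandbytes")
    except Exception as exc:
        # A broken install can fail inside bitsandbytes itself (as in the
        # traceback above), so catch more than a plain "not installed" error.
        print(f"bitsandbytes is not usable ({exc!r}); falling back to Adafactor")
        return "Adafactor"
    return requested


print(resolve_optimizer_type("AdamW8bit"))
```

If this prints `Adafactor`, the import is failing in that environment and the 8-bit optimizer will not start.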

Alternatively, try using the Adafactor Optimizer with the following settings for fixed learning rate:

optimizer_type = "adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
lr_scheduler = "constant_with_warmup"
lr_warmup_steps = 100
learning_rate = 4e-7 # SDXL original learning rate
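Each `optimizer_args` entry is a plain `key=value` string that ends up as a keyword argument to the optimizer. The sketch below shows one way such strings can be turned into real Python values; `parse_optimizer_args` is a hypothetical name, and parsing with `ast.literal_eval` is an assumption about the approach, not a copy of the trainer's code.

```python
import ast


def parse_optimizer_args(args):
    """Convert ["key=value", ...] strings into a dict of Python values."""
    kwargs = {}
    for arg in args:
        key, value = arg.split("=", 1)
        # literal_eval turns "False" into the boolean False, "0.01" into a float, etc.
        kwargs[key.strip()] = ast.literal_eval(value.strip())
    return kwargs


optimizer_kwargs = parse_optimizer_args(
    ["scale_parameter=False", "relative_step=False", "warmup_init=False"]
)
print(optimizer_kwargs)
# {'scale_parameter': False, 'relative_step': False, 'warmup_init': False}
```

With `relative_step=False`, Adafactor does not derive its own step size, which is why an explicit `learning_rate` is given alongside these arguments.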

Morgane-G43 commented 6 months ago

Thank you all. Where should I put these settings?


optimizer_type = "adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
lr_scheduler = "constant_with_warmup"
lr_warmup_steps = 100
learning_rate = 4e-7 # SDXL original learning rate

Morgane-G43 commented 6 months ago

It happened to me when I tried to use AdamW.

It also happened when I imported JSON settings. I'm not sure why, but if you enter the settings manually instead, the error goes away.

I will try not using my JSON, thank you.

DKnight54 commented 6 months ago

Aite, I rarely use the Kohya_ss GUI these days as I'm running on Colab, so here's a color-coded screenshot from bmaltais' YouTube video. Since the video is from an older version, there could be some differences, but just find the respective fields and put in the parameters.

learning_rate is the Cyan/Blue field. Key in 0.0000004; 4e-7 should also work.
lr_scheduler is the Purple/Magenta field. Select "Constant with warmup" from the dropdown menu.
lr_warmup_steps should be the Orange field, but 100% warmup doesn't seem right to me. Try playing around with the slider and maybe set it to 10%.
optimizer_type is the Yellow field, a dropdown menu. Select Adafactor.
optimizer_args is the Green field. Copy and paste this: scale_parameter=False relative_step=False warmup_init=False
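To make the two numeric fields concrete: the learning rate can be typed in either notation, and the warmup slider is a percentage of the total step count. The sketch below is only illustrative; `warmup_steps` is a hypothetical helper, the 1000-step total is an example value, and the rounding the GUI applies is an assumption.

```python
def warmup_steps(max_train_steps, warmup_percent):
    """Convert a warmup percentage (slider value) into a number of steps."""
    return round(max_train_steps * warmup_percent / 100)


# Both spellings of the learning rate parse to the same float.
same_lr = float("0.0000004") == float("4e-7")
print(same_lr)  # True

# 10% warmup on an example run of 1000 steps gives lr_warmup_steps = 100.
print(warmup_steps(1000, 10))  # 100
```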

Morgane-G43 commented 6 months ago

Hello, sorry for the late reply; work and other things kept me away from my personal computer.

I had to reinstall everything from scratch because my Kohya install was broken (some torch error).

I have now started over and set everything up as you described, and the error message seems shorter.

23:46:40-511051 INFO     Start training LoRA Standard ...
23:46:40-512641 INFO     Checking for duplicate image filenames in training data directory...
23:46:40-515648 INFO     Valid image folder names found in: C:/Users/ofiri/Downloads/Kohya_ss\img
23:46:40-517445 INFO     Valid image folder names found in: C:/Users/ofiri/Downloads/Kohya_ss\reg
23:46:40-518451 INFO     Folder 20_Rachel_Brosnahan woman: 86 images found
23:46:40-519451 INFO     Folder 20_Rachel_Brosnahan woman: 1720 steps
23:46:40-521451 INFO     Folder 40_Mary04 woman: 24 images found
23:46:40-522510 INFO     Folder 40_Mary04 woman: 960 steps
23:46:40-523533 WARNING  Regularisation images are used... Will double the number of steps required...
23:46:40-524037 INFO     Total steps: 2680
23:46:40-525041 INFO     Train batch size: 1
23:46:40-526041 INFO     Gradient accumulation steps: 1
23:46:40-527041 INFO     Epoch: 1
23:46:40-527041 INFO     Regulatization factor: 2
23:46:40-528494 INFO     max_train_steps (2680 / 1 / 1 * 1 * 2) = 5360
23:46:40-529498 INFO     stop_text_encoder_training = 0
23:46:40-531005 INFO     lr_warmup_steps = 536
23:46:40-532009 INFO     Saving training config to C:/Users/ofiri/Downloads/Kohya_ss\model\last_20240223-234640.json...
23:46:40-534014 INFO     accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --bucket_no_upscale
                         --bucket_reso_steps=64 --cache_latents --enable_bucket --min_bucket_reso=256
                         --max_bucket_reso=2048 --learning_rate="4e-07"
                         --logging_dir="C:/Users/ofiri/Downloads/Kohya_ss\log" --lr_scheduler="constant_with_warmup"
                         --lr_scheduler_num_cycles="1" --lr_warmup_steps="536" --max_data_loader_n_workers="0"
                         --max_grad_norm="1" --resolution="512,512" --max_train_steps="5360" --mixed_precision="fp16"
                         --network_alpha="1" --network_dim=32 --network_module=networks.lora --optimizer_args
                         scale_parameter=False relative_step=False warmup_init=False --optimizer_type="Adafactor"
                         --output_dir="C:/Users/ofiri/Downloads/Kohya_ss\model" --output_name="last"
                         --pretrained_model_name_or_path="C:/Users/ofiri/Downloads/rachelbrosnahan.safetensors"
                         --reg_data_dir="C:/Users/ofiri/Downloads/Kohya_ss\reg" --save_every_n_epochs="1"
                         --save_model_as=safetensors --save_precision="fp16" --text_encoder_lr=0.0001
                         --train_batch_size="1" --train_data_dir="C:/Users/ofiri/Downloads/Kohya_ss\img"
                         --unet_lr=0.0001 --xformers
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman contains 86 image files
No caption file found for 86 images. Training will continue without captions for these images. If class token exists, it will be used. / 86枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を 続行します。class tokenが存在する場合はそれを使います。
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00043_0.jpg
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00069_0.jpg
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00123_0.jpg
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00173_0.jpg
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00186_0.jpg
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00217_0.jpg... and 81 more
found directory C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman contains 24 image files
No caption file found for 24 images. Training will continue without captions for these images. If class token exists, it will be used. / 24枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を 続行します。class tokenが存在する場合はそれを使います。
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\37966.png
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\38208.png
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\Capture d'écran 2024-02-20 214335.png
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\Capture d'écran 2024-02-20 214426.png
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\Capture d'écran 2024-02-20 214533.png
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\Capture d'écran 2024-02-20 214546.png... and 19 more
found directory C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman contains 5000 image files
No caption file found for 5000 images. Training will continue without captions for these images. If class token exists, it will be used. / 5000枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学 習を続行します。class tokenが存在する場合はそれを使います。
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0001.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0002.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0003.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0004.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0005.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0006.jpg... and 4995 more
2680 train images with repeating.
5000 reg images.
some of reg images are not used / 正則化画像の数が多いので、一部使用されない正則化画像があります
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  network_multiplier: 1.0
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman"
    image_count: 86
    num_repeats: 20
    shuffle_caption: False
    keep_tokens: 0
    keep_tokens_separator:
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: Rachel_Brosnahan woman
    caption_extension: .caption

  [Subset 1 of Dataset 0]
    image_dir: "C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman"
    image_count: 24
    num_repeats: 40
    shuffle_caption: False
    keep_tokens: 0
    keep_tokens_separator:
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: Mary04 woman
    caption_extension: .caption

  [Subset 2 of Dataset 0]
    image_dir: "C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman"
    image_count: 5000
    num_repeats: 1
    shuffle_caption: False
    keep_tokens: 0
    keep_tokens_separator:
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: True
    class_tokens: woman
    caption_extension: .caption

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████| 2790/2790 [00:00<00:00, 4907.33it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (384, 512), count: 80
bucket 1: resolution (384, 576), count: 40
bucket 2: resolution (512, 448), count: 80
bucket 3: resolution (512, 512), count: 4400
bucket 4: resolution (640, 384), count: 760
mean ar error (without repeats): 0.0008410810995720387
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/ofiri/Downloads/rachelbrosnahan.safetensors
Traceback (most recent call last):
  File "C:\Users\ofiri\kohya_ss\kohya_ss\train_network.py", line 1033, in <module>
    trainer.train(args)
  File "C:\Users\ofiri\kohya_ss\kohya_ss\train_network.py", line 229, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "C:\Users\ofiri\kohya_ss\kohya_ss\train_network.py", line 98, in load_target_model
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
  File "C:\Users\ofiri\kohya_ss\kohya_ss\library\train_util.py", line 3996, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "C:\Users\ofiri\kohya_ss\kohya_ss\library\train_util.py", line 3950, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
  File "C:\Users\ofiri\kohya_ss\kohya_ss\library\model_util.py", line 1001, in load_models_from_stable_diffusion_checkpoint
    converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config)
  File "C:\Users\ofiri\kohya_ss\kohya_ss\library\model_util.py", line 263, in convert_ldm_unet_checkpoint
    new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
KeyError: 'time_embed.0.weight'
Traceback (most recent call last):
  File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\ofiri\kohya_ss\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\ofiri\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\ofiri\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\Users\ofiri\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\ofiri\\kohya_ss\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--bucket_no_upscale', '--bucket_reso_steps=64', '--cache_latents', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--learning_rate=4e-07', '--logging_dir=C:/Users/ofiri/Downloads/Kohya_ss\\log', '--lr_scheduler=constant_with_warmup', '--lr_scheduler_num_cycles=1', '--lr_warmup_steps=536', '--max_data_loader_n_workers=0', '--max_grad_norm=1', '--resolution=512,512', '--max_train_steps=5360', '--mixed_precision=fp16', '--network_alpha=1', '--network_dim=32', '--network_module=networks.lora', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--optimizer_type=Adafactor', '--output_dir=C:/Users/ofiri/Downloads/Kohya_ss\\model', '--output_name=last', '--pretrained_model_name_or_path=C:/Users/ofiri/Downloads/rachelbrosnahan.safetensors', '--reg_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\reg', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=fp16', '--text_encoder_lr=0.0001', '--train_batch_size=1', '--train_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\img', '--unet_lr=0.0001', '--xformers']' returned non-zero exit status 1.
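
For reference, the step count the GUI printed near the top of this log (`max_train_steps (2680 / 1 / 1 * 1 * 2) = 5360`) can be reproduced with a few lines. This is a sketch of that printed formula only; the function name and the use of `int()` for rounding are assumptions, not the GUI's actual code.

```python
def max_train_steps(folders, batch_size=1, grad_accum=1, epochs=1, reg_factor=2):
    """Reproduce the formula shown in the log.

    folders: list of (image_count, num_repeats) pairs for the training folders.
    reg_factor: 2 when regularisation images are used (the log says this doubles the steps).
    """
    total = sum(images * repeats for images, repeats in folders)
    return int(total / batch_size / grad_accum * epochs * reg_factor)


# 20_Rachel_Brosnahan woman: 86 images x 20 repeats = 1720 steps
# 40_Mary04 woman:           24 images x 40 repeats =  960 steps
folders = [(86, 20), (24, 40)]
print(max_train_steps(folders))  # 5360
```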

DKnight54 commented 6 months ago

Hrm. I'm not 100% sure since it's an issue with loading the model, but as a wild guess: is the model you are loading, rachelbrosnahan.safetensors, an actual checkpoint model or a LoRA? Try loading the Stable Diffusion 1.5 model instead as the pretrained model. If you are trying to resume training (or train on top of an existing LoRA), you need to look for a field called "LoRA network weights" and put the path to the existing LoRA there. I don't see it in my screenshot, but it should exist.

Do note, based on another issue #1972, you may need to install the dev branch of the GUI in order to have it load properly.
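
One way to check what kind of file a `.safetensors` is without loading it is to read the tensor names from its header: the file starts with an 8-byte little-endian length followed by a JSON header. The sketch below uses only the standard library. The function names are made up for illustration, and the key prefixes are assumptions based on common naming (`lora_unet_` / `lora_te` for kohya-style LoRA files, `model.diffusion_model.` for full SD checkpoints), so treat the result as a hint rather than a definitive answer.

```python
import json
import struct


def read_safetensors_keys(path):
    """Return the tensor names stored in a .safetensors header (no weights are loaded)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian header size
        header = json.loads(f.read(header_len))
    return [key for key in header if key != "__metadata__"]


def guess_file_kind(keys):
    """Guess whether the tensor names look like a LoRA or a full checkpoint."""
    if any(key.startswith(("lora_unet_", "lora_te")) for key in keys):
        return "lora"
    if any(key.startswith("model.diffusion_model.") for key in keys):
        return "checkpoint"
    return "unknown"


# Usage (path is whatever file you pass as the pretrained model):
# print(guess_file_kind(read_safetensors_keys("path/to/model.safetensors")))
```

A file that reports `lora` belongs in the "LoRA network weights" field, not in the pretrained model field, which would be consistent with the `KeyError: 'time_embed.0.weight'` above.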

DKnight54 commented 6 months ago

At least test whether training starts when you load the default SD 1.5 model, just to verify whether the root cause is the model you are loading.

Morgane-G43 commented 6 months ago

Thank you so much for taking the time to answer me.

After installing the dev branch of the GUI and loading the default SD 1.5 model, training seems to work now. When I use the LoRA in Stable Diffusion, my model seems to work, but the eyes come out crooked and "butchered". I will try training with more pictures. At least it seems to work now, thank you so, so much!

DKnight54 commented 6 months ago

Good luck with your training experiments!

Morgane-G43 commented 6 months ago

Thank you so much, kind stranger. I hope others will find all your help useful if they run into the same issue as me!

bmaltais / kohya_ss

returned non-zero exit status 1. #1965