Closed Morgane-G43 closed 4 months ago
Same problem here. I also tried a newer Python, with and without the Nvidia files, with Torch 1 and Torch 2, running Stable Diffusion, copying the venv folder, using custom models, using the built-in models, ... and everything else I know to try, but I get the same error every time on 3 different systems.
Same. I was able to use it until yesterday, but can't today.
@Morgane-G43, for your case, based on the arguments you are passing, I suspect that the LoRA you are trying to continue training from is only 32 dim (based on the file name, and a rough guess from a LoRA with a similar name found on Civitai). However, I see that you are trying to train with 128 dims in your arguments. This likely causes a mismatch on the dims, which is what is giving you this error.
Try changing --network_dim=128 to --network_dim=32 and I think your issue should be solved.
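If you want to check the rank of a checkpoint rather than guess from the file name, it can be read off the shapes of its lora_down weights: the first dimension is the rank. A minimal sketch, assuming you already have a mapping from tensor name to shape (for example copied from a size-mismatch error, or read from the file with whatever tool you prefer); `infer_lora_rank` is a hypothetical helper, not part of kohya_ss.

```python
from typing import Dict, Tuple


def infer_lora_rank(shapes: Dict[str, Tuple[int, ...]]) -> int:
    """Return the rank shared by every lora_down weight in `shapes`.

    The rank is the first dimension of each `*.lora_down.weight` tensor.
    Raises ValueError if no such tensor exists or if the ranks disagree.
    """
    ranks = {
        shape[0]
        for name, shape in shapes.items()
        if name.endswith("lora_down.weight")
    }
    if len(ranks) != 1:
        raise ValueError(f"expected exactly one rank, found: {sorted(ranks)}")
    return ranks.pop()


# Shapes of the kind PyTorch reports for the checkpoint side of a size mismatch.
checkpoint_shapes = {
    "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight": (32, 768),
    "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_up.weight": (768, 32),
    "lora_unet_mid_block_attentions_0_proj_out.lora_down.weight": (32, 1280, 1, 1),
}

rank = infer_lora_rank(checkpoint_shapes)
print(f"--network_dim={rank}")
```

The value it prints is the one to use for --network_dim when continuing training from that checkpoint.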
Thank you so much for taking the time to answer me. I'm sorry for replying so late; I was at work and hadn't booted my computer since.
I had this error even with the change to 32.
[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████| 1806/1806 [00:00<00:00, 4178.56it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 3440
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00, 10.10it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
UNet2DConditionModel: 64, 8, 768, False, False
U-Net converted to original U-Net
Enable memory efficient attention for U-Net
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|████████████████████████████████████████████████████████████████████████████| 1806/1806 [00:01<00:00, 1632.93it/s]
caching latents...
0it [00:00, ?it/s]
create LoRA network. base dim (rank): 128, alpha: 1.0
neuron dropout: p=0.3, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
Traceback (most recent call last):
File "C:\STABLE_DIFFUSION\kohya_ss\train_network.py", line 1033, in <module>
trainer.train(args)
File "C:\STABLE_DIFFUSION\kohya_ss\train_network.py", line 323, in train
info = network.load_weights(args.network_weights)
File "C:\STABLE_DIFFUSION\kohya_ss\networks\lora.py", line 938, in load_weights
info = self.load_state_dict(weights_sd, False)
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LoRANetwork:
size mismatch for lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_0_self_attn_v_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_0_self_attn_v_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_0_self_attn_q_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_0_self_attn_q_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_0_self_attn_out_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_0_self_attn_out_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_0_mlp_fc1.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_0_mlp_fc1.lora_up.weight: copying a param with shape torch.Size([3072, 32]) from checkpoint, the shape in current model is torch.Size([3072, 128]).
size mismatch for lora_te_text_model_encoder_layers_0_mlp_fc2.lora_down.weight: copying a param with shape torch.Size([32, 3072]) from checkpoint, the shape in current model is torch.Size([128, 3072]).
size mismatch for lora_te_text_model_encoder_layers_0_mlp_fc2.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_1_self_attn_k_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_1_self_attn_k_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_1_self_attn_v_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_1_self_attn_v_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_1_self_attn_q_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_1_self_attn_q_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_1_self_attn_out_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_1_self_attn_out_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_1_mlp_fc1.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_1_mlp_fc1.lora_up.weight: copying a param with shape torch.Size([3072, 32]) from checkpoint, the shape in current model is torch.Size([3072, 128]).
size mismatch for lora_te_text_model_encoder_layers_1_mlp_fc2.lora_down.weight: copying a param with shape torch.Size([32, 3072]) from checkpoint, the shape in current model is torch.Size([128, 3072]).
size mismatch for lora_te_text_model_encoder_layers_1_mlp_fc2.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_2_self_attn_k_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_2_self_attn_k_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_2_self_attn_v_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_2_self_attn_v_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_2_self_attn_q_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_te_text_model_encoder_layers_2_self_attn_q_proj.lora_up.weight: copying a param with shape torch.Size([768, 32]) from checkpoint, the shape in current model is torch.Size([768, 128]).
size mismatch for lora_te_text_model_encoder_layers_2_self_attn_out_proj.lora_down.weight: copying a param with shape torch.Size([32, 768]) from checkpoint, the shape in current model is torch.Size([128, 768]).
size mismatch for lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([1280, 128]).
size mismatch for lora_unet_mid_block_attentions_0_proj_out.lora_down.weight: copying a param with shape torch.Size([32, 1280, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 1280, 1, 1]).
size mismatch for lora_unet_mid_block_attentions_0_proj_out.lora_up.weight: copying a param with shape torch.Size([1280, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 128, 1, 1]).
Traceback (most recent call last):
File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\STABLE_DIFFUSION\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
simple_launcher(args)
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\STABLE_DIFFUSION\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\img', '--reg_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\reg', '--resolution=512,512', '--output_dir=C:/Users/ofiri/Downloads/Kohya_ss\\model', '--logging_dir=C:/Users/ofiri/Downloads/Kohya_ss\\log', '--network_alpha=1', '--training_comment=rentry.co/ProdiAgy', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_dim=128', '--network_weights=C:/Users/ofiri/Downloads/Kohya_ss/lora_split_32_v1.safetensors', '--gradient_accumulation_steps=3', '--output_name=Mary-Rachel_Brosnahan', '--lr_scheduler_num_cycles=10', '--scale_weight_norms=1', '--network_dropout=0.3', '--learning_rate=0.0003', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=11467', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=31337', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--clip_skip=2', '--keep_tokens=1', '--caption_dropout_rate=0.5', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--mem_eff_attn', '--shuffle_caption', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale', '--noise_offset=0.05', '--adaptive_noise_scale=0.005']' returned non-zero exit status 1.
Hey, from the parameters, your network dim is still set to 128. Change the network rank/dimension to 32 in the field shown in the screenshot (taken from bmaltais' old YouTube video, so some elements may differ) and try again.
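A quick way to confirm which rank a run actually used is to read it back from the argument list printed in the CalledProcessError message. A minimal sketch; `get_flag` is a hypothetical helper, and the list below is a shortened copy of the arguments from the log above.

```python
from typing import List, Optional


def get_flag(args: List[str], name: str) -> Optional[str]:
    """Return the value of the first `--name=value` argument, or None if absent."""
    prefix = f"--{name}="
    for arg in args:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None


# Shortened copy of the argument list from the failing run.
failed_args = [
    "./train_network.py",
    "--network_module=networks.lora",
    "--network_dim=128",
    "--network_weights=C:/Users/ofiri/Downloads/Kohya_ss/lora_split_32_v1.safetensors",
]

print("network_dim used by the run:", get_flag(failed_args, "network_dim"))
```

If the value printed here is not the one entered in the GUI, the change did not reach the training command.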
Hello, I tried what you said and set Network Rank (Dimension) to 32. I'm so sorry for bothering you this much.
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman contains 86 image files
found directory C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman contains 5000 image files
No caption file found for 5000 images. Training will continue without captions for these images. If class token exists, it will be used. / 5000枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学 習を続行します。class tokenが存在する場合はそれを使います。
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0001.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0002.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0003.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0004.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0005.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0006.jpg... and 4995 more
1720 train images with repeating.
5000 reg images.
some of reg images are not used / 正則化画像の数が多いので、一部使用されない正則化画像があります
[Dataset 0]
batch_size: 1
resolution: (512, 512)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 256
max_bucket_reso: 2048
bucket_reso_steps: 64
bucket_no_upscale: True
[Subset 0 of Dataset 0]
image_dir: "C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman"
image_count: 86
num_repeats: 20
shuffle_caption: True
keep_tokens: 1
keep_tokens_separator:
caption_dropout_rate: 0.5
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: Rachel_Brosnahan woman
caption_extension: .txt
[Subset 1 of Dataset 0]
image_dir: "C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman"
image_count: 5000
num_repeats: 1
shuffle_caption: True
keep_tokens: 1
keep_tokens_separator:
caption_dropout_rate: 0.5
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: True
class_tokens: woman
caption_extension: .txt
[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████| 1806/1806 [00:01<00:00, 1079.45it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 3440
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00, 8.79it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
UNet2DConditionModel: 64, 8, 768, False, False
U-Net converted to original U-Net
Enable memory efficient attention for U-Net
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|█████████████████████████████████████████████████████████████████████████████| 1806/1806 [00:02<00:00, 690.01it/s]
caching latents...
0it [00:00, ?it/s]
create LoRA network. base dim (rank): 32, alpha: 1.0
neuron dropout: p=0.3, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
load network weights from C:/Users/ofiri/Downloads/Kohya_ss/lora_split_32_v1.safetensors: <All keys matched successfully>
CrossAttnDownBlock2D False -> True
CrossAttnDownBlock2D False -> True
CrossAttnDownBlock2D False -> True
DownBlock2D False -> True
UNetMidBlock2DCrossAttn False -> True
UpBlock2D False -> True
CrossAttnUpBlock2D False -> True
CrossAttnUpBlock2D False -> True
CrossAttnUpBlock2D False -> True
prepare optimizer, data loader etc.
Traceback (most recent call last):
File "C:\STABLE_DIFFUSION\kohya_ss\library\train_util.py", line 3510, in get_optimizer
import bitsandbytes as bnb
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
from . import cuda_setup, utils, research
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
from . import nn
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
from .modules import LinearFP8Mixed, LinearFP8Global
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
from bitsandbytes.optim import GlobalOptimManager
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
from .cuda_setup.main import evaluate_cuda_setup
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\STABLE_DIFFUSION\kohya_ss\train_network.py", line 1033, in <module>
trainer.train(args)
File "C:\STABLE_DIFFUSION\kohya_ss\train_network.py", line 345, in train
optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
File "C:\STABLE_DIFFUSION\kohya_ss\library\train_util.py", line 3512, in get_optimizer
raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\STABLE_DIFFUSION\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
simple_launcher(args)
File "C:\STABLE_DIFFUSION\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\STABLE_DIFFUSION\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\img', '--reg_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\reg', '--resolution=512,512', '--output_dir=C:/Users/ofiri/Downloads/Kohya_ss\\model', '--logging_dir=C:/Users/ofiri/Downloads/Kohya_ss\\log', '--network_alpha=1', '--training_comment=rentry.co/ProdiAgy', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_dim=32', '--network_weights=C:/Users/ofiri/Downloads/Kohya_ss/lora_split_32_v1.safetensors', '--gradient_accumulation_steps=3', '--output_name=Mary-Rachel_Brosnahan', '--lr_scheduler_num_cycles=10', '--scale_weight_norms=1', '--network_dropout=0.3', '--learning_rate=0.0003', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=11467', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=31337', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW8bit', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--clip_skip=2', '--keep_tokens=1', '--caption_dropout_rate=0.5', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--mem_eff_attn', '--shuffle_caption', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale', '--noise_offset=0.05', '--adaptive_noise_scale=0.005']' returned non-zero exit status 1.
It happened to me when I tried to use AdamW.
Also, it happened when I imported JSON settings. Not sure why this happens, but if you manually input the settings instead, the error goes away.
The root cause is that 8bit optimizers are probably a bit funky right now, since they require bitsandbytes to be installed in order to run properly, but my understanding is that bitsandbytes no longer natively supports Windows and is only supposed to work properly on Linux. To get 8bit optimizers like AdamW8bit to work, you'd need to install a specific build of it. That being said, I also ran into issues in Google Colab environments, so I'm not too sure if that's the full story.
If you haven't yet, you can try installing bitsandbytes via the setup.bat script again and then selecting option 3.
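To see whether bitsandbytes is usable before starting a run, you can simply try importing it, since the failure in the traceback above happens at import time. A minimal sketch, assuming a plain import is a good enough check; `resolve_optimizer` and its `fallback` and `importer` parameters are hypothetical and not part of kohya_ss.

```python
import importlib
from typing import Callable


def resolve_optimizer(
    requested: str,
    fallback: str = "AdamW",
    importer: Callable[[str], object] = importlib.import_module,
) -> str:
    """Return `requested`, or `fallback` when an 8-bit optimizer is requested
    but bitsandbytes cannot be imported."""
    if not requested.lower().endswith("8bit"):
        # Non-8-bit optimizers do not need bitsandbytes.
        return requested
    try:
        importer("bitsandbytes")
    except Exception:
        # Covers ModuleNotFoundError (as in the traceback) and any other
        # error the package raises while it is being imported.
        return fallback
    return requested


print(resolve_optimizer("AdamW8bit"))
```

If this prints the fallback name, the 8-bit optimizer will not work in that environment until bitsandbytes is reinstalled.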
Alternatively, try using the Adafactor Optimizer with the following settings for fixed learning rate:
optimizer_type = "adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
lr_scheduler = "constant_with_warmup"
lr_warmup_steps = 100
learning_rate = 4e-7 # SDXL original learning rate
Thank you all. Where should I put these settings?
optimizer_type = "adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
lr_scheduler = "constant_with_warmup"
lr_warmup_steps = 100
learning_rate = 4e-7 # SDXL original learning rate
I will try not to use my JSON, thank you.
Aite, I rarely use the Kohya_ss GUI these days as I'm running on Colab, so here's a color-coded screenshot from bmaltais' YouTube video here. Since the video is from an older version, there could be some differences, but just find the respective fields to put the parameters in.
For the learning rate, enter 0.0000004. 4e-7 should also work.
For the optimizer arguments, enter: scale_parameter=False relative_step=False warmup_init=False
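If you launch train_network.py from the command line instead of the GUI, the same settings become flags of the same names, with the optimizer arguments passed as separate values after --optimizer_args. A minimal sketch of that mapping; `to_cli_args` is a hypothetical helper, and the setting names are the ones from the Adafactor suggestion earlier in the thread.

```python
from typing import Dict, List, Union

Value = Union[str, int, float, List[str]]


def to_cli_args(settings: Dict[str, Value]) -> List[str]:
    """Turn a settings mapping into command-line arguments.

    Scalar values become `--key=value`; list values become `--key`
    followed by each item as its own argument.
    """
    args: List[str] = []
    for key, value in settings.items():
        if isinstance(value, list):
            args.append(f"--{key}")
            args.extend(value)
        else:
            args.append(f"--{key}={value}")
    return args


settings: Dict[str, Value] = {
    "optimizer_type": "adafactor",
    "optimizer_args": ["scale_parameter=False", "relative_step=False", "warmup_init=False"],
    "lr_scheduler": "constant_with_warmup",
    "lr_warmup_steps": 100,
    "learning_rate": 4e-7,
}

print(" ".join(to_cli_args(settings)))
```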
Hello, I'm sorry for replying so late; work and other things kept me away from my personal computer.
I had to install everything from scratch since my Kohya install was broken (some torch error).
Right now I'm starting from scratch and doing everything like you told me, and the error message seems shorter.
23:46:40-511051 INFO Start training LoRA Standard ...
23:46:40-512641 INFO Checking for duplicate image filenames in training data directory...
23:46:40-515648 INFO Valid image folder names found in: C:/Users/ofiri/Downloads/Kohya_ss\img
23:46:40-517445 INFO Valid image folder names found in: C:/Users/ofiri/Downloads/Kohya_ss\reg
23:46:40-518451 INFO Folder 20_Rachel_Brosnahan woman: 86 images found
23:46:40-519451 INFO Folder 20_Rachel_Brosnahan woman: 1720 steps
23:46:40-521451 INFO Folder 40_Mary04 woman: 24 images found
23:46:40-522510 INFO Folder 40_Mary04 woman: 960 steps
23:46:40-523533 WARNING Regularisation images are used... Will double the number of steps required...
23:46:40-524037 INFO Total steps: 2680
23:46:40-525041 INFO Train batch size: 1
23:46:40-526041 INFO Gradient accumulation steps: 1
23:46:40-527041 INFO Epoch: 1
23:46:40-527041 INFO Regulatization factor: 2
23:46:40-528494 INFO max_train_steps (2680 / 1 / 1 * 1 * 2) = 5360
23:46:40-529498 INFO stop_text_encoder_training = 0
23:46:40-531005 INFO lr_warmup_steps = 536
23:46:40-532009 INFO Saving training config to C:/Users/ofiri/Downloads/Kohya_ss\model\last_20240223-234640.json...
23:46:40-534014 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --bucket_no_upscale
--bucket_reso_steps=64 --cache_latents --enable_bucket --min_bucket_reso=256
--max_bucket_reso=2048 --learning_rate="4e-07"
--logging_dir="C:/Users/ofiri/Downloads/Kohya_ss\log" --lr_scheduler="constant_with_warmup"
--lr_scheduler_num_cycles="1" --lr_warmup_steps="536" --max_data_loader_n_workers="0"
--max_grad_norm="1" --resolution="512,512" --max_train_steps="5360" --mixed_precision="fp16"
--network_alpha="1" --network_dim=32 --network_module=networks.lora --optimizer_args
scale_parameter=False relative_step=False warmup_init=False --optimizer_type="Adafactor"
--output_dir="C:/Users/ofiri/Downloads/Kohya_ss\model" --output_name="last"
--pretrained_model_name_or_path="C:/Users/ofiri/Downloads/rachelbrosnahan.safetensors"
--reg_data_dir="C:/Users/ofiri/Downloads/Kohya_ss\reg" --save_every_n_epochs="1"
--save_model_as=safetensors --save_precision="fp16" --text_encoder_lr=0.0001
--train_batch_size="1" --train_data_dir="C:/Users/ofiri/Downloads/Kohya_ss\img"
--unet_lr=0.0001 --xformers
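The step count the GUI prints above can be reproduced by hand: each folder contributes images times repeats (86 * 20 + 24 * 40 = 2680), and the logged expression then divides by batch size and gradient accumulation and multiplies by epochs and the regularisation factor. A minimal sketch; `max_train_steps` is a hypothetical helper, the rounding up is an assumption for cases that do not divide evenly, and the 10% warmup is inferred from the two logged numbers rather than stated in the log.

```python
import math
from typing import Iterable, Tuple


def max_train_steps(
    folders: Iterable[Tuple[int, int]],
    batch_size: int = 1,
    grad_accum: int = 1,
    epochs: int = 1,
    reg_factor: int = 1,
) -> int:
    """folders: (image_count, repeats) pairs, one per training folder."""
    total = sum(images * repeats for images, repeats in folders)
    # Same order of operations as the logged expression:
    # total / batch / accumulation * epochs * regularisation factor.
    # Rounding up is an assumption; the logged case divides evenly.
    return math.ceil(total / batch_size / grad_accum * epochs * reg_factor)


# (images, repeats) for "20_Rachel_Brosnahan woman" and "40_Mary04 woman".
folders = [(86, 20), (24, 40)]

steps = max_train_steps(folders, reg_factor=2)
warmup = round(steps * 0.10)  # assumed 10% warmup
print(steps, warmup)
```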
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman contains 86 image files
No caption file found for 86 images. Training will continue without captions for these images. If class token exists, it will be used. / 86枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を 続行します。class tokenが存在する場合はそれを使います。
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00043_0.jpg
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00069_0.jpg
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00123_0.jpg
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00173_0.jpg
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00186_0.jpg
C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman\00217_0.jpg... and 81 more
found directory C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman contains 24 image files
No caption file found for 24 images. Training will continue without captions for these images. If class token exists, it will be used. / 24枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を 続行します。class tokenが存在する場合はそれを使います。
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\37966.png
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\38208.png
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\Capture d'écran 2024-02-20 214335.png
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\Capture d'écran 2024-02-20 214426.png
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\Capture d'écran 2024-02-20 214533.png
C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman\Capture d'écran 2024-02-20 214546.png... and 19 more
found directory C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman contains 5000 image files
No caption file found for 5000 images. Training will continue without captions for these images. If class token exists, it will be used. / 5000枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学 習を続行します。class tokenが存在する場合はそれを使います。
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0001.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0002.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0003.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0004.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0005.jpg
C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman\woman_0006.jpg... and 4995 more
2680 train images with repeating.
5000 reg images.
some of reg images are not used / 正則化画像の数が多いので、一部使用されない正則化画像があります
[Dataset 0]
batch_size: 1
resolution: (512, 512)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 256
max_bucket_reso: 2048
bucket_reso_steps: 64
bucket_no_upscale: True
[Subset 0 of Dataset 0]
image_dir: "C:\Users\ofiri\Downloads\Kohya_ss\img\20_Rachel_Brosnahan woman"
image_count: 86
num_repeats: 20
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: Rachel_Brosnahan woman
caption_extension: .caption
[Subset 1 of Dataset 0]
image_dir: "C:\Users\ofiri\Downloads\Kohya_ss\img\40_Mary04 woman"
image_count: 24
num_repeats: 40
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: Mary04 woman
caption_extension: .caption
[Subset 2 of Dataset 0]
image_dir: "C:\Users\ofiri\Downloads\Kohya_ss\reg\1_woman"
image_count: 5000
num_repeats: 1
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: True
class_tokens: woman
caption_extension: .caption
[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████| 2790/2790 [00:00<00:00, 4907.33it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (384, 512), count: 80
bucket 1: resolution (384, 576), count: 40
bucket 2: resolution (512, 448), count: 80
bucket 3: resolution (512, 512), count: 4400
bucket 4: resolution (640, 384), count: 760
mean ar error (without repeats): 0.0008410810995720387
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/ofiri/Downloads/rachelbrosnahan.safetensors
Traceback (most recent call last):
File "C:\Users\ofiri\kohya_ss\kohya_ss\train_network.py", line 1033, in <module>
trainer.train(args)
File "C:\Users\ofiri\kohya_ss\kohya_ss\train_network.py", line 229, in train
model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
File "C:\Users\ofiri\kohya_ss\kohya_ss\train_network.py", line 98, in load_target_model
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
File "C:\Users\ofiri\kohya_ss\kohya_ss\library\train_util.py", line 3996, in load_target_model
text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
File "C:\Users\ofiri\kohya_ss\kohya_ss\library\train_util.py", line 3950, in _load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
File "C:\Users\ofiri\kohya_ss\kohya_ss\library\model_util.py", line 1001, in load_models_from_stable_diffusion_checkpoint
converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config)
File "C:\Users\ofiri\kohya_ss\kohya_ss\library\model_util.py", line 263, in convert_ldm_unet_checkpoint
new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
KeyError: 'time_embed.0.weight'
Traceback (most recent call last):
File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\ofiri\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\ofiri\kohya_ss\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\ofiri\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "C:\Users\ofiri\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
simple_launcher(args)
File "C:\Users\ofiri\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\ofiri\\kohya_ss\\kohya_ss\\venv\\Scripts\\python.exe', './train_network.py', '--bucket_no_upscale', '--bucket_reso_steps=64', '--cache_latents', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--learning_rate=4e-07', '--logging_dir=C:/Users/ofiri/Downloads/Kohya_ss\\log', '--lr_scheduler=constant_with_warmup', '--lr_scheduler_num_cycles=1', '--lr_warmup_steps=536', '--max_data_loader_n_workers=0', '--max_grad_norm=1', '--resolution=512,512', '--max_train_steps=5360', '--mixed_precision=fp16', '--network_alpha=1', '--network_dim=32', '--network_module=networks.lora', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--optimizer_type=Adafactor', '--output_dir=C:/Users/ofiri/Downloads/Kohya_ss\\model', '--output_name=last', '--pretrained_model_name_or_path=C:/Users/ofiri/Downloads/rachelbrosnahan.safetensors', '--reg_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\reg', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=fp16', '--text_encoder_lr=0.0001', '--train_batch_size=1', '--train_data_dir=C:/Users/ofiri/Downloads/Kohya_ss\\img', '--unet_lr=0.0001', '--xformers']' returned non-zero exit status 1.```
Hmm. I'm not 100% sure, since this is an error in loading the model, but as a wild guess: is the model you are loading, rachelbrosnahan.safetensors, an actual checkpoint model or a LoRA?
Try loading the Stable Diffusion 1.5 model as the pretrained model instead. If you are trying to resume training (or train on top of an existing LoRA), look for a field called "LoRA network weights" and put the path to the existing LoRA there. I don't see it in my screenshot, but it should exist.
Do note that, based on another issue (#1972), you may need to install the dev branch of the GUI for it to load properly.
At the very least, test whether training starts when you load the default SD 1.5 model, just to verify whether the root cause is the model you are loading.
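If it helps to check this before launching a run, here is a minimal sketch that peeks at the tensor names inside a .safetensors file without loading any weights. It relies only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function names and the key-prefix heuristic are my own assumptions for illustration, not part of kohya_ss: full SD 1.x checkpoints usually have keys starting with `model.diffusion_model.` (which is where `time_embed.0.weight` lives), while kohya-style LoRA files usually have keys starting with `lora_unet_` or `lora_te_`.

```python
import json
import struct


def read_safetensors_keys(path):
    """Return the tensor names stored in a .safetensors file (header only)."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def guess_model_kind(keys):
    """Heuristic guess (assumption): classify a file by common key prefixes."""
    if any(k.startswith("model.diffusion_model.") for k in keys):
        return "checkpoint"
    if any(k.startswith(("lora_unet_", "lora_te_")) for k in keys):
        return "lora"
    return "unknown"
```

Calling `guess_model_kind(read_safetensors_keys(r"C:/Users/ofiri/Downloads/rachelbrosnahan.safetensors"))` should give a rough answer. If it says "lora", that file belongs in the "LoRA network weights" field rather than in the pretrained model path. Treat the result as a hint only, since other tools may name their keys differently.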
Thank you so much for taking the time to answer me.
After installing the dev branch of the GUI and loading the default SD 1.5 model, it seems to work now. When I use the LoRA in Stable Diffusion, my model seems to work, but the eyes come out crooked and "butchered". I will try training with more pictures. At least it works now. Thank you so, so much!
Good luck with your training experiments!
Thank you so much, kind stranger. I hope others will find all your precious help if they have the same issue as me!
I had reinstalled Git, Python and Kohya SS multiple times and got this error every time. I had searched Google, Reddit and YouTube and found no way to make it work.