bmaltais / kohya_ss

Apache License 2.0

Can not run lokr training #2036

Closed DuroCuri closed 6 months ago

DuroCuri commented 6 months ago

144db84703e8ae74b1c77ef09d0deb8 I enabled block LR. Even though I also had it enabled the last time I started the training script, it no longer works. I could use these settings before.

DuroCuri commented 6 months ago

15:42:19-451970 INFO accelerate launch --num_cpu_threads_per_process=4 "./train_network.py" --network_args "down_lr_weight=1,1,1.2,1.2,1,0.1,0.1,1,1,0.75,0.8,0.8" "mid_lr_weight=0.7" "up_lr_weight=1,1,1,1,1,1,1.2,1.2,1,1,1,0.7" --bucket_no_upscale --bucket_reso_steps=64 --cache_latents --caption_extension=".txt" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --keep_tokens="1" --learning_rate="9.4e-05" --logging_dir="C:\Users\Administrator\Desktop\train\child\lele02\log" --lr_scheduler="cosine" --lr_scheduler_num_cycles="20" --lr_scheduler_power="0.3" --lr_warmup_steps="238" --max_data_loader_n_workers="0" --max_grad_norm="1" --resolution="1024" --max_train_steps="4760" --min_snr_gamma=5 --mixed_precision="bf16" --network_alpha="40" --network_args "preset=full" "conv_dim=16" "conv_alpha=10" "rank_dropout=0" "module_dropout=0" "factor=-1" "use_cp=False" "use_scalar=False" "decompose_both=False" "rank_dropout_scale=False" "algo=lokr" "train_norm=False" --network_dim=64 --network_module=lycoris.kohya --multires_noise_iterations="8" --multires_noise_discount="0.2" --optimizer_args "weight_decay=0.05" --optimizer_type="AdamW" --output_dir="C:/Users/Administrator/Desktop/train/child/lele02/output" --output_name="lele02" --pretrained_model_name_or_path="E:/Novelai/stable-diffusion-webui/models/Stable-diffusion/realistic/pureBeautyV1_pureBeautyV1.safetensors" --reg_data_dir="C:/Users/Administrator/Desktop/train/child/lele02/reg" --save_every_n_epochs="2" --save_model_as=safetensors --save_precision="bf16" --seed="1344" --train_batch_size="2" --train_data_dir="C:/Users/Administrator/Desktop/train/child/lele02/train" --v_pred_like_loss="0.0863" --xformers
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
2024-03-06 15:42:34 INFO prepare tokenizer train_util.py:3959
2024-03-06 15:42:35 INFO Using DreamBooth method. train_network.py:173
INFO prepare images. train_util.py:1469
INFO found directory C:\Users\Administrator\Desktop\train\child\lele02\train\10_xiaotiancai girl contains 19 image files train_util.py:1432
INFO found directory C:\Users\Administrator\Desktop\train\child\lele02\train\6_xiaotiancai girl contains 8 image files train_util.py:1432
INFO found directory C:\Users\Administrator\Desktop\train\child\lele02\reg\1_girl contains 14 image files train_util.py:1432
INFO 238 train images with repeating. train_util.py:1508
INFO 14 reg images. train_util.py:1511
INFO [Dataset 0] config_util.py:544
  batch_size: 2
  resolution: (1024, 1024)
  enable_bucket: True
  network_multiplier: 1.0
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

                           [Subset 0 of Dataset 0]
                             image_dir: "C:\Users\Administrator\Desktop\train\child\lele02\train\10_xiaotiancai girl"
                             image_count: 19
                             num_repeats: 10
                             shuffle_caption: False
                             keep_tokens: 1
                             keep_tokens_separator:
                             caption_dropout_rate: 0.0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1,
                             token_warmup_step: 0,
                             is_reg: False
                             class_tokens: xiaotiancai girl
                             caption_extension: .txt

                           [Subset 1 of Dataset 0]
                             image_dir: "C:\Users\Administrator\Desktop\train\child\lele02\train\6_xiaotiancai girl"
                             image_count: 8
                             num_repeats: 6
                             shuffle_caption: False
                             keep_tokens: 1
                             keep_tokens_separator:
                             caption_dropout_rate: 0.0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1,
                             token_warmup_step: 0,
                             is_reg: False
                             class_tokens: xiaotiancai girl
                             caption_extension: .txt

                           [Subset 2 of Dataset 0]
                             image_dir: "C:\Users\Administrator\Desktop\train\child\lele02\reg\1_girl"
                             image_count: 14
                             num_repeats: 1
                             shuffle_caption: False
                             keep_tokens: 1
                             keep_tokens_separator:
                             caption_dropout_rate: 0.0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1,
                             token_warmup_step: 0,
                             is_reg: True
                             class_tokens: girl
                             caption_extension: .txt

                INFO     [Dataset 0]                                                              config_util.py:550
                INFO     loading image sizes.                                                      train_util.py:794

100%|████████████████████████████████████████████████████████████████████████████████| 41/41 [00:00<00:00, 5108.17it/s]
INFO make buckets train_util.py:800
WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます train_util.py:817
INFO number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) train_util.py:846
INFO bucket 0: resolution (512, 768), count: 170 train_util.py:851
INFO bucket 1: resolution (768, 768), count: 72 train_util.py:851
INFO bucket 2: resolution (768, 1152), count: 30 train_util.py:851
INFO bucket 3: resolution (832, 1152), count: 68 train_util.py:851
INFO bucket 4: resolution (1024, 1024), count: 36 train_util.py:851
INFO bucket 5: resolution (1152, 768), count: 100 train_util.py:851
INFO mean ar error (without repeats): 0.025701847271969253 train_util.py:856
INFO preparing accelerator train_network.py:226
accelerator device: cuda
INFO loading model for process 0/1 train_util.py:4111
INFO load StableDiffusion checkpoint: E:/Novelai/stable-diffusion-webui/models/Stable-diffusion/realistic/pureBeautyV1_pureBeautyV1.safetensors train_util.py:4066
INFO UNet2DConditionModel: 64, 8, 768, False, False original_unet.py:1387
2024-03-06 15:42:38 INFO loading u-net: model_util.py:1009
2024-03-06 15:42:39 INFO loading vae: model_util.py:1017
2024-03-06 15:42:40 INFO loading text encoder: model_util.py:1074
INFO Enable xformers for U-Net train_util.py:2529
import network module: lycoris.kohya
INFO [Dataset 0] train_util.py:1948
INFO caching latents. train_util.py:915
INFO checking cache validity... train_util.py:925
100%|██████████████████████████████████████████████████████████████████████████████████████████| 41/41 [00:00<?, ?it/s]
INFO caching latents... train_util.py:962
100%|██████████████████████████████████████████████████████████████████████████████████| 41/41 [00:10<00:00, 3.99it/s]
E:\Novelai\kohya_ss\train_network.py:300: UserWarning: disable_conv_cp and use_cp are deprecated. Please use use_tucker instead.
  network = network_module.create_network(
Using rank adaptation algo: lokr
Apply different lora dim for conv layer
Conv Dim: 16, Linear Dim: 64
Apply different alpha value for conv layer
Conv alpha: 10.0, Linear alpha: 40.0
Use Dropout value: 0.0
Create LyCORIS Module
create LyCORIS for Text Encoder: 72 modules.
Create LyCORIS Module
create LyCORIS for U-Net: 282 modules.
module type table: {'LokrModule': 354}
enable LyCORIS for text encoder
enable LyCORIS for U-Net
prepare optimizer, data loader etc.
2024-03-06 15:42:53 INFO use AdamW optimizer | {'weight_decay': 0.05} train_util.py:3819
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 238
num reg images / 正則化画像の数: 14
num batches per epoch / 1epochのバッチ数: 238
num epochs / epoch数: 20
batch size per device / バッチサイズ: 2
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 4760
steps: 0%| | 0/4760 [00:00<?, ?it/s]
epoch 1/20
Traceback (most recent call last):
  File "E:\Novelai\kohya_ss\train_network.py", line 1058, in <module>
    trainer.train(args)
  File "E:\Novelai\kohya_ss\train_network.py", line 814, in train
    noise_pred = self.call_unet(
  File "E:\Novelai\kohya_ss\train_network.py", line 128, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_conds).sample
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "E:\Novelai\kohya_ss\library\original_unet.py", line 1634, in forward
    sample = upsample_block(
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Novelai\kohya_ss\library\original_unet.py", line 1206, in forward
    hidden_states = resnet(hidden_states, temb)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Novelai\kohya_ss\library\original_unet.py", line 468, in forward
    hidden_states = self.conv1(hidden_states)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\lycoris\modules\lokr.py", line 325, in forward
    self.org_module[0].weight.data.to(x.device, dtype=self.lokr_w1.dtype)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LokrModule' object has no attribute 'lokr_w1'. Did you mean: 'lokr_w1_a'?
steps: 0%| | 0/4760 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "e:\anaconda3\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "e:\anaconda3\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\Novelai\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "E:\Novelai\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\Novelai\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--network_args', 'down_lr_weight=1,1,1.2,1.2,1,0.1,0.1,1,1,0.75,0.8,0.8', 'mid_lr_weight=0.7', 'up_lr_weight=1,1,1,1,1,1,1.2,1.2,1,1,1,0.7', '--bucket_no_upscale', '--bucket_reso_steps=64', '--cache_latents', '--caption_extension=.txt', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--keep_tokens=1', '--learning_rate=9.4e-05', '--logging_dir=C:\Users\Administrator\Desktop\train\child\lele02\log', '--lr_scheduler=cosine', '--lr_scheduler_num_cycles=20', '--lr_scheduler_power=0.3', '--lr_warmup_steps=238', '--max_data_loader_n_workers=0', '--max_grad_norm=1', '--resolution=1024', '--max_train_steps=4760', '--min_snr_gamma=5', '--mixed_precision=bf16', '--network_alpha=40', '--network_args', 'preset=full', 'conv_dim=16', 'conv_alpha=10', 'rank_dropout=0', 'module_dropout=0', 'factor=-1', 'use_cp=False', 'use_scalar=False', 'decompose_both=False', 'rank_dropout_scale=False', 'algo=lokr', 'train_norm=False', '--network_dim=64', '--network_module=lycoris.kohya', '--multires_noise_iterations=8', '--multires_noise_discount=0.2', '--optimizer_args', 'weight_decay=0.05', '--optimizer_type=AdamW', '--output_dir=C:/Users/Administrator/Desktop/train/child/lele02/output', '--output_name=lele02', '--pretrained_model_name_or_path=E:/Novelai/stable-diffusion-webui/models/Stable-diffusion/realistic/pureBeautyV1_pureBeautyV1.safetensors', '--reg_data_dir=C:/Users/Administrator/Desktop/train/child/lele02/reg', '--save_every_n_epochs=2', '--save_model_as=safetensors', '--save_precision=bf16', '--seed=1344', '--train_batch_size=2', '--train_data_dir=C:/Users/Administrator/Desktop/train/child/lele02/train', '--v_pred_like_loss=0.0863', '--xformers']' returned non-zero exit status 1.
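
One detail worth checking in the command above: `--network_args` is passed twice, once with the block LR weights and once with the LyCORIS options. The sketch below is not the `train_network.py` parser; it is a hypothetical two-flag parser that assumes the flag is declared with `nargs="*"` and the default store action, and that the `key=value` tokens are split into a dict of strings. Under those assumptions the second occurrence replaces the first, and a value such as `decompose_both=False` arrives as the non-empty string `"False"`.

```python
import argparse


def parse_network_args(argv):
    """Hypothetical parser that mirrors only the flags needed for this demo."""
    parser = argparse.ArgumentParser()
    # Assumption: a list-valued flag using argparse's default "store" action.
    parser.add_argument("--network_args", type=str, nargs="*", default=None)
    parser.add_argument("--network_alpha", type=float, default=None)
    ns = parser.parse_args(argv)
    # Split each "key=value" token; every value stays a plain string.
    return dict(token.split("=", 1) for token in (ns.network_args or []))


argv = [
    "--network_args", "down_lr_weight=1,1,1.2", "mid_lr_weight=0.7",
    "--network_alpha=40",
    "--network_args", "preset=full", "algo=lokr", "decompose_both=False",
]
kwargs = parse_network_args(argv)

# Only the keys from the second --network_args occurrence survive.
print(sorted(kwargs))
# The value is the string "False", which is truthy in a plain bool() check.
print(repr(kwargs["decompose_both"]), bool(kwargs["decompose_both"]))
```

Whether either effect plays a part in the `lokr_w1` error is not established in this thread; the sketch only shows what a parser with these assumed settings would hand to the network module.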

bmaltais commented 6 months ago

Can you share your .json config file? I will try to load and use it. I tried to reproduce the issue on my system using the latest code version and everything works fine...

DuroCuri commented 6 months ago

lvbu_test_20240306-132245.json Here is the file. I reset my version, and now it runs.

bmaltais commented 6 months ago

OK, so it was a setting issue on your system. It happens.

DuroCuri commented 6 months ago

My current version is:

commit 62fbae6b3eedffbefd5964dd5c60c492ae905fc7 (HEAD -> master, tag: v22.6.0)
Merge: bfe8b06 36b666b
Author: bmaltais <bernard@ducourier.com>
Date: Sat Jan 27 14:03:03 2024 -0500

Merge pull request #1907 from bmaltais/dev

v22.6.0

I just ran git pull to update to the newest version, and the bug came back. So does it happen because of my system settings? It fails with AttributeError: 'LokrModule' object has no attribute 'lokr_w1'. Did you mean: 'lokr_w1_a'? I cannot find anywhere to configure this.
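
The error message suggests an object that holds `lokr_w1_a` but is asked for `lokr_w1`. The toy class below illustrates that general failure pattern in plain Python. It is not the LyCORIS `LokrModule`; the class name, the `decompose_w1` flag, and both accessor methods are hypothetical and exist only for this illustration.

```python
class ToyLokrLayer:
    """Hypothetical stand-in: stores a weight either whole or as two factors."""

    def __init__(self, decompose_w1: bool):
        if decompose_w1:
            # Factored form: w1 is represented as the product w1_a @ w1_b.
            self.lokr_w1_a = [[1.0], [2.0]]   # shape 2x1
            self.lokr_w1_b = [[3.0, 4.0]]     # shape 1x2
        else:
            # Full form: w1 is stored directly.
            self.lokr_w1 = [[3.0, 4.0], [6.0, 8.0]]

    def w1_unsafe(self):
        # Reads the full weight unconditionally; raises AttributeError
        # when only the factored attributes were created.
        return self.lokr_w1

    def w1_safe(self):
        # Falls back to multiplying the factors when the full weight is absent.
        if hasattr(self, "lokr_w1"):
            return self.lokr_w1
        a, b = self.lokr_w1_a, self.lokr_w1_b
        return [
            [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))
        ]


layer = ToyLokrLayer(decompose_w1=True)
try:
    layer.w1_unsafe()
except AttributeError as exc:
    print("unsafe access failed:", exc)
print("safe access:", layer.w1_safe())
```

In this toy the fix is a guard on the accessor. Nothing here can be changed from a GUI setting, because the mismatch is between how the object was built and how it is read.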

bmaltais commented 6 months ago

This appears to be a bug in the LyCORIS code... something that would need to be investigated by its author... It is not something that is configured as part of the GUI... I updated the requirements to use the latest version of LyCORIS... so it is probably a bug introduced in that version.
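
Since the suspicion falls on the LyCORIS version pulled in by the updated requirements, it may help to record which version is installed in the venv before and after `git pull`. A minimal sketch using only the standard library follows; the distribution name `lycoris_lora` is an assumption about how the package is published, and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: LyCORIS is installed under the distribution name "lycoris_lora".
for name in ("lycoris_lora", "torch", "accelerate"):
    print(f"{name}: {installed_version(name)}")
```

Run it with the Python interpreter from the kohya_ss venv, so that it reports the packages the training script actually imports.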

DuroCuri commented 6 months ago

Thank you sincerely for your invaluable help and guidance!