Error during lora training

Saladefroide commented 6 months ago

Hi, I'm having a problem with this node. Here is the error message I get during training. I checked my paths and everything looks correct, so I don't know where the problem could be coming from. Thanks for your help.

got prompt
[rgthree] Using rgthree's optimized recursive execution.
E:\adrug\AI\Pinokio\api\comfyui.git\app\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
prepare tokenizer
update token length: 225
Using DreamBooth method.
prepare images.
found directory E:\DATASET\outsmthgen4\19_images contains 19 image files
361 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1584
  bucket_reso_steps: 64
  bucket_no_upscale: False

  [Subset 0 of Dataset 0]
    image_dir: "E:\DATASET\outsmthgen4\19_images"
    image_count: 19
    num_repeats: 19
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: images
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|██████████| 19/19 [00:00<00:00, 791.67it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (512, 512), count: 361
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: E:\adrug\AI\stable-diffusion-webui\models/Stable-diffusion\1.5\artUniverse_v80.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
Traceback (most recent call last):
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 228, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 102, in load_target_model
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\train_util.py", line 3917, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\train_util.py", line 3860, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\model_util.py", line 1072, in load_models_from_stable_diffusion_checkpoint
    info = text_model.load_state_dict(converted_text_encoder_checkpoint)
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIPTextModel:
    Unexpected key(s) in state_dict: "text_model.embeddings.position_ids".
Traceback (most recent call last):
  File "E:\adrug\AI\Pinokio\bin\miniconda\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\adrug\AI\Pinokio\bin\miniconda\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\accelerate\commands\launch.py", line 1033, in <module>
    main()
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\accelerate\commands\launch.py", line 1029, in main
    launch_command(args)
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\accelerate\commands\launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\accelerate\commands\launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\adrug\AI\Pinokio\api\comfyui.git\app\env\Scripts\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=E:\adrug\AI\stable-diffusion-webui\models/Stable-diffusion\1.5\artUniverse_v80.safetensors', '--train_data_dir=E:/DATASET/outsmthgen4', '--output_dir=E:\adrug\AI\stable-diffusion-webui\models\Lora', '--logging_dir=./logs', '--log_prefix=outsmithgen4', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=50', '--learning_rate=1e-4', '--unet_lr=1.e-4', '--text_encoder_lr=1.e-4', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=outsmithgen4', '--train_batch_size=1', '--save_every_n_epochs=50', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=3', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 30.56 seconds

yurayko commented 5 months ago

You need to get the "library" directory from https://github.com/bmaltais/kohya_ss.git and use it to replace "custom_nodes/Lora-Training-in-Comfy/sd-scripts/library".

Replace "train_network.py" with the version from that repository as well.
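
For context, the RuntimeError in the log above is raised because a strict `load_state_dict` call rejects the extra key `text_model.embeddings.position_ids`. A general workaround for this kind of error is to drop the unexpected key before loading. The sketch below uses plain dicts so it runs without PyTorch; the helper name `drop_unexpected_keys` is hypothetical, and this is not necessarily how the updated kohya_ss files handle it.

```python
from typing import Any, Dict, Iterable

# Key reported as "unexpected" in the traceback above.
POSITION_IDS_KEY = "text_model.embeddings.position_ids"


def drop_unexpected_keys(state_dict: Dict[str, Any], unexpected: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of state_dict without the given keys (hypothetical helper)."""
    unexpected = set(unexpected)
    return {k: v for k, v in state_dict.items() if k not in unexpected}


# Stand-in for the converted text encoder checkpoint; real values would be tensors.
checkpoint = {
    POSITION_IDS_KEY: [0, 1, 2],
    "text_model.embeddings.token_embedding.weight": [[0.1, 0.2]],
}

cleaned = drop_unexpected_keys(checkpoint, [POSITION_IDS_KEY])
print(sorted(cleaned))
```

With PyTorch, the cleaned dict would then be passed to `load_state_dict`. Passing `strict=False` to `load_state_dict` is another option, but it also hides keys that are genuinely missing, so removing the one known key is the narrower change.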

Saladefroide commented 5 months ago

Thanks! Unfortunately, it seems there is another problem now:

got prompt
[rgthree] Using rgthree's optimized recursive execution.
[rgthree] First run patching recursive_output_delete_if_changed and recursive_will_execute.
[rgthree] Note: If execution seems broken due to forward ComfyUI changes, you can disable the optimization from rgthree settings in ComfyUI.
E:\adrug\AI\Pinokio\api\comfyui.git\app\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
2024-04-07 22:29:13 INFO     prepare tokenizer    train_util.py:3959
                    INFO     update token length: 225    train_util.py:3976
                    INFO     Using DreamBooth method.    train_network.py:173
2024-04-07 22:29:14 INFO     prepare images.    train_util.py:1469
                    INFO     0 train images with repeating.    train_util.py:1508
                    INFO     0 reg images.    train_util.py:1511
                    WARNING  no regularization images / 正則化画像が見つかりませんでした    train_util.py:1516
                    INFO     [Dataset 0]    config_util.py:544
                               batch_size: 19
                               resolution: (512, 512)
                               enable_bucket: False
                               network_multiplier: 1.0

                    INFO     [Dataset 0]    config_util.py:550
                    INFO     loading image sizes.    train_util.py:794
0it [00:00, ?it/s]
                    INFO     make buckets    train_util.py:800
                    INFO     number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）    train_util.py:846
E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\numpy\core\fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
E:\adrug\AI\Pinokio\api\comfyui.git\app\env\lib\site-packages\numpy\core\_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
                    INFO     mean ar error (without repeats): nan    train_util.py:856
                    ERROR    No data found. Please verify arguments (train_data_dir must be the parent of folders with images) / 画像がありません。引数指定を確認してください（train_data_dirには画像があるフォルダではなく、画像があるフォルダの親フォルダを指定する必要があります）    train_network.py:213
Train finished
Prompt executed in 14.56 seconds

yurayko commented 5 months ago

Check the data directory name. I followed the instructions from https://www.youtube.com/watch?v=gt_E-ye2irQ
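
The "No data found" error says that train_data_dir must be the parent of the folders that contain images. In the first log, the image folder was named 19_images and was read as num_repeats: 19 with class_tokens: images (19 images × 19 repeats = 361 train images). A rough way to check a directory before training is sketched below; the function name `find_training_subsets` and the extension list are assumptions for illustration, not part of sd-scripts.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumed set of image extensions; the trainer may accept others.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}
# Subfolder naming seen in the log: <repeats>_<name>, e.g. "19_images".
SUBSET_PATTERN = re.compile(r"^(\d+)_(.+)$")


def find_training_subsets(train_data_dir: str) -> List[Tuple[str, int, int]]:
    """List (folder name, repeats, image count) for each <repeats>_<name> subfolder."""
    subsets = []
    for child in sorted(Path(train_data_dir).iterdir()):
        match = SUBSET_PATTERN.match(child.name)
        if not child.is_dir() or match is None:
            continue
        images = [p for p in child.iterdir() if p.suffix.lower() in IMAGE_EXTENSIONS]
        subsets.append((child.name, int(match.group(1)), len(images)))
    return subsets


# Demo on a throwaway directory: the parent folder is found, the image folder itself is not.
with tempfile.TemporaryDirectory() as root:
    subset = Path(root) / "19_images"
    subset.mkdir()
    for i in range(3):
        (subset / f"img{i}.png").touch()
    (subset / "img0.txt").touch()  # caption file, not counted as an image
    good = find_training_subsets(root)
    wrong = find_training_subsets(str(subset))

print(good)
print(wrong)
```

An empty result, as in the second call, matches the "0 train images with repeating" situation: the path points at the image folder itself (or at a folder with no `<repeats>_<name>` subfolders) rather than at its parent.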

Saladefroide commented 5 months ago

Yes! It works now. I think my data_path and output_dir were wrong. Let's go for training now! Thanks 🎉

Skiddoh commented 1 month ago

For me it only works with the base SD 1.5 model, not with Pony Diffusion or SDXL. Any idea what the issue might be? The error is the same as the one shown initially.

LarryJane491 / Lora-Training-in-Comfy

Error during lora training #36