LarryJane491 / Lora-Training-in-Comfy

This custom node lets you train LoRA directly in ComfyUI!

Problem make it work #11

Open kakachiex2 opened 5 months ago

kakachiex2 commented 5 months ago

Hi Larry, I created this workflow to train a LoRA based on your tutorial, but when I execute it, it looks like it does nothing and finishes too fast.

Here is the console log:

got prompt
[rgthree] Using rgthree's optimized recursive execution.
[rgthree] First run patching recursive_output_delete_if_changed and recursive_will_execute.
[rgthree] Note: If execution seems broken due to forward ComfyUI changes, you can disable the optimization from rgthree settings in ComfyUI.
K:\ComfyUI\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 0
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
C:\Users\rafae\AppData\Local\Programs\Python\Python311\python.exe: can't open file 'K:\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py': [Errno 2] No such file or directory
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\rafae\AppData\Local\Programs\Python\Python311\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=K:\Automatic111\stable-diffusion-webui\models/Stable-diffusion\rundiffusionFX25D_v10.safetensors', '--train_data_dir=C:/database', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=Kakachiex_Niji_LoRA.', '--resolution=768,768', '--network_module=networks.lora', '--max_train_epochs=40', '--learning_rate=1e-4', '--unet_lr=1.e-4', '--text_encoder_lr=1.e-4', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=Kakachiex_Niji_LoRA.', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=14', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 2.
Train finished
Prompt executed in 5.58 seconds
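[Editor's note] The exit status 2 traces back to the line "can't open file 'K:\ComfyUI\custom_nodes\...\train_network.py'": the node itself prints the script's real location (K:\ComfyUI\ComfyUI\custom_nodes\...), but accelerate was handed the relative path 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', which gets resolved against the current working directory rather than the actual install. A minimal sketch of the fix, assuming the node builds the path from its own file instead of the CWD (the helper name `train_script_path` is hypothetical, not part of the node's code):

```python
import os

def train_script_path(node_file):
    """Return an absolute path to sd-scripts/train_network.py located
    next to the custom node, independent of the process's working
    directory (which for ComfyUI portable installs is often one level
    above the real install, as in this log)."""
    node_dir = os.path.dirname(os.path.abspath(node_file))
    return os.path.join(node_dir, "sd-scripts", "train_network.py")
```

Passing this absolute path to accelerate launch would avoid the [Errno 2] regardless of where ComfyUI was started from.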

kakachiex2 commented 5 months ago

Here are the settings (screenshot attached: Screenshot 2024-02-03 212700)

kakachiex2 commented 5 months ago

And I get this after using the normal train node:

got prompt
[rgthree] Using rgthree's optimized recursive execution.
[]
K:\ComfyUI\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 0
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.2+cu121 with CUDA 1201 (you have 2.1.2+cpu)
    Python 3.11.7 (you have 3.11.6)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
prepare tokenizer
vocab.json: 100%|████████████████████████| 961k/961k [00:00<00:00, 3.11MB/s]
C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:149: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in K:\ComfyUI\huggingface\hub\models--openai--clip-vit-large-patch14. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
merges.txt: 100%|████████████████████████| 525k/525k [00:00<00:00, 5.93MB/s]
special_tokens_map.json: 100%|███████████| 389/389 [00:00<?, ?B/s]
tokenizer_config.json: 100%|█████████████| 905/905 [00:00<?, ?B/s]
update token length: 225
Using DreamBooth method.
prepare images.
0 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: False

[Dataset 0]
loading image sizes.
0it [00:00, ?it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\numpy\core\fromnumeric.py:3464: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\numpy\core\_methods.py:192: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
mean ar error (without repeats): nan
No data found. Please verify arguments (train_data_dir must be the parent of folders with images) / 画像がありません。引数指定を確認してください(train_data_dirには画像があるフォルダではなく、画像があるフォルダの親フォルダを指定する必要があります)
Train finished
Prompt executed in 15.35 seconds
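[Editor's note] This run launched correctly but found "0 train images": the message "train_data_dir must be the parent of folders with images" is the key. kohya's sd-scripts expects `train_data_dir` to contain subfolders named `<repeats>_<class>` (e.g. `5_Niji`), with the images inside those subfolders, never directly in `train_data_dir` itself. A rough pre-flight check of that layout (the helper `check_dataset` is hypothetical, not part of the node):

```python
import os
import re

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}

def check_dataset(train_data_dir):
    """Map each valid '<repeats>_<class>' subfolder to its image count.
    Subfolders without the numeric '<repeats>_' prefix are skipped,
    mirroring how the trainer ignores them; images placed directly in
    train_data_dir are never counted."""
    found = {}
    for name in os.listdir(train_data_dir):
        sub = os.path.join(train_data_dir, name)
        if not os.path.isdir(sub) or not re.match(r"^\d+_", name):
            continue  # not a '<repeats>_<class>' folder
        n = sum(1 for f in os.listdir(sub)
                if os.path.splitext(f)[1].lower() in IMAGE_EXTS)
        found[name] = n
    return found
```

An empty result from a check like this would reproduce exactly the "0 train images with repeating" line above; in the poster's later run, `C:/database` containing `5_Niji` with 91 images yields 91 × 5 = 455 train images.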

kakachiex2 commented 5 months ago

I reinstalled the requirements and ran it again, and got this:

[]
K:\ComfyUI\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 0
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.2+cu121 with CUDA 1201 (you have 2.1.2+cpu)
    Python 3.11.7 (you have 3.11.6)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
prepare tokenizer
update token length: 225
Using DreamBooth method.
prepare images.
found directory C:\database\5_Niji contains 91 image files
455 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1584
  bucket_reso_steps: 64
  bucket_no_upscale: False

  [Subset 0 of Dataset 0]
    image_dir: "C:\database\5_Niji"
    image_count: 91
    num_repeats: 5
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: Niji
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████| 91/91 [00:00<00:00, 2027.07it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (320, 704), count: 205
bucket 1: resolution (384, 640), count: 115
bucket 2: resolution (512, 512), count: 135
mean ar error (without repeats): 0.03732933732933733
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: K:\Automatic111\stable-diffusion-webui\models/Stable-diffusion\rundiffusionFX25D_v10.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
loading text encoder:
Enable xformers for U-Net
Traceback (most recent call last):
  File "K:\ComfyUI\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "K:\ComfyUI\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 236, in train
    vae.set_use_memory_efficient_attention_xformers(args.xformers)
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\diffusers\models\modeling_utils.py", line 262, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\diffusers\models\attention_processor.py", line 260, in set_use_memory_efficient_attention_xformers
    raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\rafae\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\rafae\AppData\Local\Programs\Python\Python311\python.exe', 'K:/ComfyUI/ComfyUI/custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=K:\Automatic111\stable-diffusion-webui\models/Stable-diffusion\rundiffusionFX25D_v10.safetensors', '--train_data_dir=C:/database/', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=Kakachiex_Niji_Lora', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=Kakachiex_Niji_Lora', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=0', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 28.97 seconds
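[Editor's note] With the path and dataset problems gone, this run fails on a genuine environment issue: the xFormers warning shows torch is the CPU-only build (`2.1.2+cpu`), so `--xformers` raises "ValueError: torch.cuda.is_available() should be True but is False". The usual fix is reinstalling a CUDA build of PyTorch matching the installed xFormers wheel (`2.1.2+cu121` here); short of that, GPU-only flags would have to be dropped. A hypothetical sketch of that gating logic (flag names taken from the command line above; `is_cuda_build`/`usable_flags` are illustrative helpers, not part of the node):

```python
def is_cuda_build(torch_version):
    """True if a torch version string names a CUDA wheel,
    e.g. '2.1.2+cu121'; '2.1.2+cpu' is CPU-only."""
    return "+cu" in torch_version

def usable_flags(torch_version, flags):
    """Drop training flags that require a CUDA-enabled torch build:
    --xformers needs GPU memory-efficient attention, and AdamW8bit
    comes from bitsandbytes, which also expects CUDA."""
    gpu_only = {"--xformers", "--optimizer_type=AdamW8bit"}
    if is_cuda_build(torch_version):
        return list(flags)
    return [f for f in flags if f not in gpu_only]
```

In practice, training on a CPU-only torch would be impractically slow even with these flags removed, so the real remedy is installing the CUDA wheel of torch 2.1.2.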

whmc76 commented 1 month ago

Dude, are you done with this?