LarryJane491 / Lora-Training-in-Comfy

This custom node lets you train LoRA directly in ComfyUI!
277 stars 38 forks

Issue: Unexpected key(s) in state_dict: "text_model.embeddings.position_ids" #17

Open Apacchi88 opened 4 months ago

Apacchi88 commented 4 months ago

This happens on a clean ComfyUI install.

got prompt
C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to accelerate launch and had defaults used instead:
  --num_processes was set to a value of 1
  --num_machines was set to a value of 1
  --mixed_precision was set to a value of 'no'
  --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
prepare tokenizer
update token length: 225
Using DreamBooth method.
prepare images.
found directory C:\database\15_enid5 contains 16 image files
240 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1584
  bucket_reso_steps: 64
  bucket_no_upscale: False

[Subset 0 of Dataset 0]
  image_dir: "C:\database\15_enid5"
  image_count: 16
  num_repeats: 15
  shuffle_caption: True
  keep_tokens: 0
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  caption_prefix: None
  caption_suffix: None
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1,
  token_warmup_step: 0,
  is_reg: False
  class_tokens: enid5
  caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████| 16/16 [00:00<00:00, 1683.99it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 240
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:\SD\Comfyui_Clean\ComfyUI\models\checkpoints\v1-5safe.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
Traceback (most recent call last):
  File "C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 228, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 102, in load_target_model
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\train_util.py", line 3917, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
                                                            ^^^^^^^^^^^^^^^^^^^
  File "C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\train_util.py", line 3860, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\model_util.py", line 1072, in load_models_from_stable_diffusion_checkpoint
    info = text_model.load_state_dict(converted_text_encoder_checkpoint)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIPTextModel:
        Unexpected key(s) in state_dict: "text_model.embeddings.position_ids".
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\SD\Comfyui_Clean\ComfyUI\venv\Scripts\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=C:\SD\Comfyui_Clean\ComfyUI\models\checkpoints\v1-5safe.safetensors', '--train_data_dir=C:/database', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=yoh', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1.e-4', '--text_encoder_lr=1.e-4', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=yoh', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=4', '--cache_latents', '--prior_loss_weight=1',
'--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=1', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1. Train finished Prompt executed in 14.57 seconds
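
For readers hitting the same RuntimeError: the traceback shows it is raised by text_model.load_state_dict(converted_text_encoder_checkpoint) in model_util.py, because the converted checkpoint carries a "text_model.embeddings.position_ids" entry that the installed CLIPTextModel does not accept. A commonly suggested workaround, sketched here with plain dicts standing in for tensors, is to drop that key before loading. The helper name drop_unexpected_keys is hypothetical and not part of sd-scripts, and the underlying cause (a transformers version that no longer keeps position_ids in the state dict) is an assumption, not something confirmed in this thread.

```python
# Hypothetical helper, not part of sd-scripts: filter out keys that the
# target model no longer expects before calling load_state_dict().
POSITION_IDS_KEY = "text_model.embeddings.position_ids"


def drop_unexpected_keys(state_dict, unexpected=(POSITION_IDS_KEY,)):
    """Return a shallow copy of state_dict without the listed keys."""
    return {k: v for k, v in state_dict.items() if k not in unexpected}


# Toy stand-in for converted_text_encoder_checkpoint (lists instead of tensors).
checkpoint = {
    POSITION_IDS_KEY: [0, 1, 2],
    "text_model.embeddings.token_embedding.weight": [[0.1, 0.2]],
}

cleaned = drop_unexpected_keys(checkpoint)
print(sorted(cleaned))
```

In sd-scripts this would mean filtering converted_text_encoder_checkpoint just before the load_state_dict call. Passing strict=False to load_state_dict is another option, though it also silences any other key mismatch.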

Apacchi88 commented 4 months ago

Okay, I reinstalled PyTorch with: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

I have an RTX 4060 Ti.

Afterwards, I re-ran pip install -r requirements_win.txt
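
After reinstall steps like these, it can help to confirm which package versions actually ended up in the venv, since the requirements file may pull in different versions than the ones installed by hand. The following is a minimal sketch using only the standard library; the function name installed_versions and the package list are illustrative choices, not something prescribed by this repository.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


# Packages that appear in the tracebacks of this issue.
for name, version in installed_versions(
    ["torch", "transformers", "diffusers", "accelerate", "bitsandbytes"]
).items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the same python.exe that ComfyUI uses (here the one under venv\Scripts), otherwise it reports on a different environment.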

Now I get the following error:
got prompt
C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to accelerate launch and had defaults used instead:
  --num_processes was set to a value of 1
  --num_machines was set to a value of 1
  --mixed_precision was set to a value of 'no'
  --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
prepare tokenizer
update token length: 225
Using DreamBooth method.
prepare images.
found directory C:\database\15_enid5 contains 16 image files
240 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1584
  bucket_reso_steps: 64
  bucket_no_upscale: False

[Subset 0 of Dataset 0]
  image_dir: "C:\database\15_enid5"
  image_count: 16
  num_repeats: 15
  shuffle_caption: True
  keep_tokens: 0
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  caption_prefix: None
  caption_suffix: None
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1,
  token_warmup_step: 0,
  is_reg: False
  class_tokens: enid5
  caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████| 16/16 [00:00<00:00, 1229.71it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 240
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:\SD\Comfyui_Clean\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
loading text encoder:
Enable xformers for U-Net
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|████████████████████| 16/16 [00:00<?, ?it/s]
caching latents...
100%|████████████████████| 16/16 [00:01<00:00, 9.55it/s]
create LoRA network. base dim (rank): 32, alpha: 32.0
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
False

===================================BUG REPORT===================================
C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py:167: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

  warn(msg)

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: 8.9.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name libcudart.so 2>/dev/null
CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a
CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc
CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.
CUDA SETUP: Solution 2a): Download CUDA install script: wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh
CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.
CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local
Traceback (most recent call last):
  File "C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 342, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\SD\Comfyui_Clean\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\train_util.py", line 3444, in get_optimizer
    import bitsandbytes as bnb
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\bitsandbytes\cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: CUDA Setup failed despite GPU being available. Please run the following command to get more information:

    python -m bitsandbytes

    Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
    to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
    and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\SD\Comfyui_Clean\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\SD\Comfyui_Clean\ComfyUI\venv\Scripts\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=C:\SD\Comfyui_Clean\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt', '--train_data_dir=C:/database', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=yoh', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1.e-4', '--text_encoder_lr=1.e-4', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=yoh', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=2', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 21.13 seconds
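
The second failure comes from importing bitsandbytes, which train_util.get_optimizer does when --optimizer_type=AdamW8bit is passed; on this Windows machine bitsandbytes then tries to load libbitsandbytes_cuda121.so, a Linux shared object. One way to sidestep the import, assuming 8-bit optimizer states are not required, is to launch with a non-8-bit optimizer such as --optimizer_type=AdamW. The sketch below shows how a launch argument list could be rewritten that way, and it also collapses repeated flags like the ones visible in the first command. The helper rewrite_args is hypothetical, and whether the node lets you change these arguments without editing its source is an assumption.

```python
def rewrite_args(args, overrides):
    """Collapse repeated flags (last value wins) and apply overrides.

    args      -- list like ['--xformers', '--clip_skip=2']
    overrides -- dict like {'--optimizer_type': 'AdamW'}
    """
    merged = {}
    for arg in args:
        flag, sep, value = arg.partition("=")
        # A flag keeps its first position but takes its latest value.
        merged[flag] = value if sep else None
    for flag, value in overrides.items():
        merged[flag] = value
    return [f if v is None else f"{f}={v}" for f, v in merged.items()]


# Shortened version of the argument list from the failing command.
args = [
    "--xformers",
    "--clip_skip=1",
    "--optimizer_type=AdamW8bit",
    "--clip_skip=2",
    "--optimizer_type=AdamW8bit",
]

print(rewrite_args(args, {"--optimizer_type": "AdamW"}))
```

Keeping the last value mirrors how argparse treats a repeated option, so the rewritten list should behave like the original one apart from the optimizer swap.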

mrbeandev commented 1 month ago

So there are no fixes for these errors? What's the use of even having this lib if no one can fix these!!

I am very frustrated with ComfyUI. I've been on this for 1 week and tried all kinds of things, and nothing works!!!