Error while trying to create a LoRA based on realismEngineSDXL_v30VAE

I am trying to create an SDXL LoRA based on an existing model (realismEngineSDXL_v30VAE), but it fails at the training step. The config and the training output are below. Are SDXL models supported?

The model was downloaded successfully.

The error

RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
    Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight", "down_blocks.0.attentions.0.norm.bias", "down_blocks.0.attentions.0.proj_in.weight", ...

Config

[model_arguments]
v2 = true
v_parameterization = true
pretrained_model_name_or_path = "/content/pretrained_model/realismEngineSDXL_v30VAE.safetensors"

[additional_network_arguments]
no_metadata = false
unet_lr = 0.0001
text_encoder_lr = 5e-5
network_module = "networks.lora"
network_dim = 32
network_alpha = 16
network_train_unet_only = false
network_train_text_encoder_only = false

[optimizer_arguments]
optimizer_type = "AdamW8bit"
learning_rate = 0.0001
max_grad_norm = 1.0
lr_scheduler = "constant"
lr_warmup_steps = 0

[dataset_arguments]
cache_latents = true
debug_dataset = false
vae_batch_size = 1

[training_arguments]
output_dir = "/content/LoRA/output"
output_name = "Chantal"
save_precision = "fp16"
save_every_n_epochs = 2
train_batch_size = 1
max_token_length = 225
mem_eff_attn = false
xformers = true
max_train_epochs = 10
max_data_loader_n_workers = 8
persistent_data_loader_workers = true
gradient_checkpointing = false
gradient_accumulation_steps = 1
mixed_precision = "fp16"
logging_dir = "/content/LoRA/logs"
log_prefix = "Chantal"
lowram = false

[sample_prompt_arguments]
sample_every_n_epochs = 1
sample_sampler = "dpmsolver++"

[dreambooth_arguments]
prior_loss_weight = 1.0

[saving_arguments]
save_model_as = "safetensors"
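In the training traceback, the UNet is built via `convert_ldm_unet_checkpoint(v2, state_dict, unet_config)`, so the `v2 = true` flag in this config decides its shape. Below is a minimal sketch of that arithmetic. It assumes the trainer builds an SD 2.x layout (cross-attention width 1024, the OpenCLIP ViT-H/14 hidden size) when `v2` is set and an SD 1.x layout (768, the CLIP ViT-L/14 hidden size) otherwise. `expected_context_dim` is a hypothetical helper, not part of kohya-trainer. SDXL conditions its UNet on two text encoders whose outputs are concatenated, 768 + 1280 = 2048, which is the checkpoint-side width in the size-mismatch errors.

```python
# Hypothetical sketch, not kohya-trainer code.
# Assumption: v2 = true selects an SD 2.x UNet (context width 1024),
# v2 = false selects an SD 1.x UNet (context width 768).
SD1_CONTEXT_DIM = 768            # CLIP ViT-L/14 hidden size
SD2_CONTEXT_DIM = 1024           # OpenCLIP ViT-H/14 hidden size
SDXL_CONTEXT_DIM = 768 + 1280    # CLIP ViT-L and OpenCLIP ViT-bigG outputs, concatenated


def expected_context_dim(model_arguments: dict) -> int:
    """Cross-attention width of the UNet the trainer would construct (assumed rule)."""
    return SD2_CONTEXT_DIM if model_arguments.get("v2", False) else SD1_CONTEXT_DIM


# The [model_arguments] table from the config above, as a parsed dict.
model_arguments = {
    "v2": True,
    "v_parameterization": True,
    "pretrained_model_name_or_path": "/content/pretrained_model/realismEngineSDXL_v30VAE.safetensors",
}

built = expected_context_dim(model_arguments)
print(f"UNet built with context width {built}; an SDXL checkpoint carries {SDXL_CONTEXT_DIM}")
```

Under this assumption, neither value of `v2` produces a 2048-wide UNet. That suggests this loader path cannot load an SDXL checkpoint at all, whichever flag is chosen.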

Training

CUDA backend failed to initialize: Found CUDA version 12010, but JAX was built against version 12020, which is newer. The copy of CUDA that is installed must be at least as new as the version against which JAX was built. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Loading settings from /content/LoRA/config/config_file.toml...
/content/LoRA/config/config_file
prepare tokenizer
update token length: 225
Load dataset config from /content/LoRA/config/dataset_config.toml
prepare images.
found directory /content/LoRA/train_data contains 26 image files
260 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした (Japanese: "no regularization images were found")
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: False

  [Subset 0 of Dataset 0]
    image_dir: "/content/LoRA/train_data"
    image_count: 26
    num_repeats: 10
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: mksks
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100% 26/26 [00:00<00:00, 3964.51it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む） (Japanese: "number of images per bucket, including repeats")
bucket 0: resolution (512, 512), count: 260
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint
Traceback (most recent call last)
/content/kohya-trainer/train_network.py:752 in <module>
    749     args = parser.parse_args()
    750     args = train_util.read_config_from_file(args, parser)
    751
  ❱ 752     train(args)
    753

/content/kohya-trainer/train_network.py:152 in train
    149         if pi == accelerator.state.local_process_index:
    150             print(f"loading model for process {accelerator.state.local_process_index}/{a
    151
  ❱ 152             text_encoder, vae, unet, _ = train_util.load_target_model(
    153                 args, weight_dtype, accelerator.device if args.lowram else "cpu"
    154             )
    155

/content/kohya-trainer/library/train_util.py:2739 in load_target_model
    2736     load_stable_diffusion_format = os.path.isfile(name_or_path)  # determine SD or Diffu
    2737     if load_stable_diffusion_format:
    2738         print("load StableDiffusion checkpoint")
  ❱ 2739         text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoin
    2740     else:
    2741         # Diffusers model is loaded to CPU
    2742         print("load Diffusers pretrained models")

/content/kohya-trainer/library/model_util.py:857 in load_models_from_stable_diffusion_checkpoint
    854     converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config)
    855
    856     unet = UNet2DConditionModel(**unet_config).to(device)
  ❱ 857     info = unet.load_state_dict(converted_unet_checkpoint)
    858     print("loading u-net:", info)
    859
    860     # Convert the VAE model.
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:2152 in load_state_dict
    2149                         ', '.join(f'"{k}"' for k in missing_keys)))
    2150
    2151         if len(error_msgs) > 0:
  ❱ 2152             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    2153                             self.__class__.__name__, "\n\t".join(error_msgs)))
    2154         return _IncompatibleKeys(missing_keys, unexpected_keys)
    2155

RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
    Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight", "down_blocks.0.attentions.0.norm.bias", "down_blocks.0.attentions.0.proj_in.weight", "down_blocks.0.attentions.0.proj_in.bias",

...

"mid_block.attentions.0.transformer_blocks.9.norm3.weight".
    size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 1024]).

...

    size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).

Traceback (most recent call last)
/usr/local/bin/accelerate:8 in <module>
    5 from accelerate.commands.accelerate_cli import main
    6 if __name__ == '__main__':
    7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱ 8     sys.exit(main())
    9

/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main
    42         exit(1)
    43
    44     # Run
  ❱ 45     args.func(args)
    46
    47
    48 if __name__ == "__main__":

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 in launch_command
    1101     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
    1102         sagemaker_launcher(defaults, args)
    1103     else:
  ❱ 1104         simple_launcher(args)
    1105
    1106
    1107 def main():

/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in simple_launcher
    564     process = subprocess.Popen(cmd, env=current_env)
    565     process.wait()
    566     if process.returncode != 0:
  ❱ 567         raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    568
    569
    570 def multi_gpu_launcher(args):

CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--sample_prompts=/content/LoRA/config/sample_prompt.txt', '--dataset_config=/content/LoRA/config/dataset_config.toml', '--config_file=/content/LoRA/config/config_file.toml']' returned non-zero exit status 1.
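Both size mismatches follow the same pattern. The checkpoint's cross-attention projections (`attn2.to_k`, `attn2.to_v`) take 2048-wide text features, while the UNet that was built expects 1024. Below is a minimal sketch for checking a checkpoint before training. It assumes a single-file `.safetensors` in the original LDM layout, with UNet keys under `model.diffusion_model.`, and it assumes the width-to-family table in the code (768 for SD 1.x, 1024 for SD 2.x, 2048 for SDXL). It reads only the file's JSON header, so it needs neither torch nor the safetensors package. The function names are hypothetical helpers, not part of kohya-trainer.

```python
import json
import struct

# Assumed mapping from the UNet's text-conditioning width to a model family.
FAMILY_BY_CONTEXT_DIM = {768: "SD 1.x", 1024: "SD 2.x", 2048: "SDXL"}


def read_safetensors_header(path: str) -> dict:
    """Return the tensor table of a .safetensors file without loading any weights."""
    with open(path, "rb") as f:
        # The file starts with the header length as a little-endian u64, followed by
        # that many bytes of JSON: {tensor_name: {"dtype", "shape", "data_offsets"}}.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor
    return header


def cross_attention_widths(header: dict) -> set:
    """Input widths of all cross-attention key projections in an LDM-format UNet."""
    widths = set()
    for name, info in header.items():
        # attn2 is the cross-attention layer; its to_k.weight has shape [inner_dim, context_dim].
        if name.startswith("model.diffusion_model.") and name.endswith("attn2.to_k.weight"):
            widths.add(info["shape"][1])
    return widths


def classify_header(header: dict) -> str:
    """Guess the model family from the (assumed) cross-attention width table."""
    widths = cross_attention_widths(header)
    if len(widths) != 1:
        return f"unknown (cross-attention widths: {sorted(widths)})"
    return FAMILY_BY_CONTEXT_DIM.get(next(iter(widths)), "unknown")


# Usage, with the path from the config above:
# header = read_safetensors_header(
#     "/content/pretrained_model/realismEngineSDXL_v30VAE.safetensors")
# print(classify_header(header))
```

Reading only the header keeps the check fast even for multi-gigabyte files. An "SDXL" result would suggest that the checkpoint needs an SDXL-aware training script rather than `train_network.py` with the `v2`/`v_parameterization` flags.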

Linaqruf / kohya-trainer, issue #331