Linaqruf / kohya-trainer

Adapted from https://note.com/kohya_ss/n/nbf7ce8d80f29 for easier cloning
Apache License 2.0

Training Error "CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--sample_prompts=/content/LoRA/config/sample_prompt.txt', '--dataset_config=/content/LoRA/config/dataset_config.toml', '--config_file=/content/LoRA/config/config_file.toml']' returned non-zero exit status 1." #260

Closed: oblivisheee closed this issue 1 year ago

oblivisheee commented 1 year ago

Hello, I have a problem. When I try to train a LoRA, training shuts down with this error about 30 seconds after starting:

load StableDiffusion checkpoint
Traceback (most recent call last):
  File "/content/kohya-trainer/train_network.py", line 752, in <module>
    train(args)
  File "/content/kohya-trainer/train_network.py", line 152, in train
    text_encoder, vae, unet, _ = train_util.load_target_model(
        args, weight_dtype, accelerator.device if args.lowram ...
  File "/content/kohya-trainer/library/train_util.py", line 2739, in load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(...)
  File "/content/kohya-trainer/library/model_util.py", line 857, in load_models_from_stable_diffusion_checkpoint
    info = unet.load_state_dict(converted_unet_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
        self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
  size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
  size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
  [... the same 768-vs-1024 size mismatch repeats for every attn2.to_k / attn2.to_v weight in down_blocks.0-2 (320/640/1280 ch), up_blocks.1-3 (1280/640/320 ch), and mid_block.attentions.0 (1280 ch) ...]

Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--sample_prompts=/content/LoRA/config/sample_prompt.txt', '--dataset_config=/content/LoRA/config/dataset_config.toml', '--config_file=/content/LoRA/config/config_file.toml']' returned non-zero exit status 1.

How can I fix this?

FarhanAnis005 commented 1 year ago

Same issue here.

Linaqruf commented 1 year ago

What model is it? SDXL? Then you're using the wrong notebook.
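
A quick way to check which architecture a checkpoint actually is: the second dimension of any cross-attention to_k/to_v weight equals the text-encoder width (768 for SD v1.x, 1024 for SD v2.x, 2048 for SDXL), which is exactly the dimension the error above complains about. A minimal sketch, assuming an LDM-format .ckpt/.safetensors file; the path is a placeholder:

```python
import torch

def detect_sd_arch(path: str) -> str:
    # Load the raw state dict; .ckpt files usually nest it under "state_dict".
    if path.endswith(".safetensors"):
        from safetensors.torch import load_file
        sd = load_file(path)
    else:
        sd = torch.load(path, map_location="cpu")
        sd = sd.get("state_dict", sd)
    # Cross-attention takes the text-encoder embedding as input, so the
    # second dim of to_k reveals the architecture.
    key = "model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight"
    if key not in sd:
        return "unknown (not an LDM-format checkpoint?)"
    dim = sd[key].shape[1]
    return {768: "SD v1.x", 1024: "SD v2.x", 2048: "SDXL"}.get(
        dim, f"unknown (context dim {dim})"
    )

print(detect_sd_arch("/content/pretrained_model/model.safetensors"))  # placeholder path
```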

duskfallcrew commented 1 year ago

Sample prompts aren't working even on bmaltais's GUI, so if it's a sample prompt issue, don't use sample prompts.

ibnmrs commented 1 year ago

Maybe you entered a wrong path for pretrained_model_name_or_path.
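
If that's the suspicion, it can be ruled out quickly in a Colab cell. This mirrors the check visible in the traceback (library/train_util.py loads an existing file as a single SD checkpoint and anything else as a Diffusers model); the path below is a placeholder:

```python
import os

# Placeholder: whatever you set as pretrained_model_name_or_path.
path = "/content/pretrained_model/model.safetensors"

# Mirrors library/train_util.py: a file is loaded as an SD checkpoint,
# anything else is treated as a Diffusers model directory / Hub ID.
print("is file:", os.path.isfile(path))
print("is dir: ", os.path.isdir(path))
```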

oblivisheee commented 1 year ago

Already fixed; I just turned the v2 option off.
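
For anyone landing here later: enabling v2 for a v1 checkpoint is exactly what produces the 768-vs-1024 mismatches above, because the UNet gets built for 1024-dim OpenCLIP text embeddings while a v1 checkpoint carries 768-dim CLIP ones. A hedged sketch of the relevant block in config_file.toml; the section and key names are assumed from the notebook's defaults:

```toml
[model_arguments]
v2 = false                  # keep false for SD 1.x checkpoints
v_parameterization = false  # only true for v2 768-px (v-prediction) models
pretrained_model_name_or_path = "/content/pretrained_model/model.safetensors"  # placeholder
```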