d8ahazard / sd_dreambooth_extension


`diffusion_pytorch_model.bin` Not Found in Expected Directory During Training with Dreambooth, and Follow-up Errors #1487

Closed TheRealDrCarbon closed 3 days ago

TheRealDrCarbon commented 2 months ago

Is there an existing issue for this?

What happened?

I encountered an error when training a model using the Juggernaut-XL_v9_RunDiffusionPhoto_v2 checkpoint in Dreambooth. The training fails with this error:

Exception training model: 'Error no file named diffusion_pytorch_model.bin found in directory C:\Users\stefa\stable-diffusion-webui\models\dreambooth\DonCarlosXXX_NEW\working

After checking, I found that the file diffusion_pytorch_model.bin is in:

C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working\vae

It appears the file is being placed in the vae subdirectory instead of the working directory. Manually copying the file to working lets the process continue, but new errors arise later (see below).

Expected Behavior: The model should place files in the correct directories, or the system should look in the proper subdirectories.

Actual Behavior: Files are created in the wrong subdirectory, causing training to fail due to missing files.

Workaround: Manually copying the file allows partial progress but leads to further errors (a sketch of the copy step is shown after these notes).

Additional Notes:

Environment: Windows (see the paths in the logs); AUTOMATIC1111 stable-diffusion-webui with the sd_dreambooth_extension.

Error After Workaround: see the console logs below (the UNet loader reports a long list of missing keys).
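For illustration, a minimal sketch of the manual copy step (the paths are the ones from this report, and the script is only an assumption of how to do the workaround by hand, not part of the extension):

```python
# Minimal sketch of the manual workaround: copy the weights file from the vae
# subfolder up into the working directory if the trainer cannot find it there.
# Paths are taken from this report; adjust them to your own model name.
import shutil
from pathlib import Path

working = Path(r"C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working")
vae_dir = working / "vae"

for name in ("diffusion_pytorch_model.bin", "diffusion_pytorch_model.safetensors"):
    src = vae_dir / name
    dst = working / name
    if src.exists() and not dst.exists():
        shutil.copy2(src, dst)  # copy, keeping the original file in place
        print(f"copied {src} -> {dst}")
```

With the file in place, training gets past the initial check, but the follow-up error shown in the console logs appears.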

Steps to reproduce the problem

  1. Use the Juggernaut-XL_v9_RunDiffusionPhoto_v2 checkpoint for model training.
  2. Start training in Dreambooth with standard settings.
  3. Observe the error: diffusion_pytorch_model.bin not found in the expected path.

Commit and libraries

-

Command Line Arguments

None

Console logs

An error occurred while trying to fetch C:\Users\stefa\stable-diffusion-webui\models\dreambooth\DonCarlosXXX_XL\working: Error no file named diffusion_pytorch_model.safetensors found in directory C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working.
Traceback (most recent call last):
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 735, in start_training
    result = main(class_gen_method=class_gen_method)
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 2003, in main
    return inner_loop()
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 481, in inner_loop
    unet = UNet2DConditionModel.from_pretrained(
  File "C:\Users\stefa\stable-diffusion-webui\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\stefa\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 740, in from_pretrained
    raise ValueError(
ValueError: Cannot load <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'> from C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working because the following keys are missing:
 up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.proj_out.weight, up_blocks.1.resnets.1.conv_shortcut.bias, down_blocks.2.attentions.0.transformer_blocks.3.norm3.bias, down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.weight, up_blocks.0.attentions.2.transformer_blocks.5.norm1.bias, up_blocks.0.attentions.1.transformer_blocks.6.norm1.bias, up_blocks.0.resnets.0.conv2.weight, down_blocks.2.attentions.0.norm.bias, mid_block.attentions.0.transformer_blocks.6.attn2.to_v.weight, down_blocks.2.resnets.0.conv_shortcut.bias, up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_v.weight, up_blocks.0.attentions.1.transformer_blocks.1.norm3.bias, up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.weight, up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_v.weight, up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_v.weight, up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_k.weight, up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_q.weight, up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.weight, down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.weight, mid_block.attentions.0.transformer_blocks.7.attn1.to_q.weight, down_blocks.2.resnets.0.time_emb_proj.bias, up_blocks.0.resnets.2.norm2.bias, up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.bias, up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.weight, up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.bias, up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.weight, up_blocks.1.resnets.0.conv_shortcut.bias, up_blocks.0.attentions.0.transformer_blocks.5.norm1.weight, up_blocks.1.attentions.0.transformer_blocks.0.norm1.weight, down_blocks.2.resnets.0.conv1.bias, up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn1.to_q.weight, up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.9.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_v.weight, up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_q.weight, up_blocks.0.attentions.2.transformer_blocks.9.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.weight, up_blocks.0.attentions.0.transformer_blocks.1.norm2.bias, up_blocks.0.resnets.2.time_emb_proj.weight, down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.weight, up_blocks.1.attentions.2.transformer_blocks.1.norm1.weight, up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_q.weight, up_blocks.0.attentions.2.proj_in.bias, down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.weight,
 Please make sure to pass low_cpu_mem_usage=False and device_map=None if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
Loading unet...:  86%|█████████████████████████████████████████████████████████▍         | 6/7 [00:05<00:00,  1.07it/s]
Duration: 00:01:22
Duration: 00:01:23

Additional information

I had to cut a large part of the console logs due to the length restriction on comments.
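The follow-up failure in the traceback comes from diffusers' UNet loader. A minimal sketch of the call the traceback points at (the path is from this report; the real call in train_dreambooth.py likely passes additional arguments, so this is only an approximation):

```python
# Sketch of the UNet load that fails after the workaround. If the
# diffusion_pytorch_model.* file sitting in the working directory actually
# holds VAE weights (as the manual copy above would produce), every UNet key
# is reported as missing, which matches the ValueError in the logs.
from diffusers import UNet2DConditionModel

working = r"C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working"
unet = UNet2DConditionModel.from_pretrained(working)
```

In other words, the copied file appears to contain VAE weights rather than UNet weights, which would explain why the workaround only gets the process one step further.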

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 30 days

mary-mark commented 1 month ago

Has anyone else encountered this issue and solved it? I am using SDXL as the base.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 30 days