`diffusion_pytorch_model.bin` Not Found in Expected Directory During Training with Dreambooth, and Follow-up Errors

TheRealDrCarbon commented 2 months ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

What happened?

I encountered an error when training a model using the Juggernaut-XL_v9_RunDiffusionPhoto_v2 checkpoint in Dreambooth. The training fails with this error:

Exception training model: 'Error no file named diffusion_pytorch_model.bin found in directory C:\Users\stefa\stable-diffusion-webui\models\dreambooth\DonCarlosXXX_NEW\working

After checking, I found that the file diffusion_pytorch_model.bin is in:

C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working\vae

It appears the file is being placed in the vae subdirectory instead of the working directory. Manually copying the file to working lets the process continue, but new errors arise later (see below).
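To see where the weight files actually sit, a small check like the one below can help. It is only a sketch: the function name `find_weight_files` is made up for illustration, and it assumes the working directory follows the usual diffusers layout, where each component (for example `unet`, `vae`) keeps its own `diffusion_pytorch_model.*` file in its own subfolder. If that assumption holds, the file found under `vae` would be the VAE's weights rather than the UNet's.

```python
from pathlib import Path

# File names taken from the two error messages in this report.
WEIGHT_NAMES = (
    "diffusion_pytorch_model.bin",
    "diffusion_pytorch_model.safetensors",
)


def find_weight_files(working_dir):
    """Map each subfolder (relative to working_dir) to the weight files it holds.

    Hypothetical helper for diagnosis only; it is not part of the extension.
    The working directory itself is reported as ".".
    """
    root = Path(working_dir)
    found = {}
    for path in sorted(root.rglob("diffusion_pytorch_model.*")):
        if path.is_file() and path.name in WEIGHT_NAMES:
            subfolder = path.parent.relative_to(root).as_posix()
            found.setdefault(subfolder, []).append(path.name)
    return found
```

Calling `find_weight_files(r"C:\...\models\dreambooth\<model name>\working")` and printing the result shows at a glance whether a `unet` subfolder exists and whether it contains any weights at all.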

Expected Behavior: The model should place files in the correct directories, or the system should look in the proper subdirectories.

Actual Behavior: Files are created in the wrong subdirectory, causing training to fail due to missing files.

Workaround: Manually copying the file allows partial progress but leads to further errors.

Additional Notes:

This issue happens across multiple checkpoint versions.
Manually copying files is only a partial solution as further errors appear.

Environment:

OS: Windows
Checkpoint: Juggernaut-XL_v9_RunDiffusionPhoto_v2
Dreambooth/Stable Diffusion Version: [Add relevant version details]

Error After Workaround: [Include next error message if necessary.]

Steps to reproduce the problem

Use the Juggernaut-XL_v9_RunDiffusionPhoto_v2 checkpoint for model training.
Start training in Dreambooth with standard settings.
Observe the error: diffusion_pytorch_model.bin not found in the expected path.

Commit and libraries

-

Command Line Arguments

no

Console logs

An error occurred while trying to fetch C:\Users\stefa\stable-diffusion-webui\models\dreambooth\DonCarlosXXX_XL\working: Error no file named diffusion_pytorch_model.safetensors found in directory C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working.
An error occurred while trying to fetch C:\Users\stefa\stable-diffusion-webui\models\dreambooth\DonCarlosXXX_XL\working: Error no file named diffusion_pytorch_model.safetensors found in directory C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working.
Traceback (most recent call last):
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 735, in start_training
    result = main(class_gen_method=class_gen_method)
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 2003, in main
    return inner_loop()
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "C:\Users\stefa\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 481, in inner_loop
    unet = UNet2DConditionModel.from_pretrained(
  File "C:\Users\stefa\stable-diffusion-webui\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\stefa\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 740, in from_pretrained
    raise ValueError(
ValueError: Cannot load <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'> from C:\Users\stefa\stable-diffusion-webui\models\dreambooth\NEW\working because the following keys are missing:
 up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.proj_out.weight, up_blocks.1.resnets.1.conv_shortcut.bias, down_blocks.2.attentions.0.transformer_blocks.3.norm3.bias, down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.weight, up_blocks.0.attentions.2.transformer_blocks.5.norm1.bias, up_blocks.0.attentions.1.transformer_blocks.6.norm1.bias, up_blocks.0.resnets.0.conv2.weight, down_blocks.2.attentions.0.norm.bias, mid_block.attentions.0.transformer_blocks.6.attn2.to_v.weight, down_blocks.2.resnets.0.conv_shortcut.bias, up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_v.weight, up_blocks.0.attentions.1.transformer_blocks.1.norm3.bias, up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.weight, up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_v.weight, up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_v.weight, up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_k.weight, up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_q.weight, up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.weight, down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.weight, mid_block.attentions.0.transformer_blocks.7.attn1.to_q.weight, down_blocks.2.resnets.0.time_emb_proj.bias, up_blocks.0.resnets.2.norm2.bias, up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.bias, up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.weight, up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.bias, up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.weight, up_blocks.1.resnets.0.conv_shortcut.bias, up_blocks.0.attentions.0.transformer_blocks.5.norm1.weight, 
up_blocks.1.attentions.0.transformer_blocks.0.norm1.weight, down_blocks.2.resnets.0.conv1.bias, up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn1.to_q.weight, up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.9.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_v.weight, up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_q.weight, up_blocks.0.attentions.2.transformer_blocks.9.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.weight, up_blocks.0.attentions.0.transformer_blocks.1.norm2.bias, up_blocks.0.resnets.2.time_emb_proj.weight, down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.weight, up_blocks.1.attentions.2.transformer_blocks.1.norm1.weight, up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_q.weight, up_blocks.0.attentions.2.proj_in.bias, down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.weight,
 Please make sure to pass low_cpu_mem_usage=False and device_map=None if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
Loading unet...:  86%|█████████████████████████████████████████████████████████▍         | 6/7 [00:05<00:00,  1.07it/s]
Duration: 00:01:22
Duration: 00:01:23
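Every missing key in the log above starts with `down_blocks`, `mid_block` or `up_blocks`, which are UNet module names. That would fit the copied file being the VAE's weights and not the UNet's. The sketch below sorts a list of key names into rough groups to check this. The helper `summarize_keys` is hypothetical, and the prefix lists are an assumption based on the usual diffusers naming for `UNet2DConditionModel` and `AutoencoderKL`, not something read from the extension's code.

```python
from collections import Counter

# Assumed top-level module names (diffusers naming conventions).
UNET_PREFIXES = (
    "down_blocks", "mid_block", "up_blocks",
    "conv_in", "conv_out", "conv_norm_out", "time_embedding",
)
VAE_PREFIXES = ("encoder", "decoder", "quant_conv", "post_quant_conv")


def summarize_keys(keys):
    """Count how many key names look UNet-like, VAE-like, or neither.

    Only the first dotted component of each key is inspected, so
    "encoder.down_blocks.0..." counts as VAE-like, not UNet-like.
    """
    counts = Counter()
    for key in keys:
        top = key.split(".", 1)[0]
        if top in VAE_PREFIXES:
            counts["vae-like"] += 1
        elif top in UNET_PREFIXES:
            counts["unet-like"] += 1
        else:
            counts["other"] += 1
    return dict(counts)
```

Feeding it the key names from the file that was copied into `working` (for a `.bin` file these could be read with `torch.load(path, map_location="cpu").keys()`) would show whether that file holds UNet weights at all. If nearly everything comes back as `vae-like`, copying it only moves the failure from "file not found" to "keys are missing".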

Additional information

I had to cut a large part of the console logs because of the comment length limit.

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 30 days

mary-mark commented 1 month ago

Has anyone else encountered this issue and solved it? I am using SDXL as the base.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 30 days

d8ahazard / sd_dreambooth_extension