black-forest-labs / flux

Official inference repo for FLUX.1 models
Apache License 2.0

Mismatch between model and checkpoint #187

Open · Kaiwen-Zhu opened this issue 2 weeks ago

Kaiwen-Zhu commented 2 weeks ago

I tried to load the VAE, only to find many missing and unexpected parameter keys. There appears to be a one-to-one correspondence between the missing and the unexpected keys (e.g., the unexpected key `encoder.down_blocks.0.downsamplers.0.conv.bias` corresponds to the missing key `encoder.down.0.downsample.conv.bias`). However, even after I map the keys manually, some parameter sizes still do not match. The problem occurs with both FLUX.1 [schnell] and FLUX.1 [dev]. Is this caused by improper usage on my part, a version issue, or something else? Thank you!
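The correspondence described above can be written down as a renaming rule. The following is a minimal sketch inferred only from the two key lists in this issue, not from any official conversion script. It renames diffusers-style keys to the names the `flux` autoencoder reports as missing. Two details in those lists could explain the size mismatches left after a naive mapping. First, the decoder levels appear to be numbered in opposite directions: `decoder.up.0` has a `nin_shortcut` and no `upsample`, which matches `decoder.up_blocks.3` rather than `decoder.up_blocks.0`, so the sketch maps `up_blocks.i` to `up.(3 - i)`. Second, it assumes the `flux` attention block uses 1x1 convolutions for `q`, `k`, `v` and `proj_out`, while the diffusers file may store those weights as 2-D linear matrices; `flux_shape` appends two trailing singleton dimensions in that case. `convert_key`, `flux_shape` and `NUM_LEVELS` are hypothetical helpers for illustration.

```python
import re

# Assumption: 4 resolution levels, inferred from down.0..down.3 / up.0..up.3 in the lists.
NUM_LEVELS = 4

# Position-independent renames: diffusers-style fragment -> flux-style fragment.
_RENAMES = [
    ("mid_block.attentions.0.group_norm.", "mid.attn_1.norm."),
    ("mid_block.attentions.0.to_q.", "mid.attn_1.q."),
    ("mid_block.attentions.0.to_k.", "mid.attn_1.k."),
    ("mid_block.attentions.0.to_v.", "mid.attn_1.v."),
    ("mid_block.attentions.0.to_out.0.", "mid.attn_1.proj_out."),
    ("mid_block.resnets.0.", "mid.block_1."),
    ("mid_block.resnets.1.", "mid.block_2."),
    ("conv_norm_out.", "norm_out."),
    ("conv_shortcut.", "nin_shortcut."),
]

# encoder.down_blocks.<level>.<resnets|downsamplers>.<index>.
_DOWN = re.compile(r"^encoder\.down_blocks\.(\d+)\.(resnets|downsamplers)\.(\d+)\.")
# decoder.up_blocks.<level>.<resnets|upsamplers>.<index>.
_UP = re.compile(r"^decoder\.up_blocks\.(\d+)\.(resnets|upsamplers)\.(\d+)\.")
# Attention projection weights, assumed to be 1x1 convs on the flux side.
_ATTN_WEIGHT = re.compile(r"\.mid\.attn_1\.(q|k|v|proj_out)\.weight$")


def convert_key(key: str, num_levels: int = NUM_LEVELS) -> str:
    """Map a diffusers-style VAE key to the corresponding flux-style key."""
    for old, new in _RENAMES:
        key = key.replace(old, new)

    m = _DOWN.match(key)
    if m:
        level, kind, idx = m.groups()
        inner = f"block.{idx}." if kind == "resnets" else "downsample."
        rest = key[m.end():]
        return f"encoder.down.{level}.{inner}{rest}"

    m = _UP.match(key)
    if m:
        level, kind, idx = m.groups()
        # Decoder levels appear to be numbered in reverse between the two layouts.
        flux_level = num_levels - 1 - int(level)
        inner = f"block.{idx}." if kind == "resnets" else "upsample."
        rest = key[m.end():]
        return f"decoder.up.{flux_level}.{inner}{rest}"

    return key


def flux_shape(flux_key: str, shape) -> tuple:
    """Return the shape a flux-style parameter is assumed to have.

    A 2-D attention projection weight (out, in) becomes (out, in, 1, 1);
    every other shape is returned unchanged.
    """
    shape = tuple(shape)
    if _ATTN_WEIGHT.search(flux_key) and len(shape) == 2:
        return shape + (1, 1)
    return shape
```

With PyTorch, each converted tensor could then be passed through `tensor.reshape(flux_shape(new_key, tensor.shape))` before calling `load_state_dict`. This has not been checked against the real checkpoint.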

Minimal reproducible code

```shell
export AE=<path to AE checkpoint downloaded from https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/vae/diffusion_pytorch_model.safetensors>
```

```python
import torch
from flux.util import load_ae

device = torch.device('cuda')
ae = load_ae("flux-schnell", device=device)
```
236 missing keys

encoder.down.0.block.0.norm1.weight encoder.down.0.block.0.norm1.bias encoder.down.0.block.0.conv1.weight encoder.down.0.block.0.conv1.bias encoder.down.0.block.0.norm2.weight encoder.down.0.block.0.norm2.bias encoder.down.0.block.0.conv2.weight encoder.down.0.block.0.conv2.bias encoder.down.0.block.1.norm1.weight encoder.down.0.block.1.norm1.bias encoder.down.0.block.1.conv1.weight encoder.down.0.block.1.conv1.bias encoder.down.0.block.1.norm2.weight encoder.down.0.block.1.norm2.bias encoder.down.0.block.1.conv2.weight encoder.down.0.block.1.conv2.bias encoder.down.0.downsample.conv.weight encoder.down.0.downsample.conv.bias encoder.down.1.block.0.norm1.weight encoder.down.1.block.0.norm1.bias encoder.down.1.block.0.conv1.weight encoder.down.1.block.0.conv1.bias encoder.down.1.block.0.norm2.weight encoder.down.1.block.0.norm2.bias encoder.down.1.block.0.conv2.weight encoder.down.1.block.0.conv2.bias encoder.down.1.block.0.nin_shortcut.weight encoder.down.1.block.0.nin_shortcut.bias encoder.down.1.block.1.norm1.weight encoder.down.1.block.1.norm1.bias encoder.down.1.block.1.conv1.weight encoder.down.1.block.1.conv1.bias encoder.down.1.block.1.norm2.weight encoder.down.1.block.1.norm2.bias encoder.down.1.block.1.conv2.weight encoder.down.1.block.1.conv2.bias encoder.down.1.downsample.conv.weight encoder.down.1.downsample.conv.bias encoder.down.2.block.0.norm1.weight encoder.down.2.block.0.norm1.bias encoder.down.2.block.0.conv1.weight encoder.down.2.block.0.conv1.bias encoder.down.2.block.0.norm2.weight encoder.down.2.block.0.norm2.bias encoder.down.2.block.0.conv2.weight encoder.down.2.block.0.conv2.bias encoder.down.2.block.0.nin_shortcut.weight encoder.down.2.block.0.nin_shortcut.bias encoder.down.2.block.1.norm1.weight encoder.down.2.block.1.norm1.bias encoder.down.2.block.1.conv1.weight encoder.down.2.block.1.conv1.bias encoder.down.2.block.1.norm2.weight encoder.down.2.block.1.norm2.bias encoder.down.2.block.1.conv2.weight encoder.down.2.block.1.conv2.bias 
encoder.down.2.downsample.conv.weight encoder.down.2.downsample.conv.bias encoder.down.3.block.0.norm1.weight encoder.down.3.block.0.norm1.bias encoder.down.3.block.0.conv1.weight encoder.down.3.block.0.conv1.bias encoder.down.3.block.0.norm2.weight encoder.down.3.block.0.norm2.bias encoder.down.3.block.0.conv2.weight encoder.down.3.block.0.conv2.bias encoder.down.3.block.1.norm1.weight encoder.down.3.block.1.norm1.bias encoder.down.3.block.1.conv1.weight encoder.down.3.block.1.conv1.bias encoder.down.3.block.1.norm2.weight encoder.down.3.block.1.norm2.bias encoder.down.3.block.1.conv2.weight encoder.down.3.block.1.conv2.bias encoder.mid.block_1.norm1.weight encoder.mid.block_1.norm1.bias encoder.mid.block_1.conv1.weight encoder.mid.block_1.conv1.bias encoder.mid.block_1.norm2.weight encoder.mid.block_1.norm2.bias encoder.mid.block_1.conv2.weight encoder.mid.block_1.conv2.bias encoder.mid.attn_1.norm.weight encoder.mid.attn_1.norm.bias encoder.mid.attn_1.q.weight encoder.mid.attn_1.q.bias encoder.mid.attn_1.k.weight encoder.mid.attn_1.k.bias encoder.mid.attn_1.v.weight encoder.mid.attn_1.v.bias encoder.mid.attn_1.proj_out.weight encoder.mid.attn_1.proj_out.bias encoder.mid.block_2.norm1.weight encoder.mid.block_2.norm1.bias encoder.mid.block_2.conv1.weight encoder.mid.block_2.conv1.bias encoder.mid.block_2.norm2.weight encoder.mid.block_2.norm2.bias encoder.mid.block_2.conv2.weight encoder.mid.block_2.conv2.bias encoder.norm_out.weight encoder.norm_out.bias decoder.mid.block_1.norm1.weight decoder.mid.block_1.norm1.bias decoder.mid.block_1.conv1.weight decoder.mid.block_1.conv1.bias decoder.mid.block_1.norm2.weight decoder.mid.block_1.norm2.bias decoder.mid.block_1.conv2.weight decoder.mid.block_1.conv2.bias decoder.mid.attn_1.norm.weight decoder.mid.attn_1.norm.bias decoder.mid.attn_1.q.weight decoder.mid.attn_1.q.bias decoder.mid.attn_1.k.weight decoder.mid.attn_1.k.bias decoder.mid.attn_1.v.weight decoder.mid.attn_1.v.bias decoder.mid.attn_1.proj_out.weight 
decoder.mid.attn_1.proj_out.bias decoder.mid.block_2.norm1.weight decoder.mid.block_2.norm1.bias decoder.mid.block_2.conv1.weight decoder.mid.block_2.conv1.bias decoder.mid.block_2.norm2.weight decoder.mid.block_2.norm2.bias decoder.mid.block_2.conv2.weight decoder.mid.block_2.conv2.bias decoder.up.0.block.0.norm1.weight decoder.up.0.block.0.norm1.bias decoder.up.0.block.0.conv1.weight decoder.up.0.block.0.conv1.bias decoder.up.0.block.0.norm2.weight decoder.up.0.block.0.norm2.bias decoder.up.0.block.0.conv2.weight decoder.up.0.block.0.conv2.bias decoder.up.0.block.0.nin_shortcut.weight decoder.up.0.block.0.nin_shortcut.bias decoder.up.0.block.1.norm1.weight decoder.up.0.block.1.norm1.bias decoder.up.0.block.1.conv1.weight decoder.up.0.block.1.conv1.bias decoder.up.0.block.1.norm2.weight decoder.up.0.block.1.norm2.bias decoder.up.0.block.1.conv2.weight decoder.up.0.block.1.conv2.bias decoder.up.0.block.2.norm1.weight decoder.up.0.block.2.norm1.bias decoder.up.0.block.2.conv1.weight decoder.up.0.block.2.conv1.bias decoder.up.0.block.2.norm2.weight decoder.up.0.block.2.norm2.bias decoder.up.0.block.2.conv2.weight decoder.up.0.block.2.conv2.bias decoder.up.1.block.0.norm1.weight decoder.up.1.block.0.norm1.bias decoder.up.1.block.0.conv1.weight decoder.up.1.block.0.conv1.bias decoder.up.1.block.0.norm2.weight decoder.up.1.block.0.norm2.bias decoder.up.1.block.0.conv2.weight decoder.up.1.block.0.conv2.bias decoder.up.1.block.0.nin_shortcut.weight decoder.up.1.block.0.nin_shortcut.bias decoder.up.1.block.1.norm1.weight decoder.up.1.block.1.norm1.bias decoder.up.1.block.1.conv1.weight decoder.up.1.block.1.conv1.bias decoder.up.1.block.1.norm2.weight decoder.up.1.block.1.norm2.bias decoder.up.1.block.1.conv2.weight decoder.up.1.block.1.conv2.bias decoder.up.1.block.2.norm1.weight decoder.up.1.block.2.norm1.bias decoder.up.1.block.2.conv1.weight decoder.up.1.block.2.conv1.bias decoder.up.1.block.2.norm2.weight decoder.up.1.block.2.norm2.bias 
decoder.up.1.block.2.conv2.weight decoder.up.1.block.2.conv2.bias decoder.up.1.upsample.conv.weight decoder.up.1.upsample.conv.bias decoder.up.2.block.0.norm1.weight decoder.up.2.block.0.norm1.bias decoder.up.2.block.0.conv1.weight decoder.up.2.block.0.conv1.bias decoder.up.2.block.0.norm2.weight decoder.up.2.block.0.norm2.bias decoder.up.2.block.0.conv2.weight decoder.up.2.block.0.conv2.bias decoder.up.2.block.1.norm1.weight decoder.up.2.block.1.norm1.bias decoder.up.2.block.1.conv1.weight decoder.up.2.block.1.conv1.bias decoder.up.2.block.1.norm2.weight decoder.up.2.block.1.norm2.bias decoder.up.2.block.1.conv2.weight decoder.up.2.block.1.conv2.bias decoder.up.2.block.2.norm1.weight decoder.up.2.block.2.norm1.bias decoder.up.2.block.2.conv1.weight decoder.up.2.block.2.conv1.bias decoder.up.2.block.2.norm2.weight decoder.up.2.block.2.norm2.bias decoder.up.2.block.2.conv2.weight decoder.up.2.block.2.conv2.bias decoder.up.2.upsample.conv.weight decoder.up.2.upsample.conv.bias decoder.up.3.block.0.norm1.weight decoder.up.3.block.0.norm1.bias decoder.up.3.block.0.conv1.weight decoder.up.3.block.0.conv1.bias decoder.up.3.block.0.norm2.weight decoder.up.3.block.0.norm2.bias decoder.up.3.block.0.conv2.weight decoder.up.3.block.0.conv2.bias decoder.up.3.block.1.norm1.weight decoder.up.3.block.1.norm1.bias decoder.up.3.block.1.conv1.weight decoder.up.3.block.1.conv1.bias decoder.up.3.block.1.norm2.weight decoder.up.3.block.1.norm2.bias decoder.up.3.block.1.conv2.weight decoder.up.3.block.1.conv2.bias decoder.up.3.block.2.norm1.weight decoder.up.3.block.2.norm1.bias decoder.up.3.block.2.conv1.weight decoder.up.3.block.2.conv1.bias decoder.up.3.block.2.norm2.weight decoder.up.3.block.2.norm2.bias decoder.up.3.block.2.conv2.weight decoder.up.3.block.2.conv2.bias decoder.up.3.upsample.conv.weight decoder.up.3.upsample.conv.bias decoder.norm_out.weight decoder.norm_out.bias
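The missing list above and the unexpected list below can be read as plain set differences between the parameter names the model defines and the names found in the checkpoint. The sketch below approximates how `torch.nn.Module.load_state_dict(strict=False)` partitions them (PyTorch does not sort them). `split_keys` is a hypothetical name used for illustration. When every parameter is only renamed, each rename adds one entry to each list, so the two counts match, as with the 236 and 236 here. Shape mismatches are not part of this split: for keys that do match by name, PyTorch raises an error on a size mismatch even with `strict=False`.

```python
def split_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected), approximating load_state_dict(strict=False).

    missing:    names the model defines but the checkpoint does not contain
    unexpected: names the checkpoint contains but the model does not define
    """
    model_set = set(model_keys)
    ckpt_set = set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return missing, unexpected


# Illustrative names: the norm_out / conv_norm_out pair comes from the lists in
# this issue; conv_in is assumed to have the same name in both layouts.
model_keys = ["encoder.conv_in.weight", "encoder.norm_out.weight", "encoder.norm_out.bias"]
checkpoint_keys = ["encoder.conv_in.weight", "encoder.conv_norm_out.weight", "encoder.conv_norm_out.bias"]

missing, unexpected = split_keys(model_keys, checkpoint_keys)
print(len(missing), "missing keys:", missing)
print(len(unexpected), "unexpected keys:", unexpected)
```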

236 unexpected keys

encoder.conv_norm_out.bias encoder.conv_norm_out.weight encoder.down_blocks.0.downsamplers.0.conv.bias encoder.down_blocks.0.downsamplers.0.conv.weight encoder.down_blocks.0.resnets.0.conv1.bias encoder.down_blocks.0.resnets.0.conv1.weight encoder.down_blocks.0.resnets.0.conv2.bias encoder.down_blocks.0.resnets.0.conv2.weight encoder.down_blocks.0.resnets.0.norm1.bias encoder.down_blocks.0.resnets.0.norm1.weight encoder.down_blocks.0.resnets.0.norm2.bias encoder.down_blocks.0.resnets.0.norm2.weight encoder.down_blocks.0.resnets.1.conv1.bias encoder.down_blocks.0.resnets.1.conv1.weight encoder.down_blocks.0.resnets.1.conv2.bias encoder.down_blocks.0.resnets.1.conv2.weight encoder.down_blocks.0.resnets.1.norm1.bias encoder.down_blocks.0.resnets.1.norm1.weight encoder.down_blocks.0.resnets.1.norm2.bias encoder.down_blocks.0.resnets.1.norm2.weight encoder.down_blocks.1.downsamplers.0.conv.bias encoder.down_blocks.1.downsamplers.0.conv.weight encoder.down_blocks.1.resnets.0.conv1.bias encoder.down_blocks.1.resnets.0.conv1.weight encoder.down_blocks.1.resnets.0.conv2.bias encoder.down_blocks.1.resnets.0.conv2.weight encoder.down_blocks.1.resnets.0.conv_shortcut.bias encoder.down_blocks.1.resnets.0.conv_shortcut.weight encoder.down_blocks.1.resnets.0.norm1.bias encoder.down_blocks.1.resnets.0.norm1.weight encoder.down_blocks.1.resnets.0.norm2.bias encoder.down_blocks.1.resnets.0.norm2.weight encoder.down_blocks.1.resnets.1.conv1.bias encoder.down_blocks.1.resnets.1.conv1.weight encoder.down_blocks.1.resnets.1.conv2.bias encoder.down_blocks.1.resnets.1.conv2.weight encoder.down_blocks.1.resnets.1.norm1.bias encoder.down_blocks.1.resnets.1.norm1.weight encoder.down_blocks.1.resnets.1.norm2.bias encoder.down_blocks.1.resnets.1.norm2.weight encoder.down_blocks.2.downsamplers.0.conv.bias encoder.down_blocks.2.downsamplers.0.conv.weight encoder.down_blocks.2.resnets.0.conv1.bias encoder.down_blocks.2.resnets.0.conv1.weight encoder.down_blocks.2.resnets.0.conv2.bias 
encoder.down_blocks.2.resnets.0.conv2.weight encoder.down_blocks.2.resnets.0.conv_shortcut.bias encoder.down_blocks.2.resnets.0.conv_shortcut.weight encoder.down_blocks.2.resnets.0.norm1.bias encoder.down_blocks.2.resnets.0.norm1.weight encoder.down_blocks.2.resnets.0.norm2.bias encoder.down_blocks.2.resnets.0.norm2.weight encoder.down_blocks.2.resnets.1.conv1.bias encoder.down_blocks.2.resnets.1.conv1.weight encoder.down_blocks.2.resnets.1.conv2.bias encoder.down_blocks.2.resnets.1.conv2.weight encoder.down_blocks.2.resnets.1.norm1.bias encoder.down_blocks.2.resnets.1.norm1.weight encoder.down_blocks.2.resnets.1.norm2.bias encoder.down_blocks.2.resnets.1.norm2.weight encoder.down_blocks.3.resnets.0.conv1.bias encoder.down_blocks.3.resnets.0.conv1.weight encoder.down_blocks.3.resnets.0.conv2.bias encoder.down_blocks.3.resnets.0.conv2.weight encoder.down_blocks.3.resnets.0.norm1.bias encoder.down_blocks.3.resnets.0.norm1.weight encoder.down_blocks.3.resnets.0.norm2.bias encoder.down_blocks.3.resnets.0.norm2.weight encoder.down_blocks.3.resnets.1.conv1.bias encoder.down_blocks.3.resnets.1.conv1.weight encoder.down_blocks.3.resnets.1.conv2.bias encoder.down_blocks.3.resnets.1.conv2.weight encoder.down_blocks.3.resnets.1.norm1.bias encoder.down_blocks.3.resnets.1.norm1.weight encoder.down_blocks.3.resnets.1.norm2.bias encoder.down_blocks.3.resnets.1.norm2.weight encoder.mid_block.attentions.0.group_norm.bias encoder.mid_block.attentions.0.group_norm.weight encoder.mid_block.attentions.0.to_k.bias encoder.mid_block.attentions.0.to_k.weight encoder.mid_block.attentions.0.to_out.0.bias encoder.mid_block.attentions.0.to_out.0.weight encoder.mid_block.attentions.0.to_q.bias encoder.mid_block.attentions.0.to_q.weight encoder.mid_block.attentions.0.to_v.bias encoder.mid_block.attentions.0.to_v.weight encoder.mid_block.resnets.0.conv1.bias encoder.mid_block.resnets.0.conv1.weight encoder.mid_block.resnets.0.conv2.bias encoder.mid_block.resnets.0.conv2.weight 
encoder.mid_block.resnets.0.norm1.bias encoder.mid_block.resnets.0.norm1.weight encoder.mid_block.resnets.0.norm2.bias encoder.mid_block.resnets.0.norm2.weight encoder.mid_block.resnets.1.conv1.bias encoder.mid_block.resnets.1.conv1.weight encoder.mid_block.resnets.1.conv2.bias encoder.mid_block.resnets.1.conv2.weight encoder.mid_block.resnets.1.norm1.bias encoder.mid_block.resnets.1.norm1.weight encoder.mid_block.resnets.1.norm2.bias encoder.mid_block.resnets.1.norm2.weight decoder.conv_norm_out.bias decoder.conv_norm_out.weight decoder.mid_block.attentions.0.group_norm.bias decoder.mid_block.attentions.0.group_norm.weight decoder.mid_block.attentions.0.to_k.bias decoder.mid_block.attentions.0.to_k.weight decoder.mid_block.attentions.0.to_out.0.bias decoder.mid_block.attentions.0.to_out.0.weight decoder.mid_block.attentions.0.to_q.bias decoder.mid_block.attentions.0.to_q.weight decoder.mid_block.attentions.0.to_v.bias decoder.mid_block.attentions.0.to_v.weight decoder.mid_block.resnets.0.conv1.bias decoder.mid_block.resnets.0.conv1.weight decoder.mid_block.resnets.0.conv2.bias decoder.mid_block.resnets.0.conv2.weight decoder.mid_block.resnets.0.norm1.bias decoder.mid_block.resnets.0.norm1.weight decoder.mid_block.resnets.0.norm2.bias decoder.mid_block.resnets.0.norm2.weight decoder.mid_block.resnets.1.conv1.bias decoder.mid_block.resnets.1.conv1.weight decoder.mid_block.resnets.1.conv2.bias decoder.mid_block.resnets.1.conv2.weight decoder.mid_block.resnets.1.norm1.bias decoder.mid_block.resnets.1.norm1.weight decoder.mid_block.resnets.1.norm2.bias decoder.mid_block.resnets.1.norm2.weight decoder.up_blocks.0.resnets.0.conv1.bias decoder.up_blocks.0.resnets.0.conv1.weight decoder.up_blocks.0.resnets.0.conv2.bias decoder.up_blocks.0.resnets.0.conv2.weight decoder.up_blocks.0.resnets.0.norm1.bias decoder.up_blocks.0.resnets.0.norm1.weight decoder.up_blocks.0.resnets.0.norm2.bias decoder.up_blocks.0.resnets.0.norm2.weight decoder.up_blocks.0.resnets.1.conv1.bias 
decoder.up_blocks.0.resnets.1.conv1.weight decoder.up_blocks.0.resnets.1.conv2.bias decoder.up_blocks.0.resnets.1.conv2.weight decoder.up_blocks.0.resnets.1.norm1.bias decoder.up_blocks.0.resnets.1.norm1.weight decoder.up_blocks.0.resnets.1.norm2.bias decoder.up_blocks.0.resnets.1.norm2.weight decoder.up_blocks.0.resnets.2.conv1.bias decoder.up_blocks.0.resnets.2.conv1.weight decoder.up_blocks.0.resnets.2.conv2.bias decoder.up_blocks.0.resnets.2.conv2.weight decoder.up_blocks.0.resnets.2.norm1.bias decoder.up_blocks.0.resnets.2.norm1.weight decoder.up_blocks.0.resnets.2.norm2.bias decoder.up_blocks.0.resnets.2.norm2.weight decoder.up_blocks.0.upsamplers.0.conv.bias decoder.up_blocks.0.upsamplers.0.conv.weight decoder.up_blocks.1.resnets.0.conv1.bias decoder.up_blocks.1.resnets.0.conv1.weight decoder.up_blocks.1.resnets.0.conv2.bias decoder.up_blocks.1.resnets.0.conv2.weight decoder.up_blocks.1.resnets.0.norm1.bias decoder.up_blocks.1.resnets.0.norm1.weight decoder.up_blocks.1.resnets.0.norm2.bias decoder.up_blocks.1.resnets.0.norm2.weight decoder.up_blocks.1.resnets.1.conv1.bias decoder.up_blocks.1.resnets.1.conv1.weight decoder.up_blocks.1.resnets.1.conv2.bias decoder.up_blocks.1.resnets.1.conv2.weight decoder.up_blocks.1.resnets.1.norm1.bias decoder.up_blocks.1.resnets.1.norm1.weight decoder.up_blocks.1.resnets.1.norm2.bias decoder.up_blocks.1.resnets.1.norm2.weight decoder.up_blocks.1.resnets.2.conv1.bias decoder.up_blocks.1.resnets.2.conv1.weight decoder.up_blocks.1.resnets.2.conv2.bias decoder.up_blocks.1.resnets.2.conv2.weight decoder.up_blocks.1.resnets.2.norm1.bias decoder.up_blocks.1.resnets.2.norm1.weight decoder.up_blocks.1.resnets.2.norm2.bias decoder.up_blocks.1.resnets.2.norm2.weight decoder.up_blocks.1.upsamplers.0.conv.bias decoder.up_blocks.1.upsamplers.0.conv.weight decoder.up_blocks.2.resnets.0.conv1.bias decoder.up_blocks.2.resnets.0.conv1.weight decoder.up_blocks.2.resnets.0.conv2.bias decoder.up_blocks.2.resnets.0.conv2.weight 
decoder.up_blocks.2.resnets.0.conv_shortcut.bias decoder.up_blocks.2.resnets.0.conv_shortcut.weight decoder.up_blocks.2.resnets.0.norm1.bias decoder.up_blocks.2.resnets.0.norm1.weight decoder.up_blocks.2.resnets.0.norm2.bias decoder.up_blocks.2.resnets.0.norm2.weight decoder.up_blocks.2.resnets.1.conv1.bias decoder.up_blocks.2.resnets.1.conv1.weight decoder.up_blocks.2.resnets.1.conv2.bias decoder.up_blocks.2.resnets.1.conv2.weight decoder.up_blocks.2.resnets.1.norm1.bias decoder.up_blocks.2.resnets.1.norm1.weight decoder.up_blocks.2.resnets.1.norm2.bias decoder.up_blocks.2.resnets.1.norm2.weight decoder.up_blocks.2.resnets.2.conv1.bias decoder.up_blocks.2.resnets.2.conv1.weight decoder.up_blocks.2.resnets.2.conv2.bias decoder.up_blocks.2.resnets.2.conv2.weight decoder.up_blocks.2.resnets.2.norm1.bias decoder.up_blocks.2.resnets.2.norm1.weight decoder.up_blocks.2.resnets.2.norm2.bias decoder.up_blocks.2.resnets.2.norm2.weight decoder.up_blocks.2.upsamplers.0.conv.bias decoder.up_blocks.2.upsamplers.0.conv.weight decoder.up_blocks.3.resnets.0.conv1.bias decoder.up_blocks.3.resnets.0.conv1.weight decoder.up_blocks.3.resnets.0.conv2.bias decoder.up_blocks.3.resnets.0.conv2.weight decoder.up_blocks.3.resnets.0.conv_shortcut.bias decoder.up_blocks.3.resnets.0.conv_shortcut.weight decoder.up_blocks.3.resnets.0.norm1.bias decoder.up_blocks.3.resnets.0.norm1.weight decoder.up_blocks.3.resnets.0.norm2.bias decoder.up_blocks.3.resnets.0.norm2.weight decoder.up_blocks.3.resnets.1.conv1.bias decoder.up_blocks.3.resnets.1.conv1.weight decoder.up_blocks.3.resnets.1.conv2.bias decoder.up_blocks.3.resnets.1.conv2.weight decoder.up_blocks.3.resnets.1.norm1.bias decoder.up_blocks.3.resnets.1.norm1.weight decoder.up_blocks.3.resnets.1.norm2.bias decoder.up_blocks.3.resnets.1.norm2.weight decoder.up_blocks.3.resnets.2.conv1.bias decoder.up_blocks.3.resnets.2.conv1.weight decoder.up_blocks.3.resnets.2.conv2.bias decoder.up_blocks.3.resnets.2.conv2.weight 
decoder.up_blocks.3.resnets.2.norm1.bias decoder.up_blocks.3.resnets.2.norm1.weight decoder.up_blocks.3.resnets.2.norm2.bias decoder.up_blocks.3.resnets.2.norm2.weight
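Every unexpected key above uses diffusers-style names (`down_blocks`, `up_blocks`, `mid_block`, `conv_norm_out`). This suggests, though the issue itself does not confirm it, that the file under `vae/` is a diffusers-format export of the autoencoder rather than the checkpoint layout that `flux.util.load_ae` builds its model for. If so, pointing `AE` at the original-format autoencoder file would avoid any conversion. In the flux code this appears to be `ae.safetensors` at the root of the Hugging Face repository, but the file name should be verified before relying on it. The sketch below, `guess_vae_layout`, is a hypothetical heuristic that classifies a checkpoint from its key names, using only the name fragments visible in the two lists in this issue.

```python
# Name fragments observed in the two key lists of this issue (assumed distinctive).
DIFFUSERS_MARKERS = ("down_blocks.", "up_blocks.", "mid_block.", "conv_norm_out.")
FLUX_MARKERS = ("encoder.down.", "decoder.up.", ".mid.block_1.", ".mid.attn_1.")


def guess_vae_layout(keys) -> str:
    """Heuristically classify VAE parameter names as 'diffusers', 'flux' or 'unknown'."""
    keys = list(keys)
    # Count keys containing at least one marker of each layout.
    diffusers_hits = sum(any(m in k for m in DIFFUSERS_MARKERS) for k in keys)
    flux_hits = sum(any(m in k for m in FLUX_MARKERS) for k in keys)
    if diffusers_hits > flux_hits:
        return "diffusers"
    if flux_hits > diffusers_hits:
        return "flux"
    return "unknown"


print(guess_vae_layout(["encoder.down_blocks.0.resnets.0.conv1.weight"]))
print(guess_vae_layout(["encoder.down.0.block.0.conv1.weight"]))
```

To check a file on disk, the key names can be read without loading tensors via `safetensors.safe_open`, e.g. `with safe_open(path, framework="pt") as f: keys = list(f.keys())`, and then passed to `guess_vae_layout(keys)`.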