Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0
1.44k stars · 147 forks

Some weights of AutoencoderKL were not initialized from the model checkpoint at /path/to/Latte/t2v_required_models/ and are newly initialized because the shapes did not match: #66

Open · likeatingcake opened 3 months ago

likeatingcake commented 3 months ago

When I run `bash sample/t2v.sh`, the shapes in the pretrained checkpoint do not match the instantiated model. How can I fix this? Thank you!

The full list of mismatched weights is:
  • decoder.conv_in.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_in.weight: found shape torch.Size([512, 4, 3, 3]) in the checkpoint and torch.Size([64, 4, 3, 3]) in the model instantiated
  • decoder.conv_norm_out.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_norm_out.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_out.weight: found shape torch.Size([3, 128, 3, 3]) in the checkpoint and torch.Size([3, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.attentions.0.group_norm.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.group_norm.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_k.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_k.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_out.0.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_out.0.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_q.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_q.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_v.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_v.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_in.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_in.weight: found shape torch.Size([128, 3, 3, 3]) in the checkpoint and torch.Size([64, 3, 3, 3]) in the model instantiated
  • encoder.conv_norm_out.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_norm_out.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_out.weight: found shape torch.Size([8, 512, 3, 3]) in the checkpoint and torch.Size([8, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv1.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv1.weight: found shape torch.Size([128, 128, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv2.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv2.weight: found shape torch.Size([128, 128, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm1.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm1.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated

maxin-cn commented 3 months ago

It looks like you used an incorrect pretrained model when loading the VAE. Please check it.
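The checkpoint-side shapes in the warnings (128/256/512 channels) match the standard Stable Diffusion VAE, while 64 is the diffusers `AutoencoderKL` default, so one likely cause is that the weights themselves are fine but the `vae/config.json` beside them is missing or wrong. A minimal sketch for checking this, assuming the folder layout from the report:

```python
import json
from pathlib import Path

from diffusers import AutoencoderKL

# Path from the report; adjust to your local checkout.
vae_dir = Path("/path/to/Latte/t2v_required_models/vae")

# The Stable Diffusion VAE config specifies block_out_channels
# [128, 256, 512, 512]; if this prints the diffusers default [64],
# the config shipped alongside the weights is wrong or incomplete.
config = json.loads((vae_dir / "config.json").read_text())
print(config.get("block_out_channels"))

# With a matching config.json + weights pair, this loads cleanly:
# no shape-mismatch warnings and no need for
# low_cpu_mem_usage=False / ignore_mismatched_sizes=True.
vae = AutoencoderKL.from_pretrained(vae_dir.parent, subfolder="vae")
```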

likeatingcake commented 3 months ago
> It looks like you used an incorrect pretrained model when loading the VAE. Please check it.

(latte) yueyc@super-AS-4124GS-TNR:~/Latte$ bash sample/t2v.sh
Using model!
Traceback (most recent call last):
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 167, in <module>
    main(OmegaConf.load(args.config))
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 38, in main
    vae = AutoencoderKL.from_pretrained(args.pretrained_model_path, subfolder="vae", torch_dtype=torch.float16).to(
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 812, in from_pretrained
    unexpected_keys = load_model_dict_into_meta(
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 155, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load /home/yueyc/Latte/t2v_required_models/ because decoder.conv_in.bias expected shape tensor(..., device='meta', size=(64,)), but got torch.Size([512]). If you want to instead overwrite randomly initialized weights, please make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.

When loading the VAE pretrained model, I previously added the two arguments low_cpu_mem_usage=False and ignore_mismatched_sizes=True, which produced the warnings mentioned earlier; without those two arguments, I get the error above instead.
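For clarity, a sketch of the two loading modes involved here, using the path from the traceback. Note that `ignore_mismatched_sizes=True` does not repair anything; it re-initializes every mismatched tensor randomly, which is exactly what the original warning list reports:

```python
import torch
from diffusers import AutoencoderKL

model_path = "/home/yueyc/Latte/t2v_required_models"

# Strict load (the default): raises the ValueError above as soon as a
# checkpoint tensor does not match the shape implied by config.json.
# This is the mode you want once the checkpoint is correct.
vae = AutoencoderKL.from_pretrained(
    model_path, subfolder="vae", torch_dtype=torch.float16
)

# Lenient load: suppresses the error by discarding each mismatched tensor
# and re-initializing it randomly (hence the warning list). The model
# loads, but the VAE is effectively untrained, so decoded videos would be
# noise. It is a debugging aid, not a fix.
vae = AutoencoderKL.from_pretrained(
    model_path,
    subfolder="vae",
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)
```

Either way, the underlying fix is to re-download the vae folder (config.json together with its weights) so that the strict load succeeds.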