AILab-CVC / VideoCrafter

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
https://ailab-cvc.github.io/videocrafter2/

After following all steps, still get a huge mismatch when loading the checkpoint #55

Open Wonder1905 opened 10 months ago

Wonder1905 commented 10 months ago

I'm running the following command: python inference.py --seed 123 --mode 'i2v' --ckpt_path checkpoints/i2v_512_v1/model.ckpt --config configs/inference_i2v_512_v1.0.yaml --savedir results/i2v_512_test --n_samples 1 --bs 1 --height 320 --width 512 --unconditional_guidance_scale 12.0 --ddim_steps 50 --ddim_eta 1.0 --prompt_file prompts/i2v_prompts/test_prompts.txt --cond_input prompts/i2v_prompts --fps 8

RuntimeError: Error(s) in loading state_dict for LatentVisualDiffusion: Missing key(s) in state_dict: "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.3.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.3.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.3.bias", "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k_ip.weight", "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v_ip.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.0.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.3.bias", ..... 
"model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.3.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.3.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv1.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv1.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv1.2.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv1.2.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv2.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv2.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv2.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv2.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv3.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv3.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv3.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv3.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv4.0.weight", 
"model.diffusion_model.middle_block.0.temopral_conv.conv4.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv4.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv4.3.bias", ... "model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.3.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.0.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.3.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.0.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.3.bias", "model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k_ip.weight", "model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v_ip.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.2.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.2.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.3.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.3.weight", 
"model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.3.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.3.bias", "model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k_ip.weight", ... "embedder.model.visual.transformer.resblocks.1.ln_2.bias", "embedder.model.visual.transformer.resblocks.1.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.1.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.1.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.1.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.2.ln_1.weight", "embedder.model.visual.transformer.resblocks.2.ln_1.bias", "embedder.model.visual.transformer.resblocks.2.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.2.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.2.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.2.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.2.ln_2.weight", "embedder.model.visual.transformer.resblocks.2.ln_2.bias", "embedder.model.visual.transformer.resblocks.2.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.2.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.2.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.2.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.3.ln_1.weight", "embedder.model.visual.transformer.resblocks.3.ln_1.bias", "embedder.model.visual.transformer.resblocks.3.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.3.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.3.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.3.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.3.ln_2.weight", 
"embedder.model.visual.transformer.resblocks.3.ln_2.bias", "embedder.model.visual.transformer.resblocks.3.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.3.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.3.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.3.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.4.ln_1.weight", "embedder.model.visual.transformer.resblocks.4.ln_1.bias", "embedder.model.visual.transformer.resblocks.4.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.4.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.4.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.4.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.4.ln_2.weight", "embedder.model.visual.transformer.resblocks.4.ln_2.bias", "embedder.model.visual.transformer.resblocks.4.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.4.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.4.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.4.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.5.ln_1.weight", "embedder.model.visual.transformer.resblocks.5.ln_1.bias", "embedder.model.visual.transformer.resblocks.5.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.5.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.5.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.5.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.5.ln_2.weight", "embedder.model.visual.transformer.resblocks.5.ln_2.bias", "embedder.model.visual.transformer.resblocks.5.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.5.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.5.mlp.c_proj.weight", .... 
"embedder.model.visual.transformer.resblocks.29.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.29.ln_2.weight", "embedder.model.visual.transformer.resblocks.29.ln_2.bias", "embedder.model.visual.transformer.resblocks.29.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.29.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.29.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.29.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.30.ln_1.weight", "embedder.model.visual.transformer.resblocks.30.ln_1.bias", "embedder.model.visual.transformer.resblocks.30.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.30.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.30.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.30.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.30.ln_2.weight", "embedder.model.visual.transformer.resblocks.30.ln_2.bias", "embedder.model.visual.transformer.resblocks.30.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.30.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.30.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.30.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.31.ln_1.weight", "embedder.model.visual.transformer.resblocks.31.ln_1.bias", "embedder.model.visual.transformer.resblocks.31.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.31.attn.in_proj_bias", "image_proj_model.layers.1.1.1.weight", "image_proj_model.layers.1.1.3.weight", "image_proj_model.layers.2.0.norm1.weight", "image_proj_model.layers.2.0.norm1.bias", "image_proj_model.layers.2.0.norm2.weight", "image_proj_model.layers.2.0.norm2.bias", "image_proj_model.layers.2.0.to_q.weight", "image_proj_model.layers.2.0.to_kv.weight", "image_proj_model.layers.2.0.to_out.weight", "image_proj_model.layers.2.1.0.weight", "image_proj_model.layers.2.1.0.bias", 
"image_proj_model.layers.2.1.1.weight", "image_proj_model.layers.2.1.3.weight", "image_proj_model.layers.3.0.norm1.weight", "image_proj_model.layers.3.0.norm1.bias", "image_proj_model.layers.3.0.norm2.weight", "image_proj_model.layers.3.0.norm2.bias", "image_proj_model.layers.3.0.to_q.weight", "image_proj_model.layers.3.0.to_kv.weight", "image_proj_model.layers.3.0.to_out.weight", "image_proj_model.layers.3.1.0.weight", "image_proj_model.layers.3.1.0.bias", "image_proj_model.layers.3.1.1.weight", "image_proj_model.layers.3.1.3.weight". Unexpected key(s) in state_dict: "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", ... 
"model.diffusion_model.init_attn.0.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", 
"model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", ... "model.diffusion_model.output_blocks.9.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.9.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table". size mismatch for scale_arr: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([1400]).

Any idea what I'm missing?

bilalimran1412 commented 5 months ago

Having the same issue.

bilalimran1412 commented 5 months ago

This happened to me too, but it turned out I had downloaded the wrong model. Check which model you downloaded and retry.
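
One way to check whether a checkpoint matches the model before retrying is to compare the parameter names on both sides and group the differences by prefix. The sketch below is only an illustration: the helper `summarize_key_mismatch` and its `depth` parameter are hypothetical and not part of VideoCrafter, and the key lists are a handful of names copied from the error above rather than a real checkpoint.

```python
from collections import Counter


def summarize_key_mismatch(model_keys, ckpt_keys, depth=2):
    """Count missing and unexpected parameter names, grouped by their
    first `depth` dot-separated components (hypothetical helper)."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)

    def group(keys):
        return dict(Counter(".".join(k.split(".")[:depth]) for k in keys))

    return {
        # Names the model expects but the checkpoint does not contain.
        "missing": group(model_keys - ckpt_keys),
        # Names the checkpoint contains but the model does not define.
        "unexpected": group(ckpt_keys - model_keys),
    }


# Toy data: a few names taken from the error message in this issue.
model_keys = [
    "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.0.weight",
    "embedder.model.visual.transformer.resblocks.1.ln_2.bias",
    "image_proj_model.layers.2.0.norm1.weight",
    "scale_arr",
]
ckpt_keys = [
    "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table",
    "scale_arr",
]

report = summarize_key_mismatch(model_keys, ckpt_keys)
for kind, groups in report.items():
    print(kind, groups)
```

With PyTorch, the two lists would typically come from `model.state_dict().keys()` and from the dictionary returned by `torch.load(path, map_location="cpu")`; whether the weights sit under a `"state_dict"` entry in that dictionary is an assumption to verify for the file in question. If whole groups such as `embedder.model` or `image_proj_model.layers` show up as missing, as in the log above, that suggests the file was saved from a different architecture than the one the config builds, which is consistent with having downloaded the wrong model.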