RammusLeo / DPMesh

The repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery"
MIT License

Model file did not match with the given model #7

Closed: DISHENGRZH closed this issue 2 months ago

DISHENGRZH commented 2 months ago

Thank you for your amazing work, but I ran into the following problem. I use DataParallel instead of the DDP version. When I load 3dpw_best_ckpt.pth.tar from https://cloud.tsinghua.edu.cn/d/1d6cd3ee30204bb59fce/files/?p=%2F3dpw_best_ckpt.pth.tar, I get the error:

RuntimeError: Error(s) in loading state_dict for DataParallel: Missing key(s) in state_dict: "module.backbone.sd_model.model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_q.weight"

while the corresponding key in the checkpoint is "module.backbone.sd_model.model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_q.linear.weight". So I deleted 'linear', 'lora_down', and 'lora_up' from the checkpoint keys, and then I get:

size mismatch for module.backbone.sd_model.model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([320, 64]) from checkpoint, the shape in current model is torch.Size([320, 320]).

How can I solve this problem?
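The key names and shapes in this error are consistent with LoRA-wrapped attention projections: the checkpoint stores the base weight under ".linear.weight" and the low-rank factors under ".lora_down"/".lora_up", so simply stripping those substrings makes several keys collide on ".weight" and can leave a [320, 64] LoRA factor where the plain [320, 320] weight is expected. A minimal sketch of such a wrapper (assumed structure, not the repo's actual class; the rank of 64 is only inferred from the reported shape):

```python
import torch.nn as nn

# Minimal sketch of a LoRA-wrapped linear layer (assumed structure, not DPMesh's actual class).
class LoraInjectedLinear(nn.Module):
    def __init__(self, in_features=320, out_features=320, r=64):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)   # frozen base projection
        self.lora_down = nn.Linear(in_features, r, bias=False)           # weight shape [64, 320]
        self.lora_up = nn.Linear(r, out_features, bias=False)            # weight shape [320, 64]

    def forward(self, x):
        # base projection plus the low-rank LoRA update
        return self.linear(x) + self.lora_up(self.lora_down(x))

layer = LoraInjectedLinear()
for k, v in layer.state_dict().items():
    print(k, tuple(v.shape))
# linear.weight    (320, 320)  <- what a plain nn.Linear would expose as ".weight"
# lora_down.weight (64, 320)
# lora_up.weight   (320, 64)   <- stripping "lora_up" from this key also yields ".weight"
```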

RammusLeo commented 2 months ago

Hi! Thanks for your interest. I think ".weight" and ".linear.weight" refer to different parameter layers; the injection of LoRA happens in this line. I am not sure whether switching between DP and DDP makes a difference by itself, or whether there is another issue in your code, especially in the parameter-loading function. A common issue when you turn off DDP is that you may need to remove the prefix "module." during loading. It would be better if you shared more about your changes to the code so that we can discuss further.
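For the "module." prefix, a minimal loading helper along these lines is a common fix (the "network" key inside the checkpoint is only a guess; inspect the checkpoint to see how the state dict is actually stored):

```python
import torch

def load_ddp_checkpoint(model, ckpt_path):
    """Load a checkpoint saved from a DataParallel/DDP-wrapped model into a plain model."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # The state dict may be nested under a key such as "network"; this is an assumption.
    state_dict = ckpt["network"] if isinstance(ckpt, dict) and "network" in ckpt else ckpt
    # Strip the "module." prefix added by the DataParallel/DDP wrapper.
    state_dict = {k[len("module."):] if k.startswith("module.") else k: v
                  for k, v in state_dict.items()}
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)
    return model
```

Loading with strict=False and printing the mismatched keys makes it easy to see whether the remaining differences come from the LoRA wrapping rather than from the prefix.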

DISHENGRZH commented 2 months ago

Thank you for your valuable reply. I converted the DP/DDP checkpoint so that it works with a single-card model: I removed the prefix "module." during loading and then revised the target_replace_module used for the LoRA injection, and it worked.
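For readers hitting the same mismatch, this fix amounts to recreating the LoRA-wrapped structure in the in-memory model before loading, and only then stripping the DDP prefix. A sketch assuming a cloneofsimo/lora-style inject_trainable_lora helper (the attribute path, target set, and rank below are assumptions; match whatever the repo uses at the LoRA injection line mentioned above):

```python
from lora_diffusion import inject_trainable_lora  # cloneofsimo/lora-style helper; DPMesh may wrap this differently

# Wrap the same attention modules that were wrapped at training time, so that keys like
# "...attn1.to_q.linear.weight" and "...to_q.lora_up.weight" exist in the in-memory model.
inject_trainable_lora(
    model.backbone.sd_model,                  # hypothetical attribute path into the diffusion backbone
    target_replace_module={"CrossAttention"}, # assumption: must match the set used when the checkpoint was trained
    r=64,                                     # assumption: rank inferred from the reported [320, 64] lora_up shape
)

# Then load with the "module." prefix stripped, e.g. with the helper sketched above.
load_ddp_checkpoint(model, "3dpw_best_ckpt.pth.tar")
```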