HL-hanlin / Ctrl-Adapter

Official implementation of Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
https://ctrl-adapter.github.io/
Apache License 2.0

Mismatch for the depth map checkpoint from MiDaS. #18

Open yuzhou914 opened 1 month ago

yuzhou914 commented 1 month ago

Hello, thanks for your great work. When running the training code with the depth condition, I downloaded the depth checkpoint dpt_swin2_large_384, the official checkpoint released by MiDaS. However, I get the following state_dict mismatch when loading the checkpoint. This is strange; do you know what the problem might be?

```
RuntimeError: Error(s) in loading state_dict for DPTDepthModel:
	Missing key(s) in state_dict: "pretrained.model.layers.3.downsample.reduction.weight", "pretrained.model.layers.3.downsample.norm.weight", "pretrained.model.layers.3.downsample.norm.bias", "pretrained.model.head.fc.weight", "pretrained.model.head.fc.bias".
	Unexpected key(s) in state_dict: "pretrained.model.layers.0.downsample.reduction.weight", "pretrained.model.layers.0.downsample.norm.weight", "pretrained.model.layers.0.downsample.norm.bias", "pretrained.model.layers.0.blocks.1.attn_mask", "pretrained.model.layers.1.blocks.1.attn_mask", "pretrained.model.head.weight", "pretrained.model.head.bias".
	size mismatch for pretrained.model.layers.1.downsample.reduction.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([384, 768]).
	size mismatch for pretrained.model.layers.1.downsample.norm.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for pretrained.model.layers.1.downsample.norm.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for pretrained.model.layers.2.downsample.reduction.weight: copying a param with shape torch.Size([1536, 3072]) from checkpoint, the shape in current model is torch.Size([768, 1536]).
	size mismatch for pretrained.model.layers.2.downsample.norm.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for pretrained.model.layers.2.downsample.norm.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
```

HL-hanlin commented 1 month ago

Hi Yuzhou,

Thanks for trying our model!

I remember a similar issue happening to me because the timm library version was too high. In requirement_train.txt we pin timm==0.6.12. Could you try downgrading your timm version to that?

You can also find a reference link about this issue here

Let me know if this helps or not!
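As a minimal sketch of how one could catch this early (the pin comes from requirement_train.txt; everything else here, including the helper name, is our own illustration, not part of the Ctrl-Adapter codebase):

```python
from importlib.metadata import PackageNotFoundError, version

# requirement_train.txt pins timm==0.6.12; newer timm releases rename and
# reshape Swin v2 weights, which makes the MiDaS dpt_swin2_large_384
# checkpoint fail to load.
REQUIRED_TIMM = "0.6.12"

def timm_version_ok(installed: str, required: str = REQUIRED_TIMM) -> bool:
    """True only when the installed timm version matches the pin exactly."""
    return installed == required

if __name__ == "__main__":
    try:
        if not timm_version_ok(version("timm")):
            print(f"wrong timm version, run: pip install timm=={REQUIRED_TIMM}")
    except PackageNotFoundError:
        print(f"timm is not installed, run: pip install timm=={REQUIRED_TIMM}")
```

Running this check before loading the depth estimator would surface the version mismatch with a clear message instead of the opaque state_dict error.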

yuzhou914 commented 1 month ago

Hello, even after reinstalling timm==0.6.12, I still hit the problem when I call: `helper.add_depth_estimator(estimator_ckpt_path = os.path.join(DATA_PATH, "ckpts/DepthMidas/dpt_swin2_large_384.pt"))`. Do you have any other ideas? Thanks.

HL-hanlin commented 1 month ago

I don't have any better ideas for now, lol.

Maybe you could double-check that the timm version is correct via "pip list"? If the problem still exists, let me know your email/WeChat and I'll set up a Zoom meeting so we can debug together. Thanks!
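In the meantime, a generic way to inspect such mismatches (a sketch of ours, not part of the Ctrl-Adapter codebase) is to diff the model's expected keys against the checkpoint's keys, mirroring what `load_state_dict` reports:

```python
def diff_state_dicts(model_keys, ckpt_keys):
    """Return keys present in only one of the two state dicts."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return {
        "missing_in_ckpt": sorted(model_keys - ckpt_keys),
        "unexpected_in_ckpt": sorted(ckpt_keys - model_keys),
    }

# In practice the two key sets would come from
# model.state_dict().keys() and torch.load(ckpt_path).keys().
report = diff_state_dicts(
    ["layers.3.downsample.reduction.weight", "head.fc.weight"],
    ["layers.0.downsample.reduction.weight", "head.weight"],
)
print(report["missing_in_ckpt"])  # keys the model expects but the file lacks
```

If the diff shows systematically shifted layer indices (layers.0 vs layers.3, as in the error above), that usually points to two different versions of the same architecture rather than a corrupted download.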