facebookresearch / ConvNeXt

Code release for ConvNeXt model
MIT License

Unexpected keys when creating convnext_isotropic_large pretrained model #51

Closed rentainhe closed 2 years ago

rentainhe commented 2 years ago

Hi~, I tried to load the pretrained weights of the convnext_isotropic_large model myself, but I ran into some unexpected keys. The code is:

model = convnext_isotropic_large(pretrained=True)

This raises a RuntimeError:

 Unexpected key(s) in state_dict: "blocks.0.gamma", "blocks.1.gamma", "blocks.2.gamma", "blocks.3.gamma", "blocks.4.gamma", "blocks.5.gamma", "blocks.6.gamma", "blocks.7.gamma", "blocks.8.gamma", "blocks.9.gamma", "blocks.10.gamma", "blocks.11.gamma", "blocks.12.gamma", "blocks.13.gamma", "blocks.14.gamma", "blocks.15.gamma", "blocks.16.gamma", "blocks.17.gamma", "blocks.18.gamma", "blocks.19.gamma", "blocks.20.gamma", "blocks.21.gamma", "blocks.22.gamma", "blocks.23.gamma", "blocks.24.gamma", "blocks.25.gamma", "blocks.26.gamma", "blocks.27.gamma", "blocks.28.gamma", "blocks.29.gamma", "blocks.30.gamma", "blocks.31.gamma", "blocks.32.gamma", "blocks.33.gamma", "blocks.34.gamma", "blocks.35.gamma". 

I think there might be something wrong with the pretrained weights here.

liuzhuang13 commented 2 years ago

Hi,

We use layer scale in convnext_isotropic_large so you'll have to specify a non-zero layer_scale_init_value, e.g.,

model = convnext_isotropic_large(pretrained=True, layer_scale_init_value=1e-6)

In our main.py, the default value of layer_scale_init_value is 1e-6, so this is not a problem if you load the model through main.py.
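To illustrate why the checkpoint keys are rejected, here is a minimal sketch using toy modules. It assumes, as the reply above implies, that each block registers a `gamma` parameter only when `layer_scale_init_value` is greater than zero. `ToyBlock` and `ToyNet` are hypothetical stand-ins, not the repository's classes.

```python
import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    """Hypothetical stand-in for a block; only the layer-scale logic matters."""

    def __init__(self, dim, layer_scale_init_value=0.0):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        # Assumption: gamma exists as a parameter only when layer scale is enabled.
        self.gamma = (
            nn.Parameter(layer_scale_init_value * torch.ones(dim))
            if layer_scale_init_value > 0
            else None
        )

    def forward(self, x):
        out = self.fc(x)
        if self.gamma is not None:
            out = self.gamma * out
        return x + out


class ToyNet(nn.Module):
    """Hypothetical isotropic-style stack of identical blocks."""

    def __init__(self, depth=3, dim=4, layer_scale_init_value=0.0):
        super().__init__()
        self.blocks = nn.Sequential(
            *[ToyBlock(dim, layer_scale_init_value) for _ in range(depth)]
        )

    def forward(self, x):
        return self.blocks(x)


# A "checkpoint" saved from a model that was built with layer scale enabled.
checkpoint = ToyNet(layer_scale_init_value=1e-6).state_dict()

# Built without layer scale: the model has no gamma parameters, so strict
# loading reports the checkpoint's gamma entries as unexpected keys.
no_layer_scale = ToyNet(layer_scale_init_value=0.0)
try:
    no_layer_scale.load_state_dict(checkpoint)
    error_message = ""
except RuntimeError as err:
    error_message = str(err)
print(error_message)

# Built with a non-zero value: the parameter names match and loading succeeds.
with_layer_scale = ToyNet(layer_scale_init_value=1e-6)
result = with_layer_scale.load_state_dict(checkpoint)
print(result)
```

The first load fails with the same kind of message as in the report, listing `blocks.N.gamma` for every block, while the second load reports no missing or unexpected keys.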

rentainhe commented 2 years ago

Thanks a lot! Do any of the other isotropic ConvNeXt models need the same layer_scale_init_value?

liuzhuang13 commented 2 years ago

No, the other ones do not use layer scale, so you can leave layer_scale_init_value at its default (0), or set it to 0 explicitly, when creating the model. We mention this in the 3rd paragraph of appendix A.1, and you can find the differences between training isotropic S/B and L in our TRAINING.md.
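The per-variant settings from this thread could be collected in a small lookup like the sketch below. `LAYER_SCALE_INIT` and `isotropic_kwargs` are hypothetical names introduced for illustration; the small and base factory names are assumed to follow the same pattern as `convnext_isotropic_large`.

```python
# Layer-scale settings per isotropic variant, as described in this thread.
# Assumption: the small/base factories are named like the large one.
LAYER_SCALE_INIT = {
    "convnext_isotropic_small": 0.0,   # no layer scale
    "convnext_isotropic_base": 0.0,    # no layer scale
    "convnext_isotropic_large": 1e-6,  # uses layer scale
}


def isotropic_kwargs(variant):
    """Return keyword arguments for creating a pretrained isotropic model."""
    if variant not in LAYER_SCALE_INIT:
        raise KeyError(f"unknown isotropic variant: {variant}")
    return {
        "pretrained": True,
        "layer_scale_init_value": LAYER_SCALE_INIT[variant],
    }


for name in LAYER_SCALE_INIT:
    print(name, isotropic_kwargs(name))
```

With the repository's model functions imported, the result would be used as, for example, `convnext_isotropic_large(**isotropic_kwargs("convnext_isotropic_large"))`.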

rentainhe commented 2 years ago

Thank you~, that helps me a lot~