282857341 / nnFormer


About the Pre-trained model weights #43

Open · Jx-Tan opened this issue 2 years ago

Jx-Tan commented 2 years ago

Thanks for your work.

I noticed that you mentioned pre-trained model weights in Section 4.1 of the paper, and I am interested in this part. How was this pre-training carried out? What should I do to pre-train on medical images? And if the network architecture is changed, does the existing pre-training no longer apply to the modified model, so that new pre-training is required?

Thanks a lot.

282857341 commented 2 years ago

The pre-trained weights are the ImageNet pre-training results released with Swin Transformer.
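For reference, one way to check which Swin variant a downloaded checkpoint is (the variant question comes up next in this thread) is to inspect the patch-embedding weight. A small sketch, assuming the Swin-L file named later in the thread has been downloaded locally:

```python
import torch

# Assumes the checkpoint file discussed below has been downloaded locally.
state = torch.load("swin_large_patch4_window7_224_22k.pth", map_location="cpu")

# Official Swin checkpoints nest the weights under a top-level "model" key.
weights = state["model"]

# The first dimension of the patch-embedding kernel is the embedding size:
# [embed_dim, in_chans, patch, patch] -> [192, 3, 4, 4] for Swin-L.
print(weights["patch_embed.proj.weight"].shape)
```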

auroua commented 2 years ago

@282857341 Can you specify which Swin architecture (tiny, small, base, or large) is used?

auroua commented 2 years ago

Since the default embedding size is 192, I used the checkpoint file of the large Swin model, swin_large_patch4_window7_224_22k.pth, to initialize nnFormer's weights. The following errors occurred during initialization:

RuntimeError: Error(s) in loading state_dict for nnFormer:
    Missing key(s) in state_dict: "model_down.patch_embed.proj1.conv1.weight", "model_down.patch_embed.proj1.conv1.bias", "model_down.patch_embed.proj1.conv2.weight", "model_down.patch_embed.proj1.conv2.bias", "model_down.patch_embed.proj1.norm1.weight", "model_down.patch_embed.proj1.norm1.bias", "model_down.patch_embed.proj1.norm2.weight", "model_down.patch_embed.proj1.norm2.bias", "model_down.patch_embed.proj2.conv1.weight", "model_down.patch_embed.proj2.conv1.bias", "model_down.patch_embed.proj2.conv2.weight", "model_down.patch_embed.proj2.conv2.bias", "model_down.patch_embed.proj2.norm1.weight", "model_down.patch_embed.proj2.norm1.bias", "model_down.patch_embed.norm.weight", "model_down.patch_embed.norm.bias", "model_down.layers.0.blocks.0.norm1.weight", "model_down.layers.0.blocks.0.norm1.bias", "model_down.layers.0.blocks.0.attn.relative_position_bias_table", "model_down.layers.0.blocks.0.attn.relative_position_index", "model_down.layers.0.blocks.0.attn.qkv.weight", "model_down.layers.0.blocks.0.attn.qkv.bias", "model_down.layers.0.blocks.0.attn.proj.weight", "model_down.layers.0.blocks.0.attn.proj.bias", "model_down.layers.0.blocks.0.norm2.weight", "model_down.layers.0.blocks.0.norm2.bias", "model_down.layers.0.blocks.0.mlp.fc1.weight", "model_down.layers.0.blocks.0.mlp.fc1.bias", "model_down.layers.0.blocks.0.mlp.fc2.weight", "model_down.layers.0.blocks.0.mlp.fc2.bias", "model_down.layers.0.blocks.1.norm1.weight", "model_down.layers.0.blocks.1.norm1.bias", "model_down.layers.0.blocks.1.attn.relative_position_bias_table", "model_down.layers.0.blocks.1.attn.relative_position_index", "model_down.layers.0.blocks.1.attn.qkv.weight", "model_down.layers.0.blocks.1.attn.qkv.bias", "model_down.layers.0.blocks.1.attn.proj.weight", "model_down.layers.0.blocks.1.attn.proj.bias", "model_down.layers.0.blocks.1.norm2.weight", "model_down.layers.0.blocks.1.norm2.bias", "model_down.layers.0.blocks.1.mlp.fc1.weight", "model_down.layers.0.blocks.1.mlp.fc1.bias", "model_down.layers.0.blocks.1.mlp.fc2.weight", "model_down.layers.0.blocks.1.mlp.fc2.bias", "model_down.layers.0.downsample.reduction.weight", "model_down.layers.0.downsample.reduction.bias", "model_down.layers.0.downsample.norm.weight", "model_down.layers.0.downsample.norm.bias", "model_down.layers.1.blocks.0.norm1.weight", "model_down.layers.1.blocks.0.norm1.bias", "model_down.layers.1.blocks.0.attn.relative_position_bias_table", "model_down.layers.1.blocks.0.attn.relative_position_index", "model_down.layers.1.blocks.0.attn.qkv.weight", "model_down.layers.1.blocks.0.attn.qkv.bias", "model_down.layers.1.blocks.0.attn.proj.weight", "model_down.layers.1.blocks.0.attn.proj.bias", "model_down.layers.1.blocks.0.norm2.weight", "model_down.layers.1.blocks.0.norm2.bias", "model_down.layers.1.blocks.0.mlp.fc1.weight", "model_down.layers.1.blocks.0.mlp.fc1.bias", "model_down.layers.1.blocks.0.mlp.fc2.weight", "model_down.layers.1.blocks.0.mlp.fc2.bias", "model_down.layers.1.blocks.1.norm1.weight", "model_down.layers.1.blocks.1.norm1.bias", "model_down.layers.1.blocks.1.attn.relative_position_bias_table", "model_down.layers.1.blocks.1.attn.relative_position_index", "model_down.layers.1.blocks.1.attn.qkv.weight", "model_down.layers.1.blocks.1.attn.qkv.bias", "model_down.layers.1.blocks.1.attn.proj.weight", "model_down.layers.1.blocks.1.attn.proj.bias", "model_down.layers.1.blocks.1.norm2.weight", "model_down.layers.1.blocks.1.norm2.bias", "model_down.layers.1.blocks.1.mlp.fc1.weight", "model_down.layers.1.blocks.1.mlp.fc1.bias", 
"model_down.layers.1.blocks.1.mlp.fc2.weight", "model_down.layers.1.blocks.1.mlp.fc2.bias", "model_down.layers.1.downsample.reduction.weight", "model_down.layers.1.downsample.reduction.bias", "model_down.layers.1.downsample.norm.weight", "model_down.layers.1.downsample.norm.bias", "model_down.layers.2.blocks.0.norm1.weight", "model_down.layers.2.blocks.0.norm1.bias", "model_down.layers.2.blocks.0.attn.relative_position_bias_table", "model_down.layers.2.blocks.0.attn.relative_position_index", "model_down.layers.2.blocks.0.attn.qkv.weight", "model_down.layers.2.blocks.0.attn.qkv.bias", "model_down.layers.2.blocks.0.attn.proj.weight", "model_down.layers.2.blocks.0.attn.proj.bias", "model_down.layers.2.blocks.0.norm2.weight", "model_down.layers.2.blocks.0.norm2.bias", "model_down.layers.2.blocks.0.mlp.fc1.weight", "model_down.layers.2.blocks.0.mlp.fc1.bias", "model_down.layers.2.blocks.0.mlp.fc2.weight", "model_down.layers.2.blocks.0.mlp.fc2.bias", "model_down.layers.2.blocks.1.norm1.weight", "model_down.layers.2.blocks.1.norm1.bias", "model_down.layers.2.blocks.1.attn.relative_position_bias_table", "model_down.layers.2.blocks.1.attn.relative_position_index", "model_down.layers.2.blocks.1.attn.qkv.weight", "model_down.layers.2.blocks.1.attn.qkv.bias", "model_down.layers.2.blocks.1.attn.proj.weight", "model_down.layers.2.blocks.1.attn.proj.bias", "model_down.layers.2.blocks.1.norm2.weight", "model_down.layers.2.blocks.1.norm2.bias", "model_down.layers.2.blocks.1.mlp.fc1.weight", "model_down.layers.2.blocks.1.mlp.fc1.bias", "model_down.layers.2.blocks.1.mlp.fc2.weight", "model_down.layers.2.blocks.1.mlp.fc2.bias", "model_down.layers.2.downsample.reduction.weight", "model_down.layers.2.downsample.reduction.bias", "model_down.layers.2.downsample.norm.weight", "model_down.layers.2.downsample.norm.bias", "model_down.layers.3.blocks.0.norm1.weight", "model_down.layers.3.blocks.0.norm1.bias", "model_down.layers.3.blocks.0.attn.relative_position_bias_table", "model_down.layers.3.blocks.0.attn.relative_position_index", "model_down.layers.3.blocks.0.attn.qkv.weight", "model_down.layers.3.blocks.0.attn.qkv.bias", "model_down.layers.3.blocks.0.attn.proj.weight", "model_down.layers.3.blocks.0.attn.proj.bias", "model_down.layers.3.blocks.0.norm2.weight", "model_down.layers.3.blocks.0.norm2.bias", "model_down.layers.3.blocks.0.mlp.fc1.weight", "model_down.layers.3.blocks.0.mlp.fc1.bias", "model_down.layers.3.blocks.0.mlp.fc2.weight", "model_down.layers.3.blocks.0.mlp.fc2.bias", "model_down.layers.3.blocks.1.norm1.weight", "model_down.layers.3.blocks.1.norm1.bias", "model_down.layers.3.blocks.1.attn.relative_position_bias_table", "model_down.layers.3.blocks.1.attn.relative_position_index", "model_down.layers.3.blocks.1.attn.qkv.weight", "model_down.layers.3.blocks.1.attn.qkv.bias", "model_down.layers.3.blocks.1.attn.proj.weight", "model_down.layers.3.blocks.1.attn.proj.bias", "model_down.layers.3.blocks.1.norm2.weight", "model_down.layers.3.blocks.1.norm2.bias", "model_down.layers.3.blocks.1.mlp.fc1.weight", "model_down.layers.3.blocks.1.mlp.fc1.bias", "model_down.layers.3.blocks.1.mlp.fc2.weight", "model_down.layers.3.blocks.1.mlp.fc2.bias", "model_down.norm0.weight", "model_down.norm0.bias", "model_down.norm1.weight", "model_down.norm1.bias", "model_down.norm2.weight", "model_down.norm2.bias", "model_down.norm3.weight", "model_down.norm3.bias", "decoder.layers.0.blocks.0.norm1.weight", "decoder.layers.0.blocks.0.norm1.bias", "decoder.layers.0.blocks.0.attn.relative_position_bias_table", 
"decoder.layers.0.blocks.0.attn.relative_position_index", "decoder.layers.0.blocks.0.attn.kv.weight", "decoder.layers.0.blocks.0.attn.kv.bias", "decoder.layers.0.blocks.0.attn.proj.weight", "decoder.layers.0.blocks.0.attn.proj.bias", "decoder.layers.0.blocks.0.norm2.weight", "decoder.layers.0.blocks.0.norm2.bias", "decoder.layers.0.blocks.0.mlp.fc1.weight", "decoder.layers.0.blocks.0.mlp.fc1.bias", "decoder.layers.0.blocks.0.mlp.fc2.weight", "decoder.layers.0.blocks.0.mlp.fc2.bias", "decoder.layers.0.blocks.1.norm1.weight", "decoder.layers.0.blocks.1.norm1.bias", "decoder.layers.0.blocks.1.attn.relative_position_bias_table", "decoder.layers.0.blocks.1.attn.relative_position_index", "decoder.layers.0.blocks.1.attn.qkv.weight", "decoder.layers.0.blocks.1.attn.qkv.bias", "decoder.layers.0.blocks.1.attn.proj.weight", "decoder.layers.0.blocks.1.attn.proj.bias", "decoder.layers.0.blocks.1.norm2.weight", "decoder.layers.0.blocks.1.norm2.bias", "decoder.layers.0.blocks.1.mlp.fc1.weight", "decoder.layers.0.blocks.1.mlp.fc1.bias", "decoder.layers.0.blocks.1.mlp.fc2.weight", "decoder.layers.0.blocks.1.mlp.fc2.bias", "decoder.layers.0.Upsample.norm.weight", "decoder.layers.0.Upsample.norm.bias", "decoder.layers.0.Upsample.up.weight", "decoder.layers.0.Upsample.up.bias", "decoder.layers.1.blocks.0.norm1.weight", "decoder.layers.1.blocks.0.norm1.bias", "decoder.layers.1.blocks.0.attn.relative_position_bias_table", "decoder.layers.1.blocks.0.attn.relative_position_index", "decoder.layers.1.blocks.0.attn.kv.weight", "decoder.layers.1.blocks.0.attn.kv.bias", "decoder.layers.1.blocks.0.attn.proj.weight", "decoder.layers.1.blocks.0.attn.proj.bias", "decoder.layers.1.blocks.0.norm2.weight", "decoder.layers.1.blocks.0.norm2.bias", "decoder.layers.1.blocks.0.mlp.fc1.weight", "decoder.layers.1.blocks.0.mlp.fc1.bias", "decoder.layers.1.blocks.0.mlp.fc2.weight", "decoder.layers.1.blocks.0.mlp.fc2.bias", "decoder.layers.1.blocks.1.norm1.weight", "decoder.layers.1.blocks.1.norm1.bias", "decoder.layers.1.blocks.1.attn.relative_position_bias_table", "decoder.layers.1.blocks.1.attn.relative_position_index", "decoder.layers.1.blocks.1.attn.qkv.weight", "decoder.layers.1.blocks.1.attn.qkv.bias", "decoder.layers.1.blocks.1.attn.proj.weight", "decoder.layers.1.blocks.1.attn.proj.bias", "decoder.layers.1.blocks.1.norm2.weight", "decoder.layers.1.blocks.1.norm2.bias", "decoder.layers.1.blocks.1.mlp.fc1.weight", "decoder.layers.1.blocks.1.mlp.fc1.bias", "decoder.layers.1.blocks.1.mlp.fc2.weight", "decoder.layers.1.blocks.1.mlp.fc2.bias", "decoder.layers.1.Upsample.norm.weight", "decoder.layers.1.Upsample.norm.bias", "decoder.layers.1.Upsample.up.weight", "decoder.layers.1.Upsample.up.bias", "decoder.layers.2.blocks.0.norm1.weight", "decoder.layers.2.blocks.0.norm1.bias", "decoder.layers.2.blocks.0.attn.relative_position_bias_table", "decoder.layers.2.blocks.0.attn.relative_position_index", "decoder.layers.2.blocks.0.attn.kv.weight", "decoder.layers.2.blocks.0.attn.kv.bias", "decoder.layers.2.blocks.0.attn.proj.weight", "decoder.layers.2.blocks.0.attn.proj.bias", "decoder.layers.2.blocks.0.norm2.weight", "decoder.layers.2.blocks.0.norm2.bias", "decoder.layers.2.blocks.0.mlp.fc1.weight", "decoder.layers.2.blocks.0.mlp.fc1.bias", "decoder.layers.2.blocks.0.mlp.fc2.weight", "decoder.layers.2.blocks.0.mlp.fc2.bias", "decoder.layers.2.blocks.1.norm1.weight", "decoder.layers.2.blocks.1.norm1.bias", "decoder.layers.2.blocks.1.attn.relative_position_bias_table", "decoder.layers.2.blocks.1.attn.relative_position_index", 
"decoder.layers.2.blocks.1.attn.qkv.weight", "decoder.layers.2.blocks.1.attn.qkv.bias", "decoder.layers.2.blocks.1.attn.proj.weight", "decoder.layers.2.blocks.1.attn.proj.bias", "decoder.layers.2.blocks.1.norm2.weight", "decoder.layers.2.blocks.1.norm2.bias", "decoder.layers.2.blocks.1.mlp.fc1.weight", "decoder.layers.2.blocks.1.mlp.fc1.bias", "decoder.layers.2.blocks.1.mlp.fc2.weight", "decoder.layers.2.blocks.1.mlp.fc2.bias", "decoder.layers.2.Upsample.norm.weight", "decoder.layers.2.Upsample.norm.bias", "decoder.layers.2.Upsample.up.weight", "decoder.layers.2.Upsample.up.bias", "final.0.up.weight", "final.0.up.bias", "final.1.up.weight", "final.1.up.bias", "final.2.up.weight", "final.2.up.bias". 
    Unexpected key(s) in state_dict: "model". 

Could you provide the pre-trained model file gelunorm_former_skip_global_shift.model?
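The traceback actually shows two separate mismatches: the Swin release wraps all of its weights under a single top-level "model" key (the one unexpected key), and even after unwrapping it, the 2D Swin parameter names and shapes do not line up with nnFormer's 3D encoder keys under the model_down. prefix. A minimal best-effort sketch of a partial load (the init_from_swin helper is hypothetical, not part of the nnFormer codebase; the prefix remap is read off the missing-key list above):

```python
import torch

def init_from_swin(model: torch.nn.Module, ckpt_path: str) -> None:
    # Swin releases store everything under a top-level "model" key, which is
    # exactly the "Unexpected key(s) in state_dict: 'model'" in the traceback.
    swin = torch.load(ckpt_path, map_location="cpu")["model"]

    # nnFormer's encoder parameters live under the "model_down." prefix
    # (see the missing-key list), so remap Swin's keys into that namespace.
    remapped = {"model_down." + k: v for k, v in swin.items()}

    # Copy only tensors whose remapped name AND shape both match; the 2D conv
    # patch embedding, the decoder's kv attention, and the window-size-dependent
    # relative-position tables will be skipped and keep their random init.
    own = model.state_dict()
    loadable = {k: v for k, v in remapped.items()
                if k in own and v.shape == own[k].shape}
    model.load_state_dict(loadable, strict=False)
    print(f"copied {len(loadable)} of {len(own)} tensors from the Swin checkpoint")
```

This only transfers the tensors that happen to align by name and shape; it is not necessarily how the released nnFormer checkpoints such as gelunorm_former_skip_global_shift.model were produced.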

CHANGHAI-AILab commented 2 years ago

> Since the default embedding size is 192, I used the checkpoint file of the large Swin model, swin_large_patch4_window7_224_22k.pth, to initialize nnFormer's weights. [...] Could you provide the pre-trained model file gelunorm_former_skip_global_shift.model?

Did you get any solution for the pre-trained model?

OCEANOUXIN commented 8 months ago

> Since the default embedding size is 192, I used the checkpoint file of the large Swin model, swin_large_patch4_window7_224_22k.pth, to initialize nnFormer's weights. [...] Could you provide the pre-trained model file gelunorm_former_skip_global_shift.model?

Hi, did you get the pre-trained weights gelunorm_former_skip_global_shift.model?