MzeroMiko / VMamba

VMamba: Visual State Space Models. Code is based on Mamba.
MIT License

Demo Script for Image classification #46

Open sivaji123256 opened 8 months ago

sivaji123256 commented 8 months ago

Thanks for the great work. I am looking into how to interpret the image classification results. First of all, how can I create a simple demo script that loads a single image (or a folder of images) and predicts the classification results? For interpretation, can we use GradCAM the way it is used for ViT, and for the SSM, is it a good idea to look into the latent space? Any suggestions in this direction would be highly helpful for exploring the interpretability and explainability of the VMamba model. Thanks in advance.
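(For reference, a minimal single-image inference sketch might look like the following. This is not an official demo: it assumes a VSSM instance already built with the same configuration as the checkpoint, ImageNet-1k classes, and the standard 224x224 ImageNet preprocessing.)

import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet eval preprocessing (an assumption; the repo's eval transform may differ slightly).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def classify_image(model, image_path, device="cuda"):
    # Run one image through the model and return the top-5 class indices and probabilities.
    model = model.to(device).eval()
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    probs = torch.softmax(model(x), dim=-1)
    top_probs, top_idx = probs.topk(5, dim=-1)
    return top_idx[0].tolist(), top_probs[0].tolist()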

sivaji123256 commented 8 months ago

Hi @MzeroMiko, do you have any updates on this query? Thanks again.

MzeroMiko commented 8 months ago

Thank you for your attention. I am sorry, but the model is still undergoing major changes, and we have no plan to release a demo right now. Once the model structure and parameters are fixed, we will take this into account again. We tried GradCAM before writing the arXiv version of the paper, but it showed no valuable information; that may be due to some unknown mistake in how I applied GradCAM, so we will try it again in the future. By the way, the feature maps coming out of selective_scan are quite interesting: you can see the different activations produced by the four-way scan mechanism. We may dig deeper once the model structure is fixed.
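(One way to look at those activations without modifying the model is a plain PyTorch forward hook. The sketch below is only an illustration under assumptions: it presumes the scan operator is exposed as a named submodule of each block, here matched by the ".op" suffix that appears in the checkpoint keys later in this thread.)

import torch

def collect_op_outputs(model):
    # Register forward hooks on every submodule whose name ends with ".op"
    # and record its output so it can be reshaped and plotted afterwards.
    feature_maps, handles = {}, []
    for name, module in model.named_modules():
        if name.endswith(".op"):
            def hook(mod, inputs, output, name=name):
                feature_maps[name] = output.detach().cpu()
            handles.append(module.register_forward_hook(hook))
    return feature_maps, handles

# Usage sketch:
# feats, handles = collect_op_outputs(model)
# _ = model(dummy_input)          # dummy_input: a preprocessed image batch
# for h in handles: h.remove()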

sivaji123256 commented 7 months ago

Hi @MzeroMiko, I am looking to visualize the feature maps at the outputs of selective_scan, as well as across the layers of the model, using the pretrained ImageNet weights. What are the simplest modifications to the model script or utils that I could try? Thanks in advance.

MzeroMiko commented 7 months ago

TS-CAM or GradCAM can be used to visualize the output of the selective scan function. These are plug-in methods that require no modification to the model.

Another approach is to visualize what happens inside the selective scan, and we are already working on that. We may show the details in the next version of the VMamba arXiv paper, so keep watching.
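(As an illustration of the plug-in route, and not an official recipe, the third-party pytorch-grad-cam package can be pointed at a late block of the model. The target layer choice and the reshape_transform below are assumptions that depend on the actual VSSM layout and on the block outputs being channel-last.)

from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Hypothetical target: the last block of the third stage; any late-stage block could be tried.
target_layers = [model.layers[2].blocks[-1]]

def reshape_transform(tensor):
    # Assumes the block output is channel-last (B, H, W, C); convert to (B, C, H, W) for CAM.
    return tensor.permute(0, 3, 1, 2)

cam = GradCAM(model=model, target_layers=target_layers, reshape_transform=reshape_transform)
# x: a preprocessed input batch; predicted_class: the class index to explain.
grayscale_cam = cam(input_tensor=x, targets=[ClassifierOutputTarget(predicted_class)])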

sivaji123256 commented 7 months ago

Hi @MzeroMiko, I was trying to load the pretrained weights and I am facing the following issue, i.e. mismatched keys. Can you please suggest how I can fix this?

import torch
from vmamba import VSSM  # VSSM is the model class defined in vmamba.py; adjust the import path as needed

def load_custom_model(model_path):
    # Instantiate the model class defined in vmamba.py
    model = VSSM(forward_type="v2")

    # Load the weights from the checkpoint file
    checkpoint = torch.load(model_path, map_location='cpu')
    print("Keys in the checkpoint dictionary:", checkpoint.keys())
    model.load_state_dict(checkpoint['model'])  # Adjust this based on your checkpoint structure

    return model

# Example usage:
model_path = '/home/ubuntu/VMamba/classification/pretrained/vssm_base_0229_ckpt_epoch_237.pth'
model = load_custom_model(model_path)

Error:

RuntimeError                              Traceback (most recent call last)
Cell In[24], line 14
     12 # Example usage:
     13 model_path = '/home/ubuntu/VMamba/classification/pretrained/vssm_base_0229_ckpt_epoch_237.pth'
---> 14 model = load_custom_model(model_path)

Cell In[24], line 9, in load_custom_model(model_path)
      7 checkpoint = torch.load(model_path, map_location='cpu')
      8 print("Keys in the checkpoint dictionary:", checkpoint.keys())
----> 9 model.load_state_dict(checkpoint['model'])  # Adjust this based on your checkpoint structure
     11 return model

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py:2152, in Module.load_state_dict(self, state_dict, strict, assign)
   2147 error_msgs.insert(
   2148     0, 'Missing key(s) in state_dict: {}. '.format(
   2149         ', '.join(f'"{k}"' for k in missing_keys)))
   2151 if len(error_msgs) > 0:
-> 2152     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2153         self.__class__.__name__, "\n\t".join(error_msgs)))
   2154 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for VSSM: Missing key(s) in state_dict: "layers.0.blocks.0.op.conv2d.bias", "layers.0.blocks.1.op.conv2d.bias", "layers.1.blocks.0.op.conv2d.bias", "layers.1.blocks.1.op.conv2d.bias", "layers.2.blocks.0.op.conv2d.bias", "layers.2.blocks.1.op.conv2d.bias", "layers.2.blocks.2.op.conv2d.bias", "layers.2.blocks.3.op.conv2d.bias", "layers.2.blocks.4.op.conv2d.bias", "layers.2.blocks.5.op.conv2d.bias", "layers.2.blocks.6.op.conv2d.bias", "layers.2.blocks.7.op.conv2d.bias", "layers.2.blocks.8.op.conv2d.bias", "layers.3.blocks.0.op.conv2d.bias", "layers.3.blocks.1.op.conv2d.bias". Unexpected key(s) in state_dict: "patch_embed.5.weight", "patch_embed.5.bias", "patch_embed.7.weight", "patch_embed.7.bias", "layers.2.blocks.9.norm.weight", "layers.2.blocks.9.norm.bias", "layers.2.blocks.9.op.x_proj_weight", "layers.2.blocks.9.op.dt_projs_weight", "layers.2.blocks.9.op.dt_projs_bias", "layers.2.blocks.9.op.A_logs", "layers.2.blocks.9.op.Ds", "layers.2.blocks.9.op.out_norm.weight", "layers.2.blocks.9.op.out_norm.bias", "layers.2.blocks.9.op.in_proj.weight", "layers.2.blocks.9.op.conv2d.weight", "layers.2.blocks.9.op.out_proj.weight", "layers.2.blocks.9.norm2.weight", "layers.2.blocks.9.norm2.bias", "layers.2.blocks.9.mlp.fc1.weight", "layers.2.blocks.9.mlp.fc1.bias", "layers.2.blocks.9.mlp.fc2.weight", "layers.2.blocks.9.mlp.fc2.bias", "layers.2.blocks.10.norm.weight", "layers.2.blocks.10.norm.bias", "layers.2.blocks.10.op.x_proj_weight", "layers.2.blocks.10.op.dt_projs_weight", "layers.2.blocks.10.op.dt_projs_bias", "layers.2.blocks.10.op.A_logs", "layers.2.blocks.10.op.Ds", "layers.2.blocks.10.op.out_norm.weight", "layers.2.blocks.10.op.out_norm.bias", "layers.2.blocks.10.op.in_proj.weight", "layers.2.blocks.10.op.conv2d.weight", "layers.2.blocks.10.op.out_proj.weight", "layers.2.blocks.10.norm2.weight", "layers.2.blocks.10.norm2.bias", "layers.2.blocks.10.mlp.fc1.weight", "layers.2.blocks.10.mlp.fc1.bias", "layers.2.blocks.10.mlp.fc2.weight", "layers.2.blocks.10.mlp.fc2.bias", "layers.2.blocks.11.norm.weight", "layers.2.blocks.11.norm.bias", "layers.2.blocks.11.op.x_proj_weight", "layers.2.blocks.11.op.dt_projs_weight", "layers.2.blocks.11.op.dt_projs_bias", "layers.2.blocks.11.op.A_logs", "layers.2.blocks.11.op.Ds", "layers.2.blocks.11.op.out_norm.weight", "layers.2.blocks.11.op.out_norm.bias", "layers.2.blocks.11.op.in_proj.weight", "layers.2.blocks.11.op.conv2d.weight", "layers.2.blocks.11.op.out_proj.weight", "layers.2.blocks.11.norm2.weight", "layers.2.blocks.11.norm2.bias", "layers.2.blocks.11.mlp.fc1.weight", "layers.2.blocks.11.mlp.fc1.bias", "layers.2.blocks.11.mlp.fc2.weight", "layers.2.blocks.11.mlp.fc2.bias", "layers.2.blocks.12.norm.weight", "layers.2.blocks.12.norm.bias", "layers.2.blocks.12.op.x_proj_weight", "layers.2.blocks.12.op.dt_projs_weight", "layers.2.blocks.12.op.dt_projs_bias", "layers.2.blocks.12.op.A_logs", "layers.2.blocks.12.op.Ds", "layers.2.blocks.12.op.out_norm.weight", "layers.2.blocks.12.op.out_norm.bias", "layers.2.blocks.12.op.in_proj.weight", "layers.2.blocks.12.op.conv2d.weight", "layers.2.blocks.12.op.out_proj.weight", "layers.2.blocks.12.norm2.weight", "layers.2.blocks.12.norm2.bias", "layers.2.blocks.12.mlp.fc1.weight", "layers.2.blocks.12.mlp.fc1.bias", "layers.2.blocks.12.mlp.fc2.weight", "layers.2.blocks.12.mlp.fc2.bias", "layers.2.blocks.13.norm.weight", "layers.2.blocks.13.norm.bias", "layers.2.blocks.13.op.x_proj_weight", "layers.2.blocks.13.op.dt_projs_weight", "layers.2.blocks.13.op.dt_projs_bias", 
"layers.2.blocks.13.op.A_logs", "layers.2.blocks.13.op.Ds", "layers.2.blocks.13.op.out_norm.weight", "layers.2.blocks.13.op.out_norm.bias", "layers.2.blocks.13.op.in_proj.weight", "layers.2.blocks.13.op.conv2d.weight", "layers.2.blocks.13.op.out_proj.weight", "layers.2.blocks.13.norm2.weight", "layers.2.blocks.13.norm2.bias", "layers.2.blocks.13.mlp.fc1.weight", "layers.2.blocks.13.mlp.fc1.bias", "layers.2.blocks.13.mlp.fc2.weight", "layers.2.blocks.13.mlp.fc2.bias", "layers.2.blocks.14.norm.weight", "layers.2.blocks.14.norm.bias", "layers.2.blocks.14.op.x_proj_weight", "layers.2.blocks.14.op.dt_projs_weight", "layers.2.blocks.14.op.dt_projs_bias", "layers.2.blocks.14.op.A_logs", "layers.2.blocks.14.op.Ds", "layers.2.blocks.14.op.out_norm.weight", "layers.2.blocks.14.op.out_norm.bias", "layers.2.blocks.14.op.in_proj.weight", "layers.2.blocks.14.op.conv2d.weight", "layers.2.blocks.14.op.out_proj.weight", "layers.2.blocks.14.norm2.weight", "layers.2.blocks.14.norm2.bias", "layers.2.blocks.14.mlp.fc1.weight", "layers.2.blocks.14.mlp.fc1.bias", "layers.2.blocks.14.mlp.fc2.weight", "layers.2.blocks.14.mlp.fc2.bias". size mismatch for patch_embed.0.weight: copying a param with shape torch.Size([64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 3, 4, 4]). size mismatch for patch_embed.0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for patch_embed.2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for patch_embed.2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.0.norm.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.0.norm.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.0.op.x_proj_weight: copying a param with shape torch.Size([4, 10, 256]) from checkpoint, the shape in current model is torch.Size([4, 38, 192]). size mismatch for layers.0.blocks.0.op.dt_projs_weight: copying a param with shape torch.Size([4, 256, 8]) from checkpoint, the shape in current model is torch.Size([4, 192, 6]). size mismatch for layers.0.blocks.0.op.dt_projs_bias: copying a param with shape torch.Size([4, 256]) from checkpoint, the shape in current model is torch.Size([4, 192]). size mismatch for layers.0.blocks.0.op.A_logs: copying a param with shape torch.Size([1024, 1]) from checkpoint, the shape in current model is torch.Size([768, 16]). size mismatch for layers.0.blocks.0.op.Ds: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.0.blocks.0.op.out_norm.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.blocks.0.op.out_norm.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.blocks.0.op.in_proj.weight: copying a param with shape torch.Size([256, 128]) from checkpoint, the shape in current model is torch.Size([384, 96]). 
size mismatch for layers.0.blocks.0.op.conv2d.weight: copying a param with shape torch.Size([256, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([192, 1, 3, 3]). size mismatch for layers.0.blocks.0.op.out_proj.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([96, 192]). size mismatch for layers.0.blocks.0.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.0.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([384, 96]). size mismatch for layers.0.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.0.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([96, 384]). size mismatch for layers.0.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.1.norm.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.1.norm.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.1.op.x_proj_weight: copying a param with shape torch.Size([4, 10, 256]) from checkpoint, the shape in current model is torch.Size([4, 38, 192]). size mismatch for layers.0.blocks.1.op.dt_projs_weight: copying a param with shape torch.Size([4, 256, 8]) from checkpoint, the shape in current model is torch.Size([4, 192, 6]). size mismatch for layers.0.blocks.1.op.dt_projs_bias: copying a param with shape torch.Size([4, 256]) from checkpoint, the shape in current model is torch.Size([4, 192]). size mismatch for layers.0.blocks.1.op.A_logs: copying a param with shape torch.Size([1024, 1]) from checkpoint, the shape in current model is torch.Size([768, 16]). size mismatch for layers.0.blocks.1.op.Ds: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.0.blocks.1.op.out_norm.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.blocks.1.op.out_norm.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.blocks.1.op.in_proj.weight: copying a param with shape torch.Size([256, 128]) from checkpoint, the shape in current model is torch.Size([384, 96]). size mismatch for layers.0.blocks.1.op.conv2d.weight: copying a param with shape torch.Size([256, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([192, 1, 3, 3]). size mismatch for layers.0.blocks.1.op.out_proj.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([96, 192]). size mismatch for layers.0.blocks.1.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). 
size mismatch for layers.0.blocks.1.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([384, 96]). size mismatch for layers.0.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.0.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([96, 384]). size mismatch for layers.0.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.downsample.1.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([192, 96, 2, 2]). size mismatch for layers.0.downsample.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.downsample.3.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.downsample.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.0.norm.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.0.norm.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.0.op.x_proj_weight: copying a param with shape torch.Size([4, 18, 512]) from checkpoint, the shape in current model is torch.Size([4, 44, 384]). size mismatch for layers.1.blocks.0.op.dt_projs_weight: copying a param with shape torch.Size([4, 512, 16]) from checkpoint, the shape in current model is torch.Size([4, 384, 12]). size mismatch for layers.1.blocks.0.op.dt_projs_bias: copying a param with shape torch.Size([4, 512]) from checkpoint, the shape in current model is torch.Size([4, 384]). size mismatch for layers.1.blocks.0.op.A_logs: copying a param with shape torch.Size([2048, 1]) from checkpoint, the shape in current model is torch.Size([1536, 16]). size mismatch for layers.1.blocks.0.op.Ds: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.1.blocks.0.op.out_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.blocks.0.op.out_norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.blocks.0.op.in_proj.weight: copying a param with shape torch.Size([512, 256]) from checkpoint, the shape in current model is torch.Size([768, 192]). size mismatch for layers.1.blocks.0.op.conv2d.weight: copying a param with shape torch.Size([512, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 1, 3, 3]). size mismatch for layers.1.blocks.0.op.out_proj.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([192, 384]). 
size mismatch for layers.1.blocks.0.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.0.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([768, 192]). size mismatch for layers.1.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.1.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([192, 768]). size mismatch for layers.1.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.1.norm.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.1.norm.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.1.op.x_proj_weight: copying a param with shape torch.Size([4, 18, 512]) from checkpoint, the shape in current model is torch.Size([4, 44, 384]). size mismatch for layers.1.blocks.1.op.dt_projs_weight: copying a param with shape torch.Size([4, 512, 16]) from checkpoint, the shape in current model is torch.Size([4, 384, 12]). size mismatch for layers.1.blocks.1.op.dt_projs_bias: copying a param with shape torch.Size([4, 512]) from checkpoint, the shape in current model is torch.Size([4, 384]). size mismatch for layers.1.blocks.1.op.A_logs: copying a param with shape torch.Size([2048, 1]) from checkpoint, the shape in current model is torch.Size([1536, 16]). size mismatch for layers.1.blocks.1.op.Ds: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.1.blocks.1.op.out_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.blocks.1.op.out_norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.blocks.1.op.in_proj.weight: copying a param with shape torch.Size([512, 256]) from checkpoint, the shape in current model is torch.Size([768, 192]). size mismatch for layers.1.blocks.1.op.conv2d.weight: copying a param with shape torch.Size([512, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 1, 3, 3]). size mismatch for layers.1.blocks.1.op.out_proj.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([192, 384]). size mismatch for layers.1.blocks.1.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.1.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([768, 192]). 
size mismatch for layers.1.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.1.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([192, 768]). size mismatch for layers.1.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.downsample.1.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 192, 2, 2]). size mismatch for layers.1.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.downsample.3.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.downsample.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.0.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.0.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.0.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.0.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.0.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.0.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.0.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.0.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.0.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.0.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.0.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.0.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.0.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.0.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). 
size mismatch for layers.2.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.1.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.1.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.1.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.1.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.1.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.1.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.1.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.1.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.1.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.1.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.1.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.1.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). 
size mismatch for layers.2.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.2.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.2.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.2.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.2.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.2.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.2.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.2.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.2.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.2.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.2.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.2.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.2.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.2.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.2.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.2.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.2.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.2.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.2.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.3.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). 
size mismatch for layers.2.blocks.3.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.3.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.3.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.3.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.3.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.3.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.3.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.3.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.3.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.3.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.3.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.3.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.3.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.3.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.3.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.3.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.3.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.4.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.4.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.4.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). 
size mismatch for layers.2.blocks.4.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.4.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.4.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.4.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.4.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.4.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.4.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.4.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.4.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.4.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.4.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.4.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.4.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.4.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.4.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.5.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.5.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.5.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.5.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.5.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). 
size mismatch for layers.2.blocks.5.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.5.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.5.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.5.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.5.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.5.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.5.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.5.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.5.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.5.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.5.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.5.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.5.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.6.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.6.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.6.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.6.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.6.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.6.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.6.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.6.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). 
size mismatch for layers.2.blocks.6.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.6.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.6.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.6.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.6.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.6.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.6.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.6.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.6.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.6.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.7.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.7.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.7.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.7.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.7.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.7.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.7.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.7.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.7.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.7.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). 
size mismatch for layers.2.blocks.7.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.7.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.7.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.7.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.7.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.7.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.7.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.7.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.8.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.8.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.8.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.8.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.8.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.8.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.8.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.8.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.8.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.8.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.8.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.8.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). 
size mismatch for layers.2.blocks.8.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.8.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.8.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.8.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.8.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.8.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.downsample.1.weight: copying a param with shape torch.Size([1024, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 384, 2, 2]). size mismatch for layers.2.downsample.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.downsample.3.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.downsample.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.0.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.0.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.0.op.x_proj_weight: copying a param with shape torch.Size([4, 66, 2048]) from checkpoint, the shape in current model is torch.Size([4, 80, 1536]). size mismatch for layers.3.blocks.0.op.dt_projs_weight: copying a param with shape torch.Size([4, 2048, 64]) from checkpoint, the shape in current model is torch.Size([4, 1536, 48]). size mismatch for layers.3.blocks.0.op.dt_projs_bias: copying a param with shape torch.Size([4, 2048]) from checkpoint, the shape in current model is torch.Size([4, 1536]). size mismatch for layers.3.blocks.0.op.A_logs: copying a param with shape torch.Size([8192, 1]) from checkpoint, the shape in current model is torch.Size([6144, 16]). size mismatch for layers.3.blocks.0.op.Ds: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([6144]). size mismatch for layers.3.blocks.0.op.out_norm.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.3.blocks.0.op.out_norm.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.3.blocks.0.op.in_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for layers.3.blocks.0.op.conv2d.weight: copying a param with shape torch.Size([2048, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([1536, 1, 3, 3]). 
size mismatch for layers.3.blocks.0.op.out_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([768, 1536]). size mismatch for layers.3.blocks.0.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.0.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for layers.3.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.3.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for layers.3.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.1.op.x_proj_weight: copying a param with shape torch.Size([4, 66, 2048]) from checkpoint, the shape in current model is torch.Size([4, 80, 1536]). size mismatch for layers.3.blocks.1.op.dt_projs_weight: copying a param with shape torch.Size([4, 2048, 64]) from checkpoint, the shape in current model is torch.Size([4, 1536, 48]). size mismatch for layers.3.blocks.1.op.dt_projs_bias: copying a param with shape torch.Size([4, 2048]) from checkpoint, the shape in current model is torch.Size([4, 1536]). size mismatch for layers.3.blocks.1.op.A_logs: copying a param with shape torch.Size([8192, 1]) from checkpoint, the shape in current model is torch.Size([6144, 16]). size mismatch for layers.3.blocks.1.op.Ds: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([6144]). size mismatch for layers.3.blocks.1.op.out_norm.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.3.blocks.1.op.out_norm.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.3.blocks.1.op.in_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for layers.3.blocks.1.op.conv2d.weight: copying a param with shape torch.Size([2048, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([1536, 1, 3, 3]). size mismatch for layers.3.blocks.1.op.out_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([768, 1536]). size mismatch for layers.3.blocks.1.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). 
size mismatch for layers.3.blocks.1.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for layers.3.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.3.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for layers.3.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for classifier.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for classifier.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for classifier.head.weight: copying a param with shape torch.Size([1000, 1024]) from checkpoint, the shape in current model is torch.Size([1000, 768]).

MzeroMiko commented 7 months ago

Every checkpoint has its own corresponding configuration. Try building the model with the config that corresponds to the checkpoint and then try again.
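(A quick, repo-agnostic way to check whether an instantiated model matches a checkpoint is to diff the two state dicts before calling load_state_dict; any non-empty list below means the constructor arguments do not reproduce the checkpoint's configuration.)

import torch

ckpt = torch.load(model_path, map_location="cpu")["model"]
model_sd = model.state_dict()

missing    = [k for k in model_sd if k not in ckpt]
unexpected = [k for k in ckpt if k not in model_sd]
mismatched = [k for k in model_sd if k in ckpt and model_sd[k].shape != ckpt[k].shape]

print(f"missing: {len(missing)}, unexpected: {len(unexpected)}, shape mismatches: {len(mismatched)}")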

sivaji123256 commented 7 months ago

Hi @MzeroMiko ,

import torch
from vmamba import VSSM  # VSSM is the model class defined in vmamba.py; adjust the import path as needed

def load_custom_model(model_path):
    model = VSSM()
    checkpoint = torch.load(model_path, map_location='cpu')
    print("Keys in the checkpoint dictionary:", checkpoint.keys())
    model.load_state_dict(checkpoint['model'], strict=False)  # Adjust this based on your checkpoint structure
    return model

model_path = '/home/ubuntu/VMamba/classification/pretrained/vssm_base_0229_ckpt_epoch_237.pth'
model = load_custom_model(model_path)

This is the code I was using to load the weights from model_path, together with the following config, and I was still facing the same issue.

MODEL:
  TYPE: vssm
  NAME: vssm1_base_0229
  DROP_PATH_RATE: 0.6
  VSSM:
    EMBED_DIM: 128
    DEPTHS: [ 2, 2, 15, 2 ]
    SSM_D_STATE: 1
    SSM_DT_RANK: "auto"
    SSM_RATIO: 2.0
    SSM_CONV: 3
    SSM_CONV_BIAS: false
    SSM_FORWARDTYPE: "v3noz"
    MLP_RATIO: 4.0
    DOWNSAMPLE: "v3"
    PATCHEMBED: "v2"

89.0 + 15.2 + 118min/e + 48G
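(Note that the YAML config only takes effect if its values are actually passed to the model constructor; VSSM() with no arguments builds the default variant, which is likely why the shapes still disagree. A hedged sketch of what the base configuration might look like as constructor arguments follows; the keyword names mirror the config keys and are assumptions about the signature in vmamba.py, so they should be checked against the code.)

# Hypothetical mapping from the vssm1_base_0229 config above to VSSM constructor arguments.
model = VSSM(
    depths=[2, 2, 15, 2],       # DEPTHS
    dims=128,                   # EMBED_DIM
    ssm_d_state=1,              # SSM_D_STATE
    ssm_dt_rank="auto",         # SSM_DT_RANK
    ssm_ratio=2.0,              # SSM_RATIO
    ssm_conv=3,                 # SSM_CONV
    ssm_conv_bias=False,        # SSM_CONV_BIAS
    forward_type="v3noz",       # SSM_FORWARDTYPE
    mlp_ratio=4.0,              # MLP_RATIO
    downsample_version="v3",    # DOWNSAMPLE
    patchembed_version="v2",    # PATCHEMBED
)
checkpoint = torch.load(model_path, map_location="cpu")
model.load_state_dict(checkpoint["model"], strict=True)  # strict=True so any remaining mismatch is reported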