MzeroMiko / VMamba

VMamba: Visual State Space Models. Code is based on Mamba.
MIT License

Demo Script for Image classification #46

Open sivaji123256 opened 8 months ago

sivaji123256 commented 8 months ago

Thanks for the great work. I am looking into how to interpret the image classification results. First of all, how can I create a simple demo script that loads a single image (or a folder of images) and predicts the classification results? For interpretation, can we use GradCAM the way it is used for ViT, and for the SSM, is it a good idea to look into the latent space? Any suggestions in this direction would be highly helpful for exploring the interpretability and explainability of the VMamba model. Thanks in advance.
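(For reference, a minimal single-image inference sketch might look like the following. This is not an official demo: it assumes a VSSM instance already built with the same configuration as the checkpoint, ImageNet-1k classes, and the standard 224x224 ImageNet preprocessing.)

import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet eval preprocessing (an assumption; the repo's eval transform may differ slightly).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def classify_image(model, image_path, device="cuda"):
    # Run one image through the model and return the top-5 class indices and probabilities.
    model = model.to(device).eval()
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    probs = torch.softmax(model(x), dim=-1)
    top_probs, top_idx = probs.topk(5, dim=-1)
    return top_idx[0].tolist(), top_probs[0].tolist()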

sivaji123256 commented 8 months ago

Hi @MzeroMiko, do you have any updates on this query? Thanks again.

MzeroMiko commented 8 months ago

Thank you for your attention. I am sorry, but the model is still undergoing major changes, and we have no plan to release a demo right now. Once the model structure and parameters are fixed, we will take this into account again. We tried GradCAM before writing the arXiv version of the paper, but it showed no valuable information; that may be due to some unknown mistake in how I applied GradCAM, so we will try it again in the future. By the way, the feature maps coming out of selective_scan are quite interesting: you can see the different activations produced by the four-way scan mechanism. We may dig deeper once the model structure is fixed.
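(One way to look at those activations without modifying the model is a plain PyTorch forward hook. The sketch below is only an illustration under assumptions: it presumes the scan operator is exposed as a named submodule of each block, here matched by the ".op" suffix that appears in the checkpoint keys later in this thread.)

import torch

def collect_op_outputs(model):
    # Register forward hooks on every submodule whose name ends with ".op"
    # and record its output so it can be reshaped and plotted afterwards.
    feature_maps, handles = {}, []
    for name, module in model.named_modules():
        if name.endswith(".op"):
            def hook(mod, inputs, output, name=name):
                feature_maps[name] = output.detach().cpu()
            handles.append(module.register_forward_hook(hook))
    return feature_maps, handles

# Usage sketch:
# feats, handles = collect_op_outputs(model)
# _ = model(dummy_input)          # dummy_input: a preprocessed image batch
# for h in handles: h.remove()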

sivaji123256 commented 7 months ago

Hi @MzeroMiko, I am looking to visualize the feature maps at the outputs of selective_scan, as well as across the layers of the model, using the pretrained ImageNet weights. What are the simplest modifications to the model script or utils that I could try? Thanks in advance.

MzeroMiko commented 7 months ago

TS-CAM or GradCAM can be used to visualize the output of the selective scan function. These are plug-in methods that require no modification to the model.

Another approach is to visualize what happens inside the selective scan, and we are already working on that. We may show the details in the next version of the VMamba arXiv paper, so keep watching.
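(As an illustration of the plug-in route, and not an official recipe, the third-party pytorch-grad-cam package can be pointed at a late block of the model. The target layer choice and the reshape_transform below are assumptions that depend on the actual VSSM layout and on the block outputs being channel-last.)

from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Hypothetical target: the last block of the third stage; any late-stage block could be tried.
target_layers = [model.layers[2].blocks[-1]]

def reshape_transform(tensor):
    # Assumes the block output is channel-last (B, H, W, C); convert to (B, C, H, W) for CAM.
    return tensor.permute(0, 3, 1, 2)

cam = GradCAM(model=model, target_layers=target_layers, reshape_transform=reshape_transform)
# x: a preprocessed input batch; predicted_class: the class index to explain.
grayscale_cam = cam(input_tensor=x, targets=[ClassifierOutputTarget(predicted_class)])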

sivaji123256 commented 7 months ago

Hi @MzeroMiko, I was trying to load the pretrained weights and I am facing the following issue, i.e. mismatched keys. Can you please suggest how I can fix this?

import torch
from vmamba import VSSM  # VSSM is the model class defined in vmamba.py; adjust the import path as needed

def load_custom_model(model_path):
    # Instantiate the model class defined in vmamba.py
    model = VSSM(forward_type="v2")

    # Load the weights from the checkpoint file
    checkpoint = torch.load(model_path, map_location='cpu')
    print("Keys in the checkpoint dictionary:", checkpoint.keys())
    model.load_state_dict(checkpoint['model'])  # Adjust this based on your checkpoint structure

    return model

# Example usage:
model_path = '/home/ubuntu/VMamba/classification/pretrained/vssm_base_0229_ckpt_epoch_237.pth'
model = load_custom_model(model_path)

Error:

RuntimeError                              Traceback (most recent call last)
Cell In[24], line 14
     12 # Example usage:
     13 model_path = '/home/ubuntu/VMamba/classification/pretrained/vssm_base_0229_ckpt_epoch_237.pth'
---> 14 model = load_custom_model(model_path)

Cell In[24], line 9, in load_custom_model(model_path)
      7 checkpoint = torch.load(model_path, map_location='cpu')
      8 print("Keys in the checkpoint dictionary:", checkpoint.keys())
----> 9 model.load_state_dict(checkpoint['model'])  # Adjust this based on your checkpoint structure
     11 return model

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py:2152, in Module.load_state_dict(self, state_dict, strict, assign)
   2147 error_msgs.insert(
   2148     0, 'Missing key(s) in state_dict: {}. '.format(
   2149         ', '.join(f'"{k}"' for k in missing_keys)))
   2151 if len(error_msgs) > 0:
-> 2152     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2153         self.__class__.__name__, "\n\t".join(error_msgs)))
   2154 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for VSSM: Missing key(s) in state_dict: "layers.0.blocks.0.op.conv2d.bias", "layers.0.blocks.1.op.conv2d.bias", "layers.1.blocks.0.op.conv2d.bias", "layers.1.blocks.1.op.conv2d.bias", "layers.2.blocks.0.op.conv2d.bias", "layers.2.blocks.1.op.conv2d.bias", "layers.2.blocks.2.op.conv2d.bias", "layers.2.blocks.3.op.conv2d.bias", "layers.2.blocks.4.op.conv2d.bias", "layers.2.blocks.5.op.conv2d.bias", "layers.2.blocks.6.op.conv2d.bias", "layers.2.blocks.7.op.conv2d.bias", "layers.2.blocks.8.op.conv2d.bias", "layers.3.blocks.0.op.conv2d.bias", "layers.3.blocks.1.op.conv2d.bias". Unexpected key(s) in state_dict: "patch_embed.5.weight", "patch_embed.5.bias", "patch_embed.7.weight", "patch_embed.7.bias", "layers.2.blocks.9.norm.weight", "layers.2.blocks.9.norm.bias", "layers.2.blocks.9.op.x_proj_weight", "layers.2.blocks.9.op.dt_projs_weight", "layers.2.blocks.9.op.dt_projs_bias", "layers.2.blocks.9.op.A_logs", "layers.2.blocks.9.op.Ds", "layers.2.blocks.9.op.out_norm.weight", "layers.2.blocks.9.op.out_norm.bias", "layers.2.blocks.9.op.in_proj.weight", "layers.2.blocks.9.op.conv2d.weight", "layers.2.blocks.9.op.out_proj.weight", "layers.2.blocks.9.norm2.weight", "layers.2.blocks.9.norm2.bias", "layers.2.blocks.9.mlp.fc1.weight", "layers.2.blocks.9.mlp.fc1.bias", "layers.2.blocks.9.mlp.fc2.weight", "layers.2.blocks.9.mlp.fc2.bias", "layers.2.blocks.10.norm.weight", "layers.2.blocks.10.norm.bias", "layers.2.blocks.10.op.x_proj_weight", "layers.2.blocks.10.op.dt_projs_weight", "layers.2.blocks.10.op.dt_projs_bias", "layers.2.blocks.10.op.A_logs", "layers.2.blocks.10.op.Ds", "layers.2.blocks.10.op.out_norm.weight", "layers.2.blocks.10.op.out_norm.bias", "layers.2.blocks.10.op.in_proj.weight", "layers.2.blocks.10.op.conv2d.weight", "layers.2.blocks.10.op.out_proj.weight", "layers.2.blocks.10.norm2.weight", "layers.2.blocks.10.norm2.bias", "layers.2.blocks.10.mlp.fc1.weight", "layers.2.blocks.10.mlp.fc1.bias", "layers.2.blocks.10.mlp.fc2.weight", "layers.2.blocks.10.mlp.fc2.bias", "layers.2.blocks.11.norm.weight", "layers.2.blocks.11.norm.bias", "layers.2.blocks.11.op.x_proj_weight", "layers.2.blocks.11.op.dt_projs_weight", "layers.2.blocks.11.op.dt_projs_bias", "layers.2.blocks.11.op.A_logs", "layers.2.blocks.11.op.Ds", "layers.2.blocks.11.op.out_norm.weight", "layers.2.blocks.11.op.out_norm.bias", "layers.2.blocks.11.op.in_proj.weight", "layers.2.blocks.11.op.conv2d.weight", "layers.2.blocks.11.op.out_proj.weight", "layers.2.blocks.11.norm2.weight", "layers.2.blocks.11.norm2.bias", "layers.2.blocks.11.mlp.fc1.weight", "layers.2.blocks.11.mlp.fc1.bias", "layers.2.blocks.11.mlp.fc2.weight", "layers.2.blocks.11.mlp.fc2.bias", "layers.2.blocks.12.norm.weight", "layers.2.blocks.12.norm.bias", "layers.2.blocks.12.op.x_proj_weight", "layers.2.blocks.12.op.dt_projs_weight", "layers.2.blocks.12.op.dt_projs_bias", "layers.2.blocks.12.op.A_logs", "layers.2.blocks.12.op.Ds", "layers.2.blocks.12.op.out_norm.weight", "layers.2.blocks.12.op.out_norm.bias", "layers.2.blocks.12.op.in_proj.weight", "layers.2.blocks.12.op.conv2d.weight", "layers.2.blocks.12.op.out_proj.weight", "layers.2.blocks.12.norm2.weight", "layers.2.blocks.12.norm2.bias", "layers.2.blocks.12.mlp.fc1.weight", "layers.2.blocks.12.mlp.fc1.bias", "layers.2.blocks.12.mlp.fc2.weight", "layers.2.blocks.12.mlp.fc2.bias", "layers.2.blocks.13.norm.weight", "layers.2.blocks.13.norm.bias", "layers.2.blocks.13.op.x_proj_weight", "layers.2.blocks.13.op.dt_projs_weight", "layers.2.blocks.13.op.dt_projs_bias", 
"layers.2.blocks.13.op.A_logs", "layers.2.blocks.13.op.Ds", "layers.2.blocks.13.op.out_norm.weight", "layers.2.blocks.13.op.out_norm.bias", "layers.2.blocks.13.op.in_proj.weight", "layers.2.blocks.13.op.conv2d.weight", "layers.2.blocks.13.op.out_proj.weight", "layers.2.blocks.13.norm2.weight", "layers.2.blocks.13.norm2.bias", "layers.2.blocks.13.mlp.fc1.weight", "layers.2.blocks.13.mlp.fc1.bias", "layers.2.blocks.13.mlp.fc2.weight", "layers.2.blocks.13.mlp.fc2.bias", "layers.2.blocks.14.norm.weight", "layers.2.blocks.14.norm.bias", "layers.2.blocks.14.op.x_proj_weight", "layers.2.blocks.14.op.dt_projs_weight", "layers.2.blocks.14.op.dt_projs_bias", "layers.2.blocks.14.op.A_logs", "layers.2.blocks.14.op.Ds", "layers.2.blocks.14.op.out_norm.weight", "layers.2.blocks.14.op.out_norm.bias", "layers.2.blocks.14.op.in_proj.weight", "layers.2.blocks.14.op.conv2d.weight", "layers.2.blocks.14.op.out_proj.weight", "layers.2.blocks.14.norm2.weight", "layers.2.blocks.14.norm2.bias", "layers.2.blocks.14.mlp.fc1.weight", "layers.2.blocks.14.mlp.fc1.bias", "layers.2.blocks.14.mlp.fc2.weight", "layers.2.blocks.14.mlp.fc2.bias". size mismatch for patch_embed.0.weight: copying a param with shape torch.Size([64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([96, 3, 4, 4]). size mismatch for patch_embed.0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for patch_embed.2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for patch_embed.2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.0.norm.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.0.norm.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.0.op.x_proj_weight: copying a param with shape torch.Size([4, 10, 256]) from checkpoint, the shape in current model is torch.Size([4, 38, 192]). size mismatch for layers.0.blocks.0.op.dt_projs_weight: copying a param with shape torch.Size([4, 256, 8]) from checkpoint, the shape in current model is torch.Size([4, 192, 6]). size mismatch for layers.0.blocks.0.op.dt_projs_bias: copying a param with shape torch.Size([4, 256]) from checkpoint, the shape in current model is torch.Size([4, 192]). size mismatch for layers.0.blocks.0.op.A_logs: copying a param with shape torch.Size([1024, 1]) from checkpoint, the shape in current model is torch.Size([768, 16]). size mismatch for layers.0.blocks.0.op.Ds: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.0.blocks.0.op.out_norm.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.blocks.0.op.out_norm.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.blocks.0.op.in_proj.weight: copying a param with shape torch.Size([256, 128]) from checkpoint, the shape in current model is torch.Size([384, 96]). 
size mismatch for layers.0.blocks.0.op.conv2d.weight: copying a param with shape torch.Size([256, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([192, 1, 3, 3]). size mismatch for layers.0.blocks.0.op.out_proj.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([96, 192]). size mismatch for layers.0.blocks.0.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.0.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([384, 96]). size mismatch for layers.0.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.0.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([96, 384]). size mismatch for layers.0.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.1.norm.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.1.norm.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.1.op.x_proj_weight: copying a param with shape torch.Size([4, 10, 256]) from checkpoint, the shape in current model is torch.Size([4, 38, 192]). size mismatch for layers.0.blocks.1.op.dt_projs_weight: copying a param with shape torch.Size([4, 256, 8]) from checkpoint, the shape in current model is torch.Size([4, 192, 6]). size mismatch for layers.0.blocks.1.op.dt_projs_bias: copying a param with shape torch.Size([4, 256]) from checkpoint, the shape in current model is torch.Size([4, 192]). size mismatch for layers.0.blocks.1.op.A_logs: copying a param with shape torch.Size([1024, 1]) from checkpoint, the shape in current model is torch.Size([768, 16]). size mismatch for layers.0.blocks.1.op.Ds: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.0.blocks.1.op.out_norm.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.blocks.1.op.out_norm.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.blocks.1.op.in_proj.weight: copying a param with shape torch.Size([256, 128]) from checkpoint, the shape in current model is torch.Size([384, 96]). size mismatch for layers.0.blocks.1.op.conv2d.weight: copying a param with shape torch.Size([256, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([192, 1, 3, 3]). size mismatch for layers.0.blocks.1.op.out_proj.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([96, 192]). size mismatch for layers.0.blocks.1.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). 
size mismatch for layers.0.blocks.1.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([384, 96]). size mismatch for layers.0.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.0.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([96, 384]). size mismatch for layers.0.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for layers.0.downsample.1.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([192, 96, 2, 2]). size mismatch for layers.0.downsample.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.downsample.3.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.0.downsample.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.0.norm.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.0.norm.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.0.op.x_proj_weight: copying a param with shape torch.Size([4, 18, 512]) from checkpoint, the shape in current model is torch.Size([4, 44, 384]). size mismatch for layers.1.blocks.0.op.dt_projs_weight: copying a param with shape torch.Size([4, 512, 16]) from checkpoint, the shape in current model is torch.Size([4, 384, 12]). size mismatch for layers.1.blocks.0.op.dt_projs_bias: copying a param with shape torch.Size([4, 512]) from checkpoint, the shape in current model is torch.Size([4, 384]). size mismatch for layers.1.blocks.0.op.A_logs: copying a param with shape torch.Size([2048, 1]) from checkpoint, the shape in current model is torch.Size([1536, 16]). size mismatch for layers.1.blocks.0.op.Ds: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.1.blocks.0.op.out_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.blocks.0.op.out_norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.blocks.0.op.in_proj.weight: copying a param with shape torch.Size([512, 256]) from checkpoint, the shape in current model is torch.Size([768, 192]). size mismatch for layers.1.blocks.0.op.conv2d.weight: copying a param with shape torch.Size([512, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 1, 3, 3]). size mismatch for layers.1.blocks.0.op.out_proj.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([192, 384]). 
size mismatch for layers.1.blocks.0.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.0.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([768, 192]). size mismatch for layers.1.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.1.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([192, 768]). size mismatch for layers.1.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.1.norm.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.1.norm.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.1.op.x_proj_weight: copying a param with shape torch.Size([4, 18, 512]) from checkpoint, the shape in current model is torch.Size([4, 44, 384]). size mismatch for layers.1.blocks.1.op.dt_projs_weight: copying a param with shape torch.Size([4, 512, 16]) from checkpoint, the shape in current model is torch.Size([4, 384, 12]). size mismatch for layers.1.blocks.1.op.dt_projs_bias: copying a param with shape torch.Size([4, 512]) from checkpoint, the shape in current model is torch.Size([4, 384]). size mismatch for layers.1.blocks.1.op.A_logs: copying a param with shape torch.Size([2048, 1]) from checkpoint, the shape in current model is torch.Size([1536, 16]). size mismatch for layers.1.blocks.1.op.Ds: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.1.blocks.1.op.out_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.blocks.1.op.out_norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.blocks.1.op.in_proj.weight: copying a param with shape torch.Size([512, 256]) from checkpoint, the shape in current model is torch.Size([768, 192]). size mismatch for layers.1.blocks.1.op.conv2d.weight: copying a param with shape torch.Size([512, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 1, 3, 3]). size mismatch for layers.1.blocks.1.op.out_proj.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([192, 384]). size mismatch for layers.1.blocks.1.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.1.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([768, 192]). 
size mismatch for layers.1.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.1.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([192, 768]). size mismatch for layers.1.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for layers.1.downsample.1.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 192, 2, 2]). size mismatch for layers.1.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.downsample.3.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.1.downsample.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.0.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.0.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.0.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.0.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.0.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.0.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.0.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.0.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.0.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.0.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.0.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.0.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.0.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.0.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). 
size mismatch for layers.2.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.1.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.1.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.1.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.1.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.1.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.1.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.1.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.1.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.1.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.1.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.1.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.1.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). 
size mismatch for layers.2.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.2.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.2.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.2.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.2.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.2.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.2.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.2.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.2.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.2.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.2.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.2.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.2.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.2.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.2.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.2.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.2.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.2.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.2.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.3.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). 
size mismatch for layers.2.blocks.3.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.3.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.3.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.3.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.3.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.3.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.3.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.3.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.3.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.3.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.3.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.3.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.3.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.3.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.3.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.3.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.3.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.4.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.4.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.4.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). 
size mismatch for layers.2.blocks.4.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.4.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.4.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.4.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.4.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.4.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.4.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.4.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.4.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.4.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.4.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.4.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.4.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.4.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.4.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.5.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.5.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.5.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.5.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.5.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). 
size mismatch for layers.2.blocks.5.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.5.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.5.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.5.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.5.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.5.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.5.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.5.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.5.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.5.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.5.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.5.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.5.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.6.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.6.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.6.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.6.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.6.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.6.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.6.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.6.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). 
size mismatch for layers.2.blocks.6.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.6.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.6.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.6.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.6.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.6.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.6.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.6.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.6.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.6.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.7.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.7.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.7.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.7.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.7.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.7.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.7.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.7.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.7.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.7.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). 
size mismatch for layers.2.blocks.7.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.7.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). size mismatch for layers.2.blocks.7.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.7.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.7.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.7.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.7.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.7.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.8.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.8.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.8.op.x_proj_weight: copying a param with shape torch.Size([4, 34, 1024]) from checkpoint, the shape in current model is torch.Size([4, 56, 768]). size mismatch for layers.2.blocks.8.op.dt_projs_weight: copying a param with shape torch.Size([4, 1024, 32]) from checkpoint, the shape in current model is torch.Size([4, 768, 24]). size mismatch for layers.2.blocks.8.op.dt_projs_bias: copying a param with shape torch.Size([4, 1024]) from checkpoint, the shape in current model is torch.Size([4, 768]). size mismatch for layers.2.blocks.8.op.A_logs: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([3072, 16]). size mismatch for layers.2.blocks.8.op.Ds: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.2.blocks.8.op.out_norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.8.op.out_norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.blocks.8.op.in_proj.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.8.op.conv2d.weight: copying a param with shape torch.Size([1024, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 1, 3, 3]). size mismatch for layers.2.blocks.8.op.out_proj.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]). 
size mismatch for layers.2.blocks.8.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.8.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.blocks.8.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]). size mismatch for layers.2.blocks.8.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.2.blocks.8.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]). size mismatch for layers.2.blocks.8.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for layers.2.downsample.1.weight: copying a param with shape torch.Size([1024, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 384, 2, 2]). size mismatch for layers.2.downsample.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.downsample.3.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.2.downsample.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.0.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.0.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.0.op.x_proj_weight: copying a param with shape torch.Size([4, 66, 2048]) from checkpoint, the shape in current model is torch.Size([4, 80, 1536]). size mismatch for layers.3.blocks.0.op.dt_projs_weight: copying a param with shape torch.Size([4, 2048, 64]) from checkpoint, the shape in current model is torch.Size([4, 1536, 48]). size mismatch for layers.3.blocks.0.op.dt_projs_bias: copying a param with shape torch.Size([4, 2048]) from checkpoint, the shape in current model is torch.Size([4, 1536]). size mismatch for layers.3.blocks.0.op.A_logs: copying a param with shape torch.Size([8192, 1]) from checkpoint, the shape in current model is torch.Size([6144, 16]). size mismatch for layers.3.blocks.0.op.Ds: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([6144]). size mismatch for layers.3.blocks.0.op.out_norm.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.3.blocks.0.op.out_norm.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.3.blocks.0.op.in_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for layers.3.blocks.0.op.conv2d.weight: copying a param with shape torch.Size([2048, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([1536, 1, 3, 3]). 
size mismatch for layers.3.blocks.0.op.out_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([768, 1536]). size mismatch for layers.3.blocks.0.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.0.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for layers.3.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.3.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for layers.3.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.1.op.x_proj_weight: copying a param with shape torch.Size([4, 66, 2048]) from checkpoint, the shape in current model is torch.Size([4, 80, 1536]). size mismatch for layers.3.blocks.1.op.dt_projs_weight: copying a param with shape torch.Size([4, 2048, 64]) from checkpoint, the shape in current model is torch.Size([4, 1536, 48]). size mismatch for layers.3.blocks.1.op.dt_projs_bias: copying a param with shape torch.Size([4, 2048]) from checkpoint, the shape in current model is torch.Size([4, 1536]). size mismatch for layers.3.blocks.1.op.A_logs: copying a param with shape torch.Size([8192, 1]) from checkpoint, the shape in current model is torch.Size([6144, 16]). size mismatch for layers.3.blocks.1.op.Ds: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([6144]). size mismatch for layers.3.blocks.1.op.out_norm.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.3.blocks.1.op.out_norm.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for layers.3.blocks.1.op.in_proj.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for layers.3.blocks.1.op.conv2d.weight: copying a param with shape torch.Size([2048, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([1536, 1, 3, 3]). size mismatch for layers.3.blocks.1.op.out_proj.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([768, 1536]). size mismatch for layers.3.blocks.1.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). 
size mismatch for layers.3.blocks.1.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for layers.3.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for layers.3.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for layers.3.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for layers.3.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for classifier.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for classifier.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for classifier.head.weight: copying a param with shape torch.Size([1000, 1024]) from checkpoint, the shape in current model is torch.Size([1000, 768]).

MzeroMiko commented 7 months ago

Every checkpoint has its own corresponding configuration. Try building the model with the config that corresponds to the checkpoint and then try again.
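(A quick, repo-agnostic way to check whether an instantiated model matches a checkpoint is to diff the two state dicts before calling load_state_dict; any non-empty list below means the constructor arguments do not reproduce the checkpoint's configuration.)

import torch

ckpt = torch.load(model_path, map_location="cpu")["model"]
model_sd = model.state_dict()

missing    = [k for k in model_sd if k not in ckpt]
unexpected = [k for k in ckpt if k not in model_sd]
mismatched = [k for k in model_sd if k in ckpt and model_sd[k].shape != ckpt[k].shape]

print(f"missing: {len(missing)}, unexpected: {len(unexpected)}, shape mismatches: {len(mismatched)}")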

sivaji123256 commented 7 months ago

Hi @MzeroMiko ,

import torch
from vmamba import VSSM  # VSSM is the model class defined in vmamba.py; adjust the import path as needed

def load_custom_model(model_path):
    model = VSSM()
    checkpoint = torch.load(model_path, map_location='cpu')
    print("Keys in the checkpoint dictionary:", checkpoint.keys())
    model.load_state_dict(checkpoint['model'], strict=False)  # Adjust this based on your checkpoint structure
    return model

model_path = '/home/ubuntu/VMamba/classification/pretrained/vssm_base_0229_ckpt_epoch_237.pth'
model = load_custom_model(model_path)

This is the code I was using to load the weights from model_path, together with the following config, and I was still facing the same issue.

MODEL:
  TYPE: vssm
  NAME: vssm1_base_0229
  DROP_PATH_RATE: 0.6
  VSSM:
    EMBED_DIM: 128
    DEPTHS: [ 2, 2, 15, 2 ]
    SSM_D_STATE: 1
    SSM_DT_RANK: "auto"
    SSM_RATIO: 2.0
    SSM_CONV: 3
    SSM_CONV_BIAS: false
    SSM_FORWARDTYPE: "v3noz"
    MLP_RATIO: 4.0
    DOWNSAMPLE: "v3"
    PATCHEMBED: "v2"

89.0 + 15.2 + 118min/e + 48G
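(Note that the YAML config only takes effect if its values are actually passed to the model constructor; VSSM() with no arguments builds the default variant, which is likely why the shapes still disagree. A hedged sketch of what the base configuration might look like as constructor arguments follows; the keyword names mirror the config keys and are assumptions about the signature in vmamba.py, so they should be checked against the code.)

# Hypothetical mapping from the vssm1_base_0229 config above to VSSM constructor arguments.
model = VSSM(
    depths=[2, 2, 15, 2],       # DEPTHS
    dims=128,                   # EMBED_DIM
    ssm_d_state=1,              # SSM_D_STATE
    ssm_dt_rank="auto",         # SSM_DT_RANK
    ssm_ratio=2.0,              # SSM_RATIO
    ssm_conv=3,                 # SSM_CONV
    ssm_conv_bias=False,        # SSM_CONV_BIAS
    forward_type="v3noz",       # SSM_FORWARDTYPE
    mlp_ratio=4.0,              # MLP_RATIO
    downsample_version="v3",    # DOWNSAMPLE
    patchembed_version="v2",    # PATCHEMBED
)
checkpoint = torch.load(model_path, map_location="cpu")
model.load_state_dict(checkpoint["model"], strict=True)  # strict=True so any remaining mismatch is reported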