MzeroMiko / VMamba

VMamba: Visual State Space Models, code is based on mamba
MIT License

Pretrained checkpoint parameters loaded failed #286

Open NguyenVH01 opened 1 month ago

NguyenVH01 commented 1 month ago

Thank you for your amazing work on the Mamba2 model. I am currently trying to load a pretrained model on VMamba Tiny-224 for image classification, but I encountered the following error:

File "vmamba.py", line 48, in _load_from_state_dict
    state_dict[prefix + "weight"] = state_dict[prefix + "weight"].view(self.weight.shape)
RuntimeError: shape '[192, 96]' is invalid for input of size 9216

It seems that the shape of the weight tensor does not match the expected input size. Could you please provide guidance on how to resolve this issue? Is there a specific step I might be missing or a modification needed in the model architecture?
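The numbers in the error message can be checked by hand. A view to shape [192, 96] needs 192 * 96 = 18432 elements, but the checkpoint tensor holds only 9216, which equals 96 * 96. The sketch below only reproduces that arithmetic. The helper name `view_is_valid` is hypothetical, and reading the checkpoint weight as 96 x 96 is an assumption, since this thread does not show the checkpoint's actual shape.

```python
from math import prod


def view_is_valid(numel: int, target_shape: tuple) -> bool:
    """A view can only succeed if the total element count is unchanged."""
    return numel == prod(target_shape)


target_shape = (192, 96)   # shape taken from the error message
checkpoint_numel = 9216    # input size taken from the error message

print(prod(target_shape))                              # 192 * 96 = 18432
print(view_is_valid(checkpoint_numel, target_shape))   # False
# Assumption: 96 x 96 is one shape that would hold exactly 9216 elements.
print(view_is_valid(checkpoint_numel, (96, 96)))       # True
```

If the stored weight really is 96 x 96, the model built for training would have a different layer width from the one the checkpoint was saved with, which would suggest comparing the two configs.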

Thank you.

MzeroMiko commented 1 month ago

I did not encounter this problem. Can you show me the traceback, so I can see in which layer of which stage the problem occurs?
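One way to find the affected layer is to compare parameter shapes key by key. The following is a minimal sketch, not code from this repository. The function name `find_shape_mismatches` and the key names in the sample data are hypothetical, and it works on plain dictionaries that map a parameter name to a shape tuple.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for keys that
    exist in both dictionaries but have different shapes."""
    return {
        name: (checkpoint_shapes[name], model_shapes[name])
        for name in checkpoint_shapes
        if name in model_shapes and checkpoint_shapes[name] != model_shapes[name]
    }


# Hypothetical key names and shapes, for illustration only.
checkpoint_shapes = {"stage0.proj.weight": (96, 96), "head.weight": (1000, 768)}
model_shapes = {"stage0.proj.weight": (192, 96), "head.weight": (1000, 768)}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in mismatches.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With PyTorch, such dictionaries can be built with `{k: tuple(v.shape) for k, v in state_dict.items()}`, once from the loaded checkpoint's state dict and once from `model.state_dict()`.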

image

NguyenVH01 commented 1 month ago

Actually, when I run it directly as in your example, it works normally:

image

But when I run training on my dataset, it fails in load_pretrained_ema, as shown in this image:

Screenshot 2024-08-28 at 00 29 30

I also tried using the same config as the checkpoint, and it fails with the same error:

image image

Can you help me understand this error? Thank you.