Pretrained checkpoint parameters loaded failed

NguyenVH01 commented 3 months ago

Thank you for your amazing work on the Mamba2 model. I am currently trying to load a pretrained model on VMamba Tiny-224 for image classification, but I encountered the following error:

File "vmamba.py", line 48, in _load_from_state_dict
    state_dict[prefix + "weight"] = state_dict[prefix + "weight"].view(self.weight.shape)
RuntimeError: shape '[192, 96]' is invalid for input of size 9216

It seems that the shape of the weight tensor does not match the expected input size. Could you please provide guidance on how to resolve this issue? Is there a specific step I might be missing or a modification needed in the model architecture?

Thank you.
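
The numbers in the error message can be checked by hand: a `view` only succeeds when the element count is unchanged. The sketch below reproduces that check in plain Python. The helper name `can_view` is made up for illustration and is not part of VMamba or PyTorch, and the `(96, 96)` shape is only one factorization consistent with 9216 elements, not a confirmed shape from the checkpoint.

```python
from math import prod

def can_view(numel, target_shape):
    """Hypothetical helper: a reshape/view is only possible if element counts match."""
    return numel == prod(target_shape)

target_shape = (192, 96)   # shape the model layer expects (from the error message)
checkpoint_numel = 9216    # element count of the checkpoint tensor (from the error message)

print(prod(target_shape))                        # elements the model expects
print(can_view(checkpoint_numel, target_shape))  # whether the view can succeed

# Assumption for illustration: 9216 elements would fit a (96, 96) weight,
# i.e. a layer with half the output width of the one being constructed.
print(can_view(checkpoint_numel, (96, 96)))
```

The checkpoint tensor holds exactly half the elements the layer expects, which suggests the model being built differs in width from the one the checkpoint was saved from, rather than the file being corrupted.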

MzeroMiko commented 3 months ago

I did not encounter this problem. Can you show me the full traceback (in which layer of which stage does this problem occur)?
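
One way to answer the "which layer, which stage" question is to compare parameter shapes by name before loading. This is a minimal sketch, not code from the VMamba repository: `find_shape_mismatches` is a hypothetical helper, and the parameter names and shapes in the example are invented for illustration.

```python
def find_shape_mismatches(model_shapes, ckpt_shapes):
    """Return {name: (model_shape, ckpt_shape)} for parameters present in
    both mappings whose shapes differ."""
    return {
        name: (tuple(model_shapes[name]), tuple(ckpt_shapes[name]))
        for name in model_shapes
        if name in ckpt_shapes
        and tuple(model_shapes[name]) != tuple(ckpt_shapes[name])
    }

# Hypothetical names and shapes, for illustration only.
model_shapes = {"layer_a.weight": (192, 96), "layer_b.weight": (96,)}
ckpt_shapes = {"layer_a.weight": (96, 96), "layer_b.weight": (96,)}

mismatches = find_shape_mismatches(model_shapes, ckpt_shapes)
for name, (model_shape, ckpt_shape) in mismatches.items():
    print(f"{name}: model {model_shape} vs checkpoint {ckpt_shape}")
```

With PyTorch, the two mappings could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}` and the same expression over the loaded checkpoint dictionary. The key under which the checkpoint stores its weights depends on how it was saved, so that part is left out here.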

NguyenVH01 commented 2 months ago

Actually, when I run it directly as in your example, it works normally:

But when I run training on my own dataset, it fails in load_pretrained_ema, as shown in this image:

I also tried using the same config as the checkpoint, and it fails with the same error.

Can you help me understand this error? Thank you.

MzeroMiko / VMamba

Pretrained checkpoint parameters loaded failed #286