danfenghong / IEEE_TPAMI_SpectralGPT

Hong, D., Zhang, B., Li, X., Li, Y., Li, C., Yao, J., Yokoya, N., Li, H., Ghamisi, P., Jia, X., Plaza, A., Gamba, P., Benediktsson, J., and Chanussot, J. (2024). SpectralGPT: Spectral remote sensing foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence. DOI: 10.1109/TPAMI.2024.3362475.

Inquiry about Model Layer Mismatch Issue #17

Open · AllFever opened this issue 3 months ago

AllFever commented 3 months ago

I'm using the SpectralGPT+ pre-trained model with the provided mae_vit_base_patch8_128 network architecture. While attempting to load the pre-trained weights, I encountered a mismatch between the layer names in the SpectralGPT+ checkpoint and those in mae_vit_base_patch8_128. For instance, the SpectralGPT+ checkpoint contains:

blocks.0.attn.q.weight
blocks.0.attn.q.bias
blocks.0.attn.k.weight
blocks.0.attn.k.bias
blocks.0.attn.v.weight
blocks.0.attn.v.bias

However, the corresponding layers in the mae_vit_base_patch8_128 architecture are named:

blocks.0.attn.to_qkv.weight
blocks.0.attn.to_qkv.bias

Could you please advise on the correct way to map the pre-trained weights of SpectralGPT+ onto the mae_vit_base_patch8_128 model? Any suggestions or insights would be greatly appreciated. Thank you for your assistance!
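
For reference, here is a minimal diagnostic sketch for surfacing exactly which keys disagree. The checkpoint filename, the "model" key nesting, and the module name models_mae_spectral are assumptions for illustration; adjust them to your local setup:

```python
import torch

# Hypothetical module name; import from wherever the repo defines the architecture.
from models_mae_spectral import mae_vit_base_patch8_128

# Hypothetical checkpoint path; MAE-style checkpoints often nest weights under a "model" key.
ckpt = torch.load("SpectralGPT+.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

model = mae_vit_base_patch8_128()

# Compare the two key sets to list the mismatched layer names on each side.
ckpt_keys = set(state_dict.keys())
model_keys = set(model.state_dict().keys())
print("in checkpoint only:", sorted(ckpt_keys - model_keys)[:6])
print("in model only:", sorted(model_keys - ckpt_keys)[:6])
```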

moonboy12138 commented 3 months ago

Thank you for the heads-up! There are two attention blocks in this file: class Attention_original on lines 78 to 130 and class Attention on lines 132 to 169. Deleting lines 132 to 169 and renaming class Attention_original to class Attention should resolve the issue, since the original class uses the separate q/k/v layer names that match the checkpoint. We appreciate your attention to detail and will make the necessary modifications accordingly. Thanks again for the reminder!
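
If editing the model file is not desirable, an alternative (untested) sketch is to fuse the pretrained q/k/v tensors into to_qkv keys when loading. This assumes to_qkv stores the three projections concatenated as [q; k; v] along the output dimension, in that order, which should be verified against the Attention implementation before use:

```python
import torch

def fuse_qkv(state_dict):
    """Fuse separate q/k/v projection tensors into single to_qkv tensors.

    Untested sketch: assumes to_qkv expects q, k, and v concatenated
    along the output dimension, in that order. Verify this ordering
    against the fused Attention class before relying on it.
    """
    sd = dict(state_dict)
    block = 0
    while f"blocks.{block}.attn.q.weight" in sd:
        prefix = f"blocks.{block}.attn."
        for suffix in ("weight", "bias"):
            # Pop the three separate tensors and store their concatenation
            # under the fused to_qkv key.
            parts = [sd.pop(f"{prefix}{name}.{suffix}") for name in ("q", "k", "v")]
            sd[f"{prefix}to_qkv.{suffix}"] = torch.cat(parts, dim=0)
        block += 1
    return sd
```

With either approach, calling model.load_state_dict(..., strict=True) afterwards is a quick way to confirm that every key now matches.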

AllFever commented 3 months ago

Thank you for your guidance and assistance. I have made the necessary modifications as per your suggestions, and the program is now running smoothly. Once again, I appreciate your work and contribution.