danfenghong / IEEE_TPAMI_SpectralGPT

Hong, D., Zhang, B., Li, X., Li, Y., Li, C., Yao, J., Yokoya, N., Li, H., Ghamisi, P., Jia, X., Plaza, A., Gamba, P., Benediktsson, J., and Chanussot, J. (2024). SpectralGPT: Spectral remote sensing foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence. DOI: 10.1109/TPAMI.2024.3362475.

Inquiry about Model Layer Mismatch Issue #17

Open · AllFever opened this issue 3 months ago

AllFever commented 3 months ago

I'm using the SpectralGPT+ pre-trained model with the provided mae_vit_base_patch8_128 network architecture. While attempting to load the pre-trained weights, I encountered a mismatch between the layer names in the SpectralGPT+ checkpoint and those in mae_vit_base_patch8_128. For instance, the SpectralGPT+ checkpoint contains:

blocks.0.attn.q.weight
blocks.0.attn.q.bias
blocks.0.attn.k.weight
blocks.0.attn.k.bias
blocks.0.attn.v.weight
blocks.0.attn.v.bias

However, the corresponding layers in the mae_vit_base_patch8_128 architecture are named:

blocks.0.attn.to_qkv.weight
blocks.0.attn.to_qkv.bias

Could you please advise on the correct way to map the pre-trained weights of SpectralGPT+ onto the mae_vit_base_patch8_128 model? Any suggestions or insights would be greatly appreciated. Thank you for your assistance!
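
For reference, here is a minimal diagnostic sketch for surfacing exactly which keys disagree. The checkpoint filename, the "model" key nesting, and the module name models_mae_spectral are assumptions for illustration; adjust them to your local setup:

```python
import torch

# Hypothetical module name; import from wherever the repo defines the architecture.
from models_mae_spectral import mae_vit_base_patch8_128

# Hypothetical checkpoint path; MAE-style checkpoints often nest weights under a "model" key.
ckpt = torch.load("SpectralGPT+.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

model = mae_vit_base_patch8_128()

# Compare the two key sets to list the mismatched layer names on each side.
ckpt_keys = set(state_dict.keys())
model_keys = set(model.state_dict().keys())
print("in checkpoint only:", sorted(ckpt_keys - model_keys)[:6])
print("in model only:", sorted(model_keys - ckpt_keys)[:6])
```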

moonboy12138 commented 3 months ago

Thank you for the heads-up! There are two attention blocks in this file: class Attention_original on lines 78 to 130 and class Attention on lines 132 to 169. Deleting lines 132 to 169 and renaming class Attention_original to class Attention should resolve the issue, since the original class uses the separate q/k/v layer names that match the checkpoint. We appreciate your attention to detail and will make the necessary modifications accordingly. Thanks again for the reminder!
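
If editing the model file is not desirable, an alternative (untested) sketch is to fuse the pretrained q/k/v tensors into to_qkv keys when loading. This assumes to_qkv stores the three projections concatenated as [q; k; v] along the output dimension, in that order, which should be verified against the Attention implementation before use:

```python
import torch

def fuse_qkv(state_dict):
    """Fuse separate q/k/v projection tensors into single to_qkv tensors.

    Untested sketch: assumes to_qkv expects q, k, and v concatenated
    along the output dimension, in that order. Verify this ordering
    against the fused Attention class before relying on it.
    """
    sd = dict(state_dict)
    block = 0
    while f"blocks.{block}.attn.q.weight" in sd:
        prefix = f"blocks.{block}.attn."
        for suffix in ("weight", "bias"):
            # Pop the three separate tensors and store their concatenation
            # under the fused to_qkv key.
            parts = [sd.pop(f"{prefix}{name}.{suffix}") for name in ("q", "k", "v")]
            sd[f"{prefix}to_qkv.{suffix}"] = torch.cat(parts, dim=0)
        block += 1
    return sd
```

With either approach, calling model.load_state_dict(..., strict=True) afterwards is a quick way to confirm that every key now matches.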

AllFever commented 3 months ago

Thank you for your guidance and assistance. I have made the necessary modifications as per your suggestions, and the program is now running smoothly. Once again, I appreciate your work and contribution.