AllFever opened this issue 6 months ago
Thank you for the heads-up! I see there are two attention blocks in this file: class Attention_original from line 78 to 130 and class Attention from line 132 to 169. Deleting lines 132 to 169 and renaming class Attention_original to class Attention should resolve this issue. We appreciate your attention to detail, and we'll make the necessary modifications accordingly. Thanks again for the reminder!
Thank you for your guidance and assistance. I have made the necessary modifications as per your suggestions, and the program is now running smoothly. Once again, I appreciate your work and contribution.
I'm using the SpectralGPT+ pre-trained model with the provided mae_vit_base_patch8_128 network architecture. While attempting to load the pre-trained weights, I've encountered a mismatch between the layer names of SpectralGPT+ and mae_vit_base_patch8_128. For instance, in the SpectralGPT+ checkpoint the layer names are:

blocks.0.attn.q.weight
blocks.0.attn.q.bias
blocks.0.attn.k.weight
blocks.0.attn.k.bias
blocks.0.attn.v.weight
blocks.0.attn.v.bias

However, in the mae_vit_base_patch8_128 architecture, the corresponding layers are named:

blocks.0.attn.to_qkv.weight
blocks.0.attn.to_qkv.bias
Could you please advise on the correct way to map the pre-trained weights of SpectralGPT+ to the mae_vit_base_patch8_128 model? Any suggestions or insights would be greatly appreciated. Thank you for your assistance!
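In case it helps others hitting the same mismatch, here is a minimal sketch of how the separate q/k/v projections could be folded into a fused to_qkv tensor before loading. It assumes the three projections share the same output dimension, that the target layer expects them concatenated along dim 0 in (q, k, v) order, and that the weights sit under a "model" key in the checkpoint; the file name and loading call are placeholders, not the repository's actual API.

```python
import torch

def remap_qkv(state_dict):
    """Fold separate q/k/v projection tensors into a single to_qkv tensor."""
    new_state = {}
    for key, value in state_dict.items():
        if ".attn.q." in key:
            k_key = key.replace(".attn.q.", ".attn.k.")
            v_key = key.replace(".attn.q.", ".attn.v.")
            fused_key = key.replace(".attn.q.", ".attn.to_qkv.")
            # Concatenate q, k, v along dim 0; this works for both .weight
            # (dim*3, dim) and .bias (dim*3,) tensors, assuming q/k/v ordering.
            new_state[fused_key] = torch.cat(
                [value, state_dict[k_key], state_dict[v_key]], dim=0
            )
        elif ".attn.k." in key or ".attn.v." in key:
            continue  # already folded into the fused key above
        else:
            new_state[key] = value
    return new_state

# Example usage (checkpoint path and "model" key are assumptions):
# checkpoint = torch.load("SpectralGPT+.pth", map_location="cpu")
# checkpoint["model"] = remap_qkv(checkpoint["model"])
# msg = model.load_state_dict(checkpoint["model"], strict=False)
# print(msg)  # check missing/unexpected keys after the remap
```

Loading with strict=False and inspecting the returned message should make it easy to see whether any keys remain unmatched after the remapping.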