NJU-LHRS / LHRS-Bot

VGI-Enhanced multimodal large language model for remote sensing images.
Apache License 2.0

Hidden states at different layers. #12

Closed (JennieZ98 closed this issue 4 months ago)

JennieZ98 commented 5 months ago

Thanks for your great work! I'm interested in the code for the 'Vision Perceiver'. The paper states: "We extract hidden states at layers {i = L/3, j = 2L/3, k = L − 1} for summarizing through vision perceiver, where L denotes the number of layers in the vision encoder." Reading the code, I only found that 'AttnPooler' takes the final output of the 'Vision Encoder' as input and splits it into 3 equal parts. Am I missing some details?

pUmpKin-Co commented 5 months ago

Hi~ We extract hidden states from different layers of the vision encoder and concatenate them before feeding them to the Perceiver. Please see here for the detailed implementation.

Thanks.
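For readers following the thread, here is a minimal sketch of the idea described above, not the repository's actual code. It assumes the selected hidden states are concatenated along the feature dimension, that layer indices are 0-based, and that L/3 and 2L/3 are rounded down. Plain Python lists stand in for tensors, and every function name is hypothetical.

```python
# Illustrative sketch only: plain Python lists stand in for tensors, and all
# names here are hypothetical, not taken from the LHRS-Bot repository.

def select_layer_indices(num_layers):
    """Return the layer indices {L/3, 2L/3, L - 1} quoted from the paper."""
    # Integer division is an assumption for the case where L is not divisible by 3.
    return [num_layers // 3, 2 * num_layers // 3, num_layers - 1]


def concat_hidden_states(hidden_states, indices):
    """Concatenate the selected layers' features token by token.

    hidden_states[layer][token] is a list of floats (the feature vector).
    """
    selected = [hidden_states[i] for i in indices]
    # zip(*selected) groups the same token across the selected layers;
    # sum(..., []) joins those feature vectors end to end.
    return [sum(per_token, []) for per_token in zip(*selected)]


def split_features(tokens, num_chunks):
    """Split each token's feature vector into equal chunks, one per layer."""
    dim = len(tokens[0]) // num_chunks
    return [
        [tok[c * dim:(c + 1) * dim] for tok in tokens]
        for c in range(num_chunks)
    ]


# Toy encoder output: 12 layers, 4 tokens, feature size 2.
# Each value equals its layer index so the layers are easy to tell apart.
num_layers = 12
hidden_states = [
    [[float(layer)] * 2 for _ in range(4)] for layer in range(num_layers)
]

indices = select_layer_indices(num_layers)             # layers 4, 8 and 11
fused = concat_hidden_states(hidden_states, indices)   # 4 tokens, 6 features each
chunks = split_features(fused, len(indices))           # 3 chunks of 4 tokens x 2 features
```

Under these assumptions, a pooler that receives only one fused tensor and splits it into three equal parts is recovering the three per-layer hidden states, which would explain why the code appears to use only a single encoder output.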

JennieZ98 commented 4 months ago

Thanks for your reply. It is helpful!