[Question] Image patch representation in this work

Question

Hello. First, thank you for your help debugging the Qwen1.5 problem; I have since achieved remarkable performance with Qwen1.5. I am now working on integrating your codebase with LLaVA-Next (aiming to integrate its high-resolution support), and I have a question about how image patches are represented in your code.

As shown in the official LLaVA repo, the image feature maps are flattened explicitly. In your implementation, however, I could not find any operation that flattens the image features. I am curious how the image features are organized in your work.
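To make the question concrete, here is a minimal sketch of what I mean by flattening. It is plain Python on a toy grid, not code from either repository; the helper names `flatten_feature_map` and `unflatten_sequence` are hypothetical, and the row-major patch order is my assumption about the layout.

```python
def flatten_feature_map(grid):
    """Flatten an H x W grid of patch features into a row-major list of H*W patches."""
    return [patch for row in grid for patch in row]


def unflatten_sequence(seq, height, width):
    """Rebuild the H x W grid from a row-major patch sequence."""
    if len(seq) != height * width:
        raise ValueError("sequence length does not match height * width")
    return [seq[r * width:(r + 1) * width] for r in range(height)]


# Toy 2 x 3 grid; each "feature" just records its own (row, col) position.
grid = [[[r, c] for c in range(3)] for r in range(2)]

seq = flatten_feature_map(grid)          # 6 patches, ordered row by row
restored = unflatten_sequence(seq, 2, 3) # back to the 2 x 3 layout
```

If the vision encoder already returns features as a sequence of patches, no separate flatten step would be needed, which may explain why I cannot find one. I would like to confirm whether that is the case here.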

PKU-YuanGroup / MoE-LLaVA

[Question] Image patch representation in this work #43
