Hello.
First, thank you for your help in debugging the Qwen1.5 problem. I have since achieved remarkable performance with Qwen1.5.
I am now working on integrating your codebase with LLaVA-Next (aiming to add its high-resolution support), and I have a question about how image patches are represented in your code.
As shown in the official LLaVA repo, the image feature map is flattened explicitly. In your implementation, however, I did not find any operation that flattens the image features. I am curious how the image features are organized in your work.
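To make concrete what I mean by explicit flattening, here is a minimal sketch of the kind of operation I was looking for. It is not copied from either codebase: the function name `flatten_tile_grid` and the tensor shapes are my own assumptions, and NumPy stands in for PyTorch so the snippet runs on its own.

```python
import numpy as np


def flatten_tile_grid(tiles, grid_h, grid_w, h, w):
    """Hypothetical helper: merge per-tile patch features into one sequence.

    tiles: array of shape (grid_h * grid_w, h * w, C), i.e. one row-major
           patch sequence per high-resolution tile (assumed layout).
    Returns an array of shape (grid_h * h * grid_w * w, C) whose rows follow
    the row-major order of the full image.
    """
    c = tiles.shape[-1]
    # Recover the 2D tile grid and the 2D patch grid inside each tile.
    x = tiles.reshape(grid_h, grid_w, h, w, c)
    # Bring the two row axes together and the two column axes together.
    x = x.transpose(0, 2, 1, 3, 4)  # (grid_h, h, grid_w, w, C)
    # Flatten the spatial axes into a single token axis.
    return x.reshape(grid_h * h * grid_w * w, c)


# Toy usage: a 2x2 grid of tiles, each holding 2x2 patches with 1 channel.
grid_h, grid_w, h, w, c = 2, 2, 2, 2, 1
full = np.arange(grid_h * h * grid_w * w * c).reshape(grid_h * h, grid_w * w, c)
tiles = np.stack([
    full[i * h:(i + 1) * h, j * w:(j + 1) * w].reshape(h * w, c)
    for i in range(grid_h)
    for j in range(grid_w)
])
tokens = flatten_tile_grid(tiles, grid_h, grid_w, h, w)
```

If your code keeps the features in a layout like `tiles` above, then I would expect a step like this somewhere before the features are passed to the language model, which is the step I could not locate.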