[Discussion] How to improve the model's understanding of high-resolution images?

PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

https://arxiv.org/abs/2401.15947

Apache License 2.0


[Discussion] How to improve the model's understanding of high-resolution images? #46

Open whalefa1I opened 6 months ago

whalefa1I commented 6 months ago

Discussion

I have some tasks that require parsing images with high resolutions and varying aspect ratios, and LLaVA's current preprocessing is fairly simple. I've looked at how other projects (nvait, Vary) handle this. How are you currently improving your model's ability to understand high-resolution images?
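To make "fairly simple" concrete: LLaVA-1.5's default "pad" preprocessing expands the image to a square with a background fill and then resizes it to the vision encoder's fixed input (336×336 for CLIP ViT-L/14-336). The sketch below only does the size arithmetic, assuming a 336-pixel encoder input. It is not code from this repository, and the function name is hypothetical. It shows how much detail a wide image loses.

```python
def pad_to_square_then_resize(width: int, height: int, target: int = 336):
    """Return (padded_side, scale) for pad-to-square followed by a resize to target x target.

    Hypothetical helper for illustration; assumes a fixed square encoder input of `target` pixels.
    """
    side = max(width, height)  # pad the shorter side so the canvas becomes side x side
    scale = target / side      # one uniform downscale factor applies to the real image content
    return side, scale


# A wide document-like image, e.g. a scanned receipt or a table screenshot.
w, h = 2048, 512
side, scale = pad_to_square_then_resize(w, h)
content_w, content_h = round(w * scale), round(h * scale)
print(side, content_w, content_h)  # prints "2048 336 84"
```

In this example the original content ends up as a 336×84 strip, and the remaining three quarters of the encoder input are padding. Fine text in such an image is usually unreadable at that scale.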

LinB203 commented 6 months ago

Sorry, we're still working on this, so I can't share more details yet. In the meantime, you can refer to LLaVA-1.6 for one approach.
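For reference, LLaVA-1.6 handles high resolutions with a dynamic "AnyRes" scheme. It picks a grid resolution from a set of candidates that best fits the input's aspect ratio, resizes and pads the image to that canvas, and splits it into 336×336 tiles. Each tile is encoded separately, alongside a downsampled global view. Below is a minimal sketch of the selection step only. The function names are hypothetical, and the candidate list is an assumption for illustration (LLaVA-1.6 configs list "grid pinpoints" of this kind).

```python
from typing import List, Optional, Tuple

Resolution = Tuple[int, int]  # (width, height)

# Assumed candidate canvases, all multiples of a 336-pixel tile (illustrative, not authoritative).
CANDIDATES: List[Resolution] = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]


def select_best_resolution(orig: Resolution, candidates: List[Resolution]) -> Optional[Resolution]:
    """Pick the candidate that preserves the most original pixels, breaking ties by least padding."""
    ow, oh = orig
    best: Optional[Resolution] = None
    best_effective, best_wasted = -1, float("inf")
    for cw, ch in candidates:
        # Largest uniform scale that fits the whole image inside the candidate canvas.
        scale = min(cw / ow, ch / oh)
        dw, dh = int(ow * scale), int(oh * scale)
        # Upscaling adds no real detail, so cap the useful pixel count at the original size.
        effective = min(dw * dh, ow * oh)
        wasted = cw * ch - effective  # canvas pixels that end up as padding
        if effective > best_effective or (effective == best_effective and wasted < best_wasted):
            best, best_effective, best_wasted = (cw, ch), effective, wasted
    return best


def tile_grid(resolution: Resolution, patch: int = 336) -> Tuple[int, int]:
    """Number of patch x patch tiles along each axis of the chosen canvas."""
    w, h = resolution
    return w // patch, h // patch


best = select_best_resolution((2048, 512), CANDIDATES)
print(best, tile_grid(best))  # prints "(1008, 336) (3, 1)"
```

With the same 2048×512 image as in the question, this selection keeps the content at 1008×252 across three tiles instead of a single 336×84 strip. The trade-off is more visual tokens per image, which is why the candidate set is usually kept small.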