TRI-ML / prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)
MIT License

Any plans to support "Dynamic High Resolution" proposed in LLaVA v1.6? #11

Closed: yushuinanrong closed this issue 5 months ago

yushuinanrong commented 6 months ago

First of all, great work!

I'm wondering whether you plan to implement "Dynamic High Resolution", also called "AnyRes" (https://llava-vl.github.io/blog/2024-01-30-llava-next/), which would help with higher-resolution images?

Best, Mo

siddk commented 5 months ago

This isn't on our immediate roadmap, but I think it'd be an awesome addition to the codebase! If you'd like (and have the time), please feel free to open a PR!

Happy to provide high-level pointers if any questions come up while implementing this!
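
For anyone picking this up, here is a rough sketch of the core "AnyRes" idea as I understand it from the LLaVA-NeXT blog post: pick the candidate canvas that preserves the most image detail with the least padding, then cut that canvas into tiles the size of the vision encoder's input. None of this is part of prismatic-vlms. The function names, the 336-pixel tile size, and the candidate list are assumptions chosen for illustration, and the full method (as far as I can tell) also encodes a downscaled copy of the whole image alongside the tiles, which is omitted here.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height)


def select_best_resolution(original: Size, candidates: List[Size]) -> Size:
    """Pick the candidate canvas that keeps the most pixels and wastes the least.

    Hypothetical helper: a sketch of the selection heuristic, not an API of
    prismatic-vlms.
    """
    orig_w, orig_h = original
    best = candidates[0]
    best_effective, least_wasted = -1, float("inf")
    for w, h in candidates:
        # Scale the image to fit inside the canvas while keeping aspect ratio.
        scale = min(w / orig_w, h / orig_h)
        fit_w, fit_h = int(orig_w * scale), int(orig_h * scale)
        # Upscaling adds no real detail, so cap at the original pixel count.
        effective = min(fit_w * fit_h, orig_w * orig_h)
        # Canvas area not covered by useful pixels (padding or upsampled area).
        wasted = w * h - effective
        if effective > best_effective or (
            effective == best_effective and wasted < least_wasted
        ):
            best, best_effective, least_wasted = (w, h), effective, wasted
    return best


def tile_boxes(canvas: Size, tile: int) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) crop boxes covering the canvas row by row."""
    w, h = canvas
    return [
        (x, y, x + tile, y + tile)
        for y in range(0, h, tile)
        for x in range(0, w, tile)
    ]


# Assumed setup: 336x336 encoder input and a small set of grid shapes.
TILE = 336
CANDIDATES = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]

canvas = select_best_resolution((1344, 672), CANDIDATES)
boxes = tile_boxes(canvas, TILE)
print(canvas, boxes)
```

Each box would then be cropped from the resized, padded image and passed through the vision backbone separately, with the resulting patch features concatenated before projection into the LLM. How that concatenation should interact with the existing backbones in this codebase is exactly the kind of design question worth raising in a PR.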