csuhan / OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Images and videos with high resolution #6

Closed by codonna9 9 months ago

codonna9 commented 9 months ago

Thank you for releasing the model and code. Can the model work with high-resolution images and videos, such as 720x1280, without having to resize them to 224x224?

csuhan commented 9 months ago

Hi @codonna9, currently we need to resize the image/video to 224x224. For higher resolution, you can try our SPHINX model, which supports 448 inputs.
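For readers wondering what resizing a 720x1280 frame to 224x224 involves, here is a minimal sketch of one common approach: scale the short side to 224, then center-crop a square. This is an illustration only, not the OneLLM preprocessing code; the function name `short_side_resize_and_center_crop` is hypothetical, and the repository may resize differently (for example, by squashing the frame without cropping).

```python
def short_side_resize_and_center_crop(width, height, target=224):
    """Return (resized_size, crop_box) for a resize-then-center-crop.

    Assumption: the short side is scaled to `target` and a centered
    target x target square is kept. This is a generic recipe, not
    necessarily what OneLLM does.
    """
    scale = target / min(width, height)
    # Round to whole pixels; max() guards against rounding below target.
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    # Crop box as (left, top, right, bottom), centered in the resized frame.
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


# A landscape 1280x720 frame (width x height).
resized, box = short_side_resize_and_center_crop(1280, 720)
print(resized, box)
```

With a center crop, everything outside the box is discarded, which is one reason a wide frame loses detail at a small square input size.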

codonna9 commented 9 months ago

Thanks a lot for your reply. I tried SPHINX before, but 448 is still a bit small for high-resolution images/videos.

csuhan commented 9 months ago

Yeah. It's a trade-off between resolution and computation.
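To make the trade-off concrete, here is a rough back-of-the-envelope sketch of how the visual token count grows with resolution for a ViT-style image encoder. The patch size of 14 is an assumption (the thread does not say which patch size either model uses), the helper `vit_cost` is hypothetical, and the "pairs" figure only counts token-to-token interactions in full self-attention, ignoring everything else in the model.

```python
def vit_cost(height, width, patch=14):
    """Estimate patch tokens and self-attention pairs for one image.

    Assumptions: non-overlapping square patches of side `patch`, leftover
    pixels dropped (floor division), and full self-attention over all tokens.
    """
    tokens = (height // patch) * (width // patch)
    return tokens, tokens * tokens


for h, w in [(224, 224), (448, 448), (720, 1280)]:
    tokens, pairs = vit_cost(h, w)
    print(f"{h}x{w}: {tokens} tokens, {pairs} attention pairs")
```

Under these assumptions, doubling the side length from 224 to 448 quadruples the token count and multiplies the attention pairs by sixteen, and a native 720x1280 frame is larger again. For video, the per-frame cost is multiplied by the number of frames.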