NVlabs / VILA

VILA - a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0
1.73k stars, 134 forks

About the release of VILA v1.5 technical report or blog #90

Open Fr0zenCrane opened 1 month ago

Fr0zenCrane commented 1 month ago

Hi,

Thank you for your outstanding work! Without a doubt, your recently published VILA v1.5 series pushes the boundaries of multimodal large language models. It is arguably the most powerful and user-friendly MLLM available today.

Many of us who are interested in the VILA v1.5 series are curious about the development of its video intelligence capabilities. A detailed technical report on this aspect would be incredibly beneficial. Could you let us know if the VILA development team is planning to release such a report?

Thank you!

Lyken17 commented 1 month ago

Thanks for your interest! Sure, we will release it soon, by the end of July. Please stay tuned.