Hi,
Thank you for your outstanding work! Without a doubt, your recently published VILA v1.5 series pushes the boundaries of multimodal large language models. It is arguably the most powerful and user-friendly MLLM available today.
Many of us who follow the VILA v1.5 series are curious about how its video understanding capabilities were developed. A detailed technical report on this would be incredibly helpful. Could you let us know whether the VILA development team plans to release one?
Thank you!