NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0
973 stars 68 forks source link

Update perception test eval script and results in README #81

Closed Xiuyu-Li closed 5 days ago