VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
973 stars, 68 forks
Update perception test eval script and results in README #81
Closed
Xiuyu-Li closed 5 days ago