VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
972 stars, 68 forks
Added functionality to process a bunch of videos at a time #75
poorfrombabylon closed 2 weeks ago