VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
878 stars · 55 forks
added functionality to process a bunch of videos at a time #75
Closed
poorfrombabylon closed 1 week ago