NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0
1.87k stars, 149 forks

Dataset and training code for LongVILA #132

Open JcWang20 opened 1 month ago

JcWang20 commented 1 month ago

LongVILA is wonderful work. How can I get the datasets for stages 4 and 5 described in your paper? There also seems to be no mention of stage 4 and 5 training scripts in the script directory. In other words, how can I reproduce LongVILA starting from an existing VILA model? Looking forward to your reply.

hello-bluedog commented 1 month ago

Looking forward to this, too.

nate-walter commented 3 weeks ago

Same here