Efficient-Large-Model / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

LLM version #30

Closed: gordonhu608 closed this issue 2 months ago

gordonhu608 commented 2 months ago

It seems the paper reported scores using LLaMA-2, whereas the released training code guides us to use Vicuna-1.5, the same base LLM as LLaVA. Can we assume that Vicuna-1.5 training works smoothly with the current code?
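
For context, Vicuna-1.5 is itself fine-tuned from LLaMA-2, so the two backbones share the same architecture and tokenizer format. The minimal sketch below (not VILA's training code; the Hugging Face checkpoint IDs are assumptions, not taken from this repo) just shows the two candidate base LLMs the question contrasts:

```python
# Minimal sketch: load either candidate base LLM via Hugging Face transformers.
# The checkpoint names are public HF IDs and are assumptions here, not VILA's defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_llm = "lmsys/vicuna-7b-v1.5"         # what the released training code points to
# base_llm = "meta-llama/Llama-2-7b-hf"   # what the paper reports scores with

# Vicuna-1.5 is fine-tuned from LLaMA-2, so both load with the same classes.
tokenizer = AutoTokenizer.from_pretrained(base_llm, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(base_llm)
```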