Efficient-Large-Model / VILA

VILA: a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0
877 stars, 55 forks

VILA-1.5 details #67

Closed. Lopa07 closed this issue 1 month ago.

Lopa07 commented 1 month ago
hkunzhe commented 1 month ago

You can look at the model configuration files on Hugging Face or the training code in the repository.
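To act on that suggestion, here is a minimal sketch that downloads a checkpoint's `config.json` from the Hugging Face Hub and prints every field as a flat `key: value` list, which makes architecture details easier to scan. The helper names (`flatten_config`, `summarize_config`, `fetch_config_summary`) are hypothetical and not part of the VILA codebase. The repo ID in the usage note is an assumption, and so is the presence of a `config.json` at the repo root. Downloading requires the `huggingface_hub` package and network access.

```python
import json
from typing import Any, Dict


def flatten_config(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sub-configs (e.g. vision tower or LLM settings).
            flat.update(flatten_config(value, name))
        else:
            flat[name] = value
    return flat


def summarize_config(path: str) -> Dict[str, Any]:
    """Load a local JSON config file and return its flattened fields."""
    with open(path, "r", encoding="utf-8") as f:
        return flatten_config(json.load(f))


def fetch_config_summary(repo_id: str, filename: str = "config.json") -> Dict[str, Any]:
    """Download a config file from the Hugging Face Hub and flatten it.

    Imported lazily so the helpers above work without huggingface_hub installed.
    """
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return summarize_config(local_path)
```

Usage, with an assumed repo ID: `for k, v in sorted(fetch_config_summary("Efficient-Large-Model/VILA1.5-3b").items()): print(f"{k}: {v}")`. Flattening to dotted keys keeps nested sub-configs visible in a single sorted listing.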

Lopa07 commented 1 month ago

Sorry, I cannot find these details. It would be very helpful if you could post this information here for better visibility.

yaolug commented 1 month ago

You can look at the training scripts under https://github.com/Efficient-Large-Model/VILA/tree/main/scripts/v1_5/release

For the technical details, refer to the original paper: https://arxiv.org/pdf/2312.07533

We have now made the approach described in Section 4.4 of the paper the default.

Lopa07 commented 1 month ago

Thank you both! This helped.