Efficient-Large-Model / VILA

VILA - a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0
878 stars, 55 forks

Finetuning #70

Closed: RohanR04 closed this issue 4 weeks ago

RohanR04 commented 1 month ago

Is it possible to finetune VILA through Hugging Face with a custom image dataset? I don't see any documentation about this.