Closed nicolas-dufour closed 4 months ago
@nicolas-dufour Thank you for your clarification. We didn't use LoRA to train the LLM because the model size was small. Instead, we directly fine-tuned the model on the full dataset.
@hhaAndroid thanks for the clarification!
I think using a LoRA can still improve performance even if the model fits in memory!
The recent Idefics2 paper shows they achieve better performance with a LoRA than with full fine-tuning!
@nicolas-dufour I'm curious: would LoRA really perform better if we have more training data? Maybe we can try it later.
According to that paper, they argue that LoRA stabilizes training, which facilitates optimization. They also use quite a lot of data on their side.
LoRA also has the side benefit of making it easier to serve both Llama 3 and LLaVA on the same device, since we only need to disable the LoRA adapters to recover the base Llama 3!
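To make the "disable the LoRA to get the base model back" idea concrete, here is a toy sketch (not the actual PEFT/transformers API, and the layer/variable names are made up for illustration): because LoRA keeps the base weight W frozen and adds a low-rank delta B @ A on the side, zeroing out or skipping that delta at inference time recovers the original model exactly.

```python
# Toy LoRA linear layer: frozen base weight plus a toggleable low-rank delta.
# Hypothetical sketch, not the real PEFT API.
import numpy as np

class LoRALinear:
    def __init__(self, w, rank=2):
        self.w = w                                   # frozen base weight, shape (out, in)
        # Real LoRA initializes A randomly and B to zero; we use fixed
        # values here so the example is deterministic.
        self.a = np.full((rank, w.shape[1]), 0.1)    # trainable A, shape (rank, in)
        self.b = np.zeros((w.shape[0], rank))        # trainable B, shape (out, rank)
        self.lora_enabled = True

    def __call__(self, x):
        y = self.w @ x
        if self.lora_enabled:
            y = y + self.b @ (self.a @ x)            # add the low-rank update B @ A @ x
        return y

base_w = np.eye(3)
layer = LoRALinear(base_w)
layer.b += 1.0                  # pretend fine-tuning moved B away from zero

x = np.array([1.0, 2.0, 3.0])
adapted = layer(x)              # LLaVA-style adapted forward pass
layer.lora_enabled = False      # "disable the LoRA"
base = layer(x)                 # identical to the base model's forward pass
```

Since `w` is never modified during LoRA training, flipping `lora_enabled` switches between the two models with no extra weights loaded, which is what makes co-hosting both on one device cheap.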
Thanks for looking into this!
Hi, I see that the LLaVA model is trained with full fine-tuning of the LLM. Did you run an ablation using LoRA? If so, do you have a version trained with LoRA instead of full fine-tuning?
Thanks!