This PR adds the LLaVA-1.5 YAML config; users can run the following command to start a LLaVA training job:
python run.py --config-path ./examples/llava/conf --config-name config
The learning-rate settings and model parameters are consistent with the LLaVA-1.5 paper (https://arxiv.org/pdf/2310.03744).
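For reference, a minimal sketch of what the learning-rate portion of such a config could look like. The key names below follow Megatron argument conventions and the nesting is hypothetical (the actual structure lives in ./examples/llava/conf); the values are the instruction-tuning settings reported in the LLaVA-1.5 paper:

```yaml
# Hypothetical excerpt -- key names/nesting are assumptions, not the actual
# file; values are the finetuning settings reported in the LLaVA-1.5 paper.
training:
  global_batch_size: 128        # paper: batch size 128 for instruction tuning
  optimizer:
    lr: 2.0e-5                  # paper: 2e-5 for the instruction-tuning stage
    lr_decay_style: cosine      # cosine decay schedule, as in the paper
    lr_warmup_fraction: 0.03    # 3% warmup ratio
    weight_decay: 0.0           # no weight decay, as in the paper
```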
The other training hyperparameters follow the Megatron multimodal example, because the paper trains with DeepSpeed ZeRO Stage 2, which Megatron does not support. The Megatron tutorial is at https://github.com/FlagOpen/FlagScale/tree/main/megatron/examples/multimodal