AIDC-AI / Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
https://huggingface.co/AIDC-AI/Ovis1.5-Llama3-8B
Apache License 2.0

Poor training results using official data and code #15

Open · liuheng0111 opened this issue 1 week ago

liuheng0111 commented 1 week ago

Hi, I am using the same dataset as Ovis-data and training with gemma2-9b. The loss drops, but I find the model can't follow instructions. Is it because the vision encoder was unfrozen in the S3 stage, causing the visual module to collapse?
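For reference, a minimal PyTorch sketch of how one could check or control whether the vision encoder trains in a given stage. The attribute name `model.visual_tokenizer` is a hypothetical placeholder used for illustration; check the actual Ovis module names before relying on it.

```python
import torch

def set_trainable(module: torch.nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter in a submodule."""
    for p in module.parameters():
        p.requires_grad = trainable

# e.g. keep the vision encoder frozen during a stage
# (`model.visual_tokenizer` is an assumed attribute name):
# set_trainable(model.visual_tokenizer, trainable=False)

# Verify which parameters will actually be updated before launching:
# for name, p in model.named_parameters():
#     if p.requires_grad:
#         print(name)
```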

What is your training configuration? How many GPUs are you using? What is the batch size per GPU? Are the optimizer parameters the same as the script settings?

runninglsy commented 1 week ago

The training configurations for each stage can be found at: https://github.com/AIDC-AI/Ovis/tree/main/scripts/v1_5.

The parameters in the scripts are configured for an 8-GPU setting. During actual training, we utilize an internal distributed system that typically employs 64 or more GPUs. Nevertheless, the overall batch size remains consistent with that defined in the scripts. The code, data, and hyperparameters we used for training are consistent with the open-source version.
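To illustrate how the overall batch size can stay constant across GPU counts, here is a small sketch of the usual bookkeeping: global batch size = per-GPU batch size × number of GPUs × gradient-accumulation steps. The numeric values below are placeholders for illustration, not the actual Ovis script settings.

```python
def grad_accum_steps(global_batch: int, per_device_batch: int, num_gpus: int) -> int:
    """Gradient-accumulation steps needed to reach a target global batch size."""
    assert global_batch % (per_device_batch * num_gpus) == 0
    return global_batch // (per_device_batch * num_gpus)

GLOBAL_BATCH = 128      # target effective batch size (placeholder)
PER_DEVICE_BATCH = 2    # micro-batch size per GPU (placeholder)

print(grad_accum_steps(GLOBAL_BATCH, PER_DEVICE_BATCH, num_gpus=8))   # 8 accumulation steps on 8 GPUs
print(grad_accum_steps(GLOBAL_BATCH, PER_DEVICE_BATCH, num_gpus=64))  # 1 accumulation step on 64 GPUs
```

With more GPUs, fewer accumulation steps are needed per optimizer update, so the effective batch size, and hence the optimizer behavior, matches the 8-GPU scripts.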