Remarkable work! There are a few training details that may not be covered in the paper. Were the LLM parameters fully updated in stage 2 (Generative Pre-training)? I'm also curious how the batch size can be set to 512 on 2×8 GPUs with 40GB of memory each. Was the training data generally short in length?
Yes, all LLM weights are trained in stage 2 with ZeRO-3. Due to limited devices, we use gradient_accumulation_steps to reach an effective total batch size of 512.
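For reference, the effective batch size works out as per-device batch size × number of GPUs × gradient accumulation steps. A minimal sketch of that arithmetic is below; the per-device micro-batch size of 4 is an assumption for illustration, not a value confirmed by the authors.

```python
# Sketch: deriving gradient_accumulation_steps for an effective batch of 512
# on 2 nodes x 8 GPUs. The per-device micro-batch size (4) is an assumed
# value that fits 40GB memory under ZeRO-3; adjust it for your setup.
num_gpus = 2 * 8                 # 2 nodes, 8 GPUs each
per_device_batch_size = 4        # assumed micro-batch per GPU
target_total_batch_size = 512    # effective batch size quoted above

grad_accum_steps = target_total_batch_size // (num_gpus * per_device_batch_size)
assert num_gpus * per_device_batch_size * grad_accum_steps == target_total_batch_size

print(grad_accum_steps)  # -> 8 accumulation steps per optimizer update
```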