Upaya07 / NeurIPS-llm-efficiency-challenge

Code for NeurIPS LLM Efficiency Challenge
Apache License 2.0

SFT Batch size #2

Closed indiejoseph closed 8 months ago

indiejoseph commented 9 months ago

Great job! I found this to be a very good reference for anyone training their own LLM, especially the GPU-poor among us.

I have a question about the hyperparameters in your work. I see the batch size is 2 with gradient_accumulation_steps of 4, which is relatively small for a 4090 with 24 GB of VRAM. I think it could train with a larger batch size, or is it because the max sequence length of 8196 would lead to OOM?
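For context on the numbers above: with gradient accumulation, gradients from several small forward/backward passes are summed before one optimizer update, so the effective batch size per update is the per-device batch size times the accumulation steps. A minimal sketch, using the values mentioned in this question (the repo's actual training config may differ):

```python
# Values from the question above; treat them as an illustration, not the repo's canonical config.
per_device_batch_size = 2
gradient_accumulation_steps = 4

# Gradient accumulation trades extra steps for lower peak memory: each
# micro-batch of 2 fits in VRAM, but the optimizer sees gradients averaged
# over the full effective batch.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 8
```

This is also why a long max sequence length matters so much here: activation memory grows with sequence length, so at 8k tokens even a micro-batch of 2 can approach the 24 GB limit on a 4090.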