Great job! I found this to be a very good reference for anyone training their own LLM, especially those of us who are GPU-poor.
I have a question about the hyperparameters in your work: I see the batch size is 2 with gradient_accumulation_steps=4, which seems relatively small for a 4090 with 24 GB of VRAM. Could it train with a larger batch size, or does the max sequence length of 8196 lead to OOM?
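For context, here is roughly how I read those settings, as a minimal sketch assuming a HuggingFace-style `TrainingArguments` setup (the argument names below are my assumption, not copied from your actual config):

```python
from transformers import TrainingArguments

# Sketch of the settings as I understand them (names are assumed,
# not taken from the repo's real config).
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,   # micro-batch that must fit in 24 GB VRAM
    gradient_accumulation_steps=4,   # effective batch size = 2 * 4 = 8
)

# My OOM concern in a nutshell: activation memory grows roughly linearly
# with sequence length, so at a max length of 8196 even a micro-batch of 2
# may already sit near the 24 GB limit on a 4090, whereas shorter
# sequences would leave room for a larger batch size.
```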