Great job! I found this to be a very good reference for anyone training their own LLM, especially those of us who are GPU-poor.
I have a question about the hyperparameters in your work: I see the batch size is 2 with gradient_accumulation_steps=4, which seems relatively small for a 4090 with 24 GB of VRAM. Could it train with a larger batch size, or does the max sequence length of 8196 lead to OOM?
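For context, here is roughly how I read those settings, as a minimal sketch assuming a HuggingFace-style `TrainingArguments` setup (the argument names below are my assumption, not copied from your actual config):

```python
from transformers import TrainingArguments

# Sketch of the settings as I understand them (names are assumed,
# not taken from the repo's real config).
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,   # micro-batch that must fit in 24 GB VRAM
    gradient_accumulation_steps=4,   # effective batch size = 2 * 4 = 8
)

# My OOM concern in a nutshell: activation memory grows roughly linearly
# with sequence length, so at a max length of 8196 even a micro-batch of 2
# may already sit near the 24 GB limit on a 4090, whereas shorter
# sequences would leave room for a larger batch size.
```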