ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Add Option for Memory Optimized Training via Gradient Checkpointing #178

Closed klei22 closed 2 months ago

klei22 commented 2 months ago

Description:

This PR introduces gradient checkpointing to reduce memory usage during model training.
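As a rough sketch of the technique (not this PR's actual diff), gradient checkpointing in PyTorch is typically applied per block via `torch.utils.checkpoint`; the module names below (`TinyMLP`, `TinyModel`) are illustrative, not from this repo:

```python
import torch
from torch.utils.checkpoint import checkpoint

class TinyMLP(torch.nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.fc1 = torch.nn.Linear(dim, 4 * dim)
        self.fc2 = torch.nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

class TinyModel(torch.nn.Module):
    def __init__(self, n_layers=4, dim=64, use_checkpointing=False):
        super().__init__()
        self.blocks = torch.nn.ModuleList(TinyMLP(dim) for _ in range(n_layers))
        self.use_checkpointing = use_checkpointing

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpointing and self.training:
                # Activations inside `block` are not kept for backward;
                # they are recomputed during the backward pass instead.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x

model = TinyModel(use_checkpointing=True)
x = torch.randn(8, 64, requires_grad=True)
loss = model(x).sum()
loss.backward()  # gradients flow through the recomputed activations
```

The savings come from not storing intermediate activations for checkpointed blocks, at the cost of one extra forward pass per block during backward.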

Changes:

Adds a configuration option to enable gradient checkpointing during training.

Benefits:

Lower peak activation memory during training, which allows larger batch sizes or longer context lengths on the same hardware.

Trade-offs:

Longer training time, since checkpointed activations must be recomputed during the backward pass.


Checklist:

klei22 commented 2 months ago

Wanted to note that for maximum memory savings, you should not add the --compile flag.

For a context length of 1024 ('gc' stands for gradient checkpointing):
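Memory numbers like these depend on hardware and model size, but a minimal way to compare peak memory with and without gc is `torch.cuda.max_memory_allocated`; the sketch below uses illustrative layer sizes, not the repo's model, and falls back to returning None on CPU:

```python
import torch
from torch.utils.checkpoint import checkpoint

def train_step_peak_mem(use_gc, n_layers=8, dim=256, batch=32):
    """Run one forward/backward pass over a stack of linear blocks and
    return peak CUDA memory in bytes (None on CPU, where PyTorch has no
    comparable counter)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    blocks = torch.nn.ModuleList(
        torch.nn.Linear(dim, dim) for _ in range(n_layers)
    ).to(device)
    x = torch.randn(batch, dim, device=device, requires_grad=True)
    if device == "cuda":
        torch.cuda.reset_peak_memory_stats()
    h = x
    for block in blocks:
        # With gc, activations inside `block` are discarded after the
        # forward pass and recomputed during backward.
        h = checkpoint(block, h, use_reentrant=False) if use_gc else block(h)
    h.sum().backward()
    return torch.cuda.max_memory_allocated() if device == "cuda" else None

# On a GPU, train_step_peak_mem(True) is typically lower than
# train_step_peak_mem(False), at the cost of extra recompute time.
```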