Closed danbraunai closed 2 months ago
Should happen using a log-based scheduler.
Can use a similar structure as is done in Pythia:
To promote research on the learning dynamics of LLMs we make 154 checkpoints available for each model, representing steps 0 (initialization), 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1000, and then every 1,000 subsequent steps.
Should happen using a log-based scheduler.
Can use a similar structure as is done in Pythia: