Please check that this issue hasn't been reported before.
[X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
When I start a training run without specifying the number of checkpoints to keep in save_total_limit, I expected Axolotl to save all checkpoints.
Current behaviour
However, according to this code:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/132eb740f036eff0fa8b239ddaf0b7a359ed1732/src/axolotl/core/trainer_builder.py#L1168C22-L1168C38
save_total_limit falls back to 4 when it isn't set, so at most four checkpoints are kept. This default seems arbitrary to me.
Steps to reproduce
Run the training code without setting save_total_limit explicitly.
Config yaml
No response
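One way to confirm the behaviour after following the steps to reproduce is to count the checkpoint directories left in the run's output directory. This is a minimal sketch. It assumes the Hugging Face Trainer's checkpoint-<step> directory naming and a run with enough save steps to produce more than four checkpoints. list_checkpoint_steps is a hypothetical helper, not part of Axolotl.

```python
import re
from pathlib import Path
from typing import List

# Assumption: checkpoints are saved as "checkpoint-<global_step>" directories
# inside the run's output_dir (the Hugging Face Trainer naming scheme).
CHECKPOINT_DIR = re.compile(r"^checkpoint-(\d+)$")


def list_checkpoint_steps(output_dir: str) -> List[int]:
    """Return the global steps of all checkpoint directories, oldest first."""
    steps = []
    for entry in Path(output_dir).iterdir():
        match = CHECKPOINT_DIR.match(entry.name)
        # Ignore plain files and unrelated directories (e.g. logs).
        if match and entry.is_dir():
            steps.append(int(match.group(1)))
    return sorted(steps)


# Example (hypothetical path): print(list_checkpoint_steps("./outputs/my-run"))
```

If the fallback of 4 applies, this should list at most four steps, even when more checkpoints were written during training.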
Possible solution
In my opinion, this default is not well documented. I'd like either clearer documentation of it, or for the default to save all checkpoints.
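The difference between the two defaults can be sketched as follows. Both helpers are hypothetical and simplified; they are not Axolotl's or Hugging Face's actual code. The "proposed" case assumes that an unset limit is passed through as None, which, as I understand the Hugging Face TrainingArguments documentation, means no checkpoints are deleted.

```python
from typing import List, Optional

# Hypothetical helpers for illustration only; not Axolotl or Hugging Face APIs.


def resolve_save_total_limit(configured: Optional[int], fallback: Optional[int]) -> Optional[int]:
    """Return the user's save_total_limit, or `fallback` when it is unset."""
    return configured if configured is not None else fallback


def checkpoints_kept(saved_steps: List[int], limit: Optional[int]) -> List[int]:
    """Return the checkpoint steps that survive rotation (simplified model)."""
    ordered = sorted(saved_steps)
    if limit is None:
        # No limit: every checkpoint is kept.
        return ordered
    if limit <= 0:
        raise ValueError("this sketch only models positive limits")
    # Keep only the `limit` most recent checkpoints; older ones are deleted.
    return ordered[-limit:]


steps = [100, 200, 300, 400, 500, 600]

# Reported behaviour: save_total_limit unset, falls back to 4.
print(checkpoints_kept(steps, resolve_save_total_limit(None, fallback=4)))
# prints [300, 400, 500, 600]

# Proposed behaviour: save_total_limit unset, no fallback, so keep everything.
print(checkpoints_kept(steps, resolve_save_total_limit(None, fallback=None)))
# prints [100, 200, 300, 400, 500, 600]
```

An explicitly configured value (for example 2) takes precedence in both cases, so the proposal would only change what happens when save_total_limit is left out of the config.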
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main
Acknowledgements