NVIDIA / modulus-launch

Repo of optimized training recipes for accelerating PyTorch workflows of AI-driven surrogates for physical systems
Apache License 2.0

🚀[FEA]: Improve the checkpointing utility #100

Open mnabian opened 1 year ago

mnabian commented 1 year ago

Is this a new feature, an improvement, or a change to existing functionality?

Improvement

How would you describe the priority of this feature request?

Medium

Please provide a clear description of the problem you would like to solve.

Currently, the checkpointing utility saves and retains a checkpoint for every epoch, so the number of files on disk grows with the length of the run. It should instead keep only a few checkpoints: the most recent ones and/or the best ones as ranked by a validation metric.
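To make the request concrete, here is a minimal sketch of one possible retention policy. It is not the existing modulus-launch API: the class `CheckpointPruner` and the parameters `keep_last`, `keep_best`, and `lower_is_better` are hypothetical names chosen for illustration. The sketch assumes each checkpoint is a single file on disk and that the caller passes in the validation metric when one is available.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Optional, Set


@dataclass
class CheckpointPruner:
    """Hypothetical helper: tracks saved checkpoint files and deletes the rest."""

    keep_last: int = 3            # number of most recent checkpoints to retain
    keep_best: int = 1            # number of best-scoring checkpoints to retain
    lower_is_better: bool = True  # True for losses, False for accuracies
    _metrics: Dict[Path, Optional[float]] = field(default_factory=dict)

    def register(self, path, metric: Optional[float] = None) -> Set[Path]:
        """Record a checkpoint that was just written, then prune old ones."""
        path = Path(path)
        # Re-insert so that dict order always reflects save order.
        self._metrics.pop(path, None)
        self._metrics[path] = metric
        return self.prune()

    def prune(self) -> Set[Path]:
        """Delete every tracked file outside the retention set; return that set."""
        paths = list(self._metrics)  # oldest first
        keep = set(paths[-self.keep_last:]) if self.keep_last > 0 else set()

        # Rank only checkpoints that have a validation metric attached.
        scored = [p for p in paths if self._metrics[p] is not None]
        scored.sort(key=lambda p: self._metrics[p],
                    reverse=not self.lower_is_better)
        keep.update(scored[: self.keep_best])

        for p in paths:
            if p not in keep:
                p.unlink(missing_ok=True)  # requires Python 3.8+
                del self._metrics[p]
        return keep
```

In a training loop, `register` would be called once per epoch, right after the checkpoint file is written, with the epoch's validation loss as `metric`. Setting `keep_best=0` gives a "last N only" policy, and setting `keep_last=0` gives a "best N only" policy.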

Describe any alternatives you have considered

No response