aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

use nxd native checkpoint to support resume from latest #60

Closed wzamazon closed 7 months ago

wzamazon commented 7 months ago

The new version of nxd's save_checkpoint and load_checkpoint supports loading the latest checkpoint. This patch adds a "latest_if_exists" option to the "loading_step" command-line argument, which makes use of the new feature.

This allows users to start training and to resume it with the same command.
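
As a rough illustration of the idea (not the code in this patch), the sketch below shows how a "loading_step" argument could accept either an integer step or the literal "latest_if_exists". Everything besides those two names is an assumption made for the example: the helper functions, the `--checkpoint_dir` flag, and the `step_<N>` directory layout are hypothetical, and the directory scan only stands in for the latest-checkpoint lookup that nxd's load_checkpoint performs natively.

```python
import argparse
import os
import re

LATEST_IF_EXISTS = "latest_if_exists"


def parse_loading_step(value):
    """Accept an integer step or the literal 'latest_if_exists'."""
    if value == LATEST_IF_EXISTS:
        return value
    try:
        return int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected an integer or '{LATEST_IF_EXISTS}', got {value!r}"
        )


def resolve_loading_step(loading_step, checkpoint_dir):
    """Return the step to load, or None to start training from scratch.

    Hypothetical stand-in: assumes checkpoints live in sub-directories
    named step_<N>. The real lookup is delegated to nxd.
    """
    if loading_step != LATEST_IF_EXISTS:
        return loading_step  # explicit step, or None if nothing was requested
    if not os.path.isdir(checkpoint_dir):
        return None  # first run: no checkpoint directory yet
    steps = []
    for name in os.listdir(checkpoint_dir):
        match = re.fullmatch(r"step_(\d+)", name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


def build_parser():
    parser = argparse.ArgumentParser()
    # Default of None (no loading) is an assumption for this sketch.
    parser.add_argument("--loading_step", type=parse_loading_step, default=None)
    parser.add_argument("--checkpoint_dir", default="checkpoints")  # hypothetical flag
    return parser
```

With this shape, the first launch finds no checkpoint and starts from scratch, while a relaunch with the identical command line picks up the highest saved step.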