DeepMLNet / DeepNet

Deep.Net machine learning framework for F#
Apache License 2.0

Checkpointing interrupted too early #29

Closed jklanger closed 7 years ago

jklanger commented 7 years ago

The 30 seconds that the HPC tool grants before killing the job are not enough to finish saving the checkpoint, which leaves a corrupted checkpoint behind.
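One common way to keep an interrupted save from corrupting the previous checkpoint is to write to a temporary file and rename it over the real file only once the write has completed. The following is a minimal, language-agnostic sketch of that idea, written in Python for brevity. It is not Deep.Net's actual checkpoint code, and the function name `save_checkpoint_atomically` and the byte-string payload are assumptions made for illustration.

```python
import os
import tempfile


def save_checkpoint_atomically(path: str, data: bytes) -> None:
    """Write `data` to a temporary file next to `path`, then rename it into place.

    Hypothetical helper: if the process is killed mid-write, only the
    temporary file is incomplete and any existing file at `path` is untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
            tmp_file.flush()
            # Ask the OS to push the bytes to disk before the rename.
            os.fsync(tmp_file.fileno())
        # os.replace swaps the new file in as a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file on ordinary failures.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this approach a hard kill can still leave a stray `.tmp` file behind, since cleanup code does not run in that case, but the last complete checkpoint remains readable. It does not make the save itself any faster, so the job would still lose the progress made since that checkpoint.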

Saving checkpoint to ...