IBM / mi-prometheus

Enabling reproducible Machine Learning research
http://mi-prometheus.rtfd.io/
Apache License 2.0
42 stars 18 forks

Update status in the checkpoint when training hits the last epoch/episode #85

Closed tkornuta-ibm closed 5 years ago

tkornuta-ibm commented 5 years ago

In the case when:

Then we should:

This requires additionally saving the training status along with the loss in the model.

vmarois commented 5 years ago

Yes :+1: A case where this is useful is when the best_model (based on the validation loss criterion) is saved early during training but the loss never goes under the loss threshold, so the model did not actually "converge". In this case, the model_saved_timestamp and terminal_status_update_timestamp information will help the user understand what happened.
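
To make the idea concrete, here is a minimal sketch of a checkpoint that carries the status and both timestamps. It is not the mi-prometheus implementation: the helper names `make_checkpoint` and `update_terminal_status`, the dictionary layout, and the status strings are all assumptions made for illustration. Only the field names `model_saved_timestamp` and `terminal_status_update_timestamp` come from this thread.

```python
from datetime import datetime


def make_checkpoint(model_state, loss, status, now=None):
    """Build a checkpoint dict when the best model is saved (hypothetical layout)."""
    if now is None:
        now = datetime.now()
    return {
        "model_state": model_state,
        "loss": loss,
        "status": status,
        # When the model weights were actually written.
        "model_saved_timestamp": now,
        # Assumed to stay empty until training terminates.
        "terminal_status_update_timestamp": None,
    }


def update_terminal_status(checkpoint, status, now=None):
    """Return a copy with only the status and its timestamp refreshed.

    The saved model state and loss are left untouched, so the two
    timestamps can differ when the best model was saved early.
    """
    if now is None:
        now = datetime.now()
    updated = dict(checkpoint)
    updated["status"] = status
    updated["terminal_status_update_timestamp"] = now
    return updated


# Placeholder status strings, not the project's actual status list.
saved = make_checkpoint({"w": [0.1, 0.2]}, loss=0.42, status="placeholder: in progress",
                        now=datetime(2018, 11, 1, 10, 0))
final = update_terminal_status(saved, "placeholder: terminal",
                               now=datetime(2018, 11, 1, 18, 30))
print(final["model_saved_timestamp"], final["terminal_status_update_timestamp"])
```

A gap between the two timestamps then tells the user that the stored weights predate the end of training, which is the case described above.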

tkornuta-ibm commented 5 years ago

List of possible statuses (for now):

The last one means: