Today you have a few approaches to saving the model and continuing to work with it:
Save the model as a TF graph plus weights in txt format (example). After loading it back as an InferenceModel, the model can be used for inference only.
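For concreteness, here is a minimal sketch of that flow. It assumes the KotlinDL 0.x names (`SavingFormat.TF_GRAPH_CUSTOM_VARIABLES`, `TensorFlowInferenceModel.load`); exact signatures may differ between versions, and the directory, input shape, and input array are placeholders:

```kotlin
import org.jetbrains.kotlinx.dl.api.core.Sequential
import org.jetbrains.kotlinx.dl.api.core.SavingFormat
import org.jetbrains.kotlinx.dl.api.core.WritingMode
import org.jetbrains.kotlinx.dl.api.inference.TensorFlowInferenceModel
import java.io.File

// Save a trained model as a TF graph plus weights in txt format.
fun saveForInference(model: Sequential, dir: File) {
    model.save(
        modelDirectory = dir,
        savingFormat = SavingFormat.TF_GRAPH_CUSTOM_VARIABLES,
        writingMode = WritingMode.OVERRIDE
    )
}

// Later (possibly in another process): load it as an inference-only model.
fun predictLabel(dir: File, input: FloatArray): Int =
    TensorFlowInferenceModel.load(dir).use {
        it.reshape(28, 28, 1) // input shape of the original model (placeholder, MNIST-like)
        it.predict(input)     // returns the predicted class label
    }
```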
Save the model as a JSON config plus weights in txt format (example). The model can then be fine-tuned and training can be continued. This approach works well whether you originally trained the model in Keras or in KotlinDL. Checkpointing can be organized as a sequence of saves in this format.
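A sketch of the resume-training flow under the same assumptions: `SavingFormat.JSON_CONFIG_CUSTOM_VARIABLES`, `Sequential.loadDefaultModelConfiguration`, and `loadWeights` are the KotlinDL 0.x names as I recall them, not a guaranteed API, and the dataset, optimizer, and epoch count are placeholders:

```kotlin
import org.jetbrains.kotlinx.dl.api.core.Sequential
import org.jetbrains.kotlinx.dl.api.core.SavingFormat
import org.jetbrains.kotlinx.dl.api.core.WritingMode
import org.jetbrains.kotlinx.dl.api.core.loss.Losses
import org.jetbrains.kotlinx.dl.api.core.metric.Metrics
import org.jetbrains.kotlinx.dl.api.core.optimizer.Adam
import org.jetbrains.kotlinx.dl.dataset.Dataset
import java.io.File

// Write a resumable snapshot: JSON config describing the layers plus weights in txt format.
fun saveCheckpoint(model: Sequential, dir: File) {
    model.save(
        modelDirectory = dir,
        savingFormat = SavingFormat.JSON_CONFIG_CUSTOM_VARIABLES,
        writingMode = WritingMode.OVERRIDE
    )
}

// Rebuild the model from the JSON config, load the weights, and keep training.
fun resumeTraining(dir: File, train: Dataset) {
    val model = Sequential.loadDefaultModelConfiguration(dir) // method name approximate
    model.use {
        it.compile(
            optimizer = Adam(),
            loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
            metric = Metrics.ACCURACY
        )
        it.loadWeights(dir)                                   // txt-weights loader; name approximate
        it.fit(dataset = train, epochs = 5, batchSize = 100)  // continue training / fine-tune
    }
}
```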
I agree that we probably need a dedicated checkpointing API.
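As a strawman for such an API, it could look something like the hypothetical helper below. `fitWithCheckpoints` does not exist in KotlinDL; it only wraps the JSON-config save from the previous point in a loop, and it assumes the model is already compiled:

```kotlin
import org.jetbrains.kotlinx.dl.api.core.Sequential
import org.jetbrains.kotlinx.dl.api.core.SavingFormat
import org.jetbrains.kotlinx.dl.api.core.WritingMode
import org.jetbrains.kotlinx.dl.dataset.Dataset
import java.io.File

// Hypothetical checkpointing helper (not part of KotlinDL today): train in chunks and
// write a resumable JSON-config + weights snapshot after every `saveEveryEpochs` epochs.
fun fitWithCheckpoints(
    model: Sequential,          // assumed to be compiled already
    train: Dataset,
    totalEpochs: Int,
    saveEveryEpochs: Int,
    checkpointRoot: File
) {
    var epochsDone = 0
    while (epochsDone < totalEpochs) {
        val chunk = minOf(saveEveryEpochs, totalEpochs - epochsDone)
        model.fit(dataset = train, epochs = chunk, batchSize = 100)
        epochsDone += chunk
        model.save(
            modelDirectory = File(checkpointRoot, "epoch_$epochsDone"),
            savingFormat = SavingFormat.JSON_CONFIG_CUSTOM_VARIABLES,
            writingMode = WritingMode.OVERRIDE
        )
    }
}
```

Each checkpoint directory written this way could then be restored with the JSON-config flow above, which would also cover save-restore-replay style experiments: restore a checkpoint, change the optimizer or other hyperparameters, and call fit again.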
I want to try out different training mechanisms using save-restore-replay. Could you tell me whether the API could be made to support such use cases?