FrancescoTerrosi closed this issue 4 years ago.
Hello,
We currently don't use h5 to save models. Instead, you can use the --checkpoint_save_secs (-s) flag to tell Coach to save your model every x seconds. This uses the checkpoint-saving mechanism of the framework you are using (TensorFlow or MXNet), and the checkpoints are saved in a directory called checkpoints inside the experiment folder.
To test the model without training, run Coach with --checkpoint_restore_dir (-crd) pointing to the checkpoints directory you saved previously, together with --evaluate, so that Coach only evaluates the model and doesn't perform any training steps.
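For example, the two runs could look something like this. This is a minimal sketch that only builds the command lines from the flags mentioned above; the preset name (CartPole_DQN, selected with -p) and the checkpoint path are placeholders, so substitute your own experiment's values.

```python
from typing import List


def coach_train_cmd(preset: str, save_every_secs: int) -> List[str]:
    # -p selects the preset; -s is the short form of --checkpoint_save_secs
    return ["coach", "-p", preset, "-s", str(save_every_secs)]


def coach_eval_cmd(preset: str, checkpoint_dir: str) -> List[str]:
    # -crd (--checkpoint_restore_dir) points at previously saved checkpoints;
    # --evaluate makes Coach evaluate only, without any training steps
    return ["coach", "-p", preset, "-crd", checkpoint_dir, "--evaluate"]


if __name__ == "__main__":
    # Placeholder preset and path: replace with your own experiment's values
    print(" ".join(coach_train_cmd("CartPole_DQN", 600)))
    print(" ".join(coach_eval_cmd("CartPole_DQN", "path/to/experiment/checkpoints")))
```

You could then run each list with subprocess.run(cmd, check=True), or simply type the equivalent command in a shell.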
Let me know if that works for you!
Shadi
First of all, thank you for your answer.
It's nice to know that a saving/loading mechanism is already implemented. Unfortunately, I need an episode-based mechanism rather than a time-based one. I might just dive into the codebase and implement it myself; that shouldn't cause any weird behaviour, right? (Unless you are doing something very exotic in there :) )
Cheers, Francesco
If you are running the training loop manually through the GraphManager object (as shown in Tutorial 0), you can do this by setting the checkpoint_save_dir field in TaskParameters and then calling graph_manager.save_checkpoint() whenever you want to save a checkpoint.
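A minimal sketch of the episode-based variant, assuming you already have a graph_manager whose graph was created with checkpoint_save_dir set in its TaskParameters. The helper train_with_episode_checkpoints and the run_one_episode callable are hypothetical names for illustration; with Coach, run_one_episode might be something like lambda: graph_manager.train_and_act(EnvironmentEpisodes(1)), but check that your Coach version exposes those names before relying on it.

```python
from typing import Callable, List, Protocol


class CheckpointSaver(Protocol):
    """Anything exposing save_checkpoint(), e.g. a Coach GraphManager."""

    def save_checkpoint(self) -> None: ...


def train_with_episode_checkpoints(
    graph_manager: CheckpointSaver,
    run_one_episode: Callable[[], None],  # hypothetical: runs one training episode
    total_episodes: int,
    save_every_n_episodes: int,
) -> List[int]:
    """Run training one episode at a time and save a checkpoint every N episodes.

    Returns the episode numbers after which a checkpoint was saved.
    """
    if save_every_n_episodes <= 0:
        raise ValueError("save_every_n_episodes must be positive")
    saved_at: List[int] = []
    for episode in range(1, total_episodes + 1):
        run_one_episode()
        # Checkpoint on an episode count instead of a wall-clock interval
        if episode % save_every_n_episodes == 0:
            graph_manager.save_checkpoint()
            saved_at.append(episode)
    return saved_at
```

Keeping the episode runner as a parameter means the checkpoint schedule doesn't depend on how a single episode is executed, so the same loop works whether you call Coach's own stepping methods or your own code.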
Shadi
Nice and simple, thank you very much
Hullo,
I've been playing with this framework for a couple of days and I don't understand how to save the model (.h5) and then test it without training. Do I have to code something myself, or am I missing something? :)
Cheers