tianyu-z closed this 3 years ago
@vict0rsch Yeah, I totally agree with your commit. To make it easier to load/resume an intermediate model, I added an option in defaults.yaml to load a model from the exact path to its `.pth` file.
Besides, based on the training experiments I have run recently, I changed the saving params to:
save_n_epochs: 2 # Save `latest_ckpt.pth` every epoch, `epoch_{epoch}_ckpt.pth` model every n epochs if epoch >= min_save_epoch
min_save_epoch: 28 # Save extra intermediate checkpoints when epoch > min_save_epoch
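For reference, a minimal sketch of how these two params could decide which files get written at the end of an epoch. The helper name `checkpoint_names` is hypothetical (it is not from the repo), and it assumes that "every n epochs" means `epoch % save_n_epochs == 0` and that the threshold is inclusive (`epoch >= min_save_epoch`), following the first comment above:

```python
def checkpoint_names(epoch, save_n_epochs=2, min_save_epoch=28):
    """Return the checkpoint file names to write at the end of `epoch`.

    Hypothetical helper: the real trainer may organize this differently.
    """
    # `latest_ckpt.pth` is overwritten every epoch.
    names = ["latest_ckpt.pth"]
    # Assumption: extra intermediate checkpoints are kept every
    # `save_n_epochs` epochs once `min_save_epoch` has been reached.
    if epoch >= min_save_epoch and epoch % save_n_epochs == 0:
        names.append(f"epoch_{epoch}_ckpt.pth")
    return names


if __name__ == "__main__":
    for epoch in (27, 28, 29, 30):
        print(epoch, checkpoint_names(epoch))
```

With the values above, epochs before 28 would only refresh `latest_ckpt.pth`, and from epoch 28 onward every second epoch would also leave an `epoch_{epoch}_ckpt.pth` behind.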
@tianyu-z I re-implemented the logic. I hope you can agree with me that it is more versatile and robust to errors (especially people using the wrong arguments, this WILL happen)
I added doc comments in defaults.yaml:
README on load_path
1/ any path which leads to a dir will be loaded as `path / checkpoints / latest_ckpt.pth`
2/ if you want to specify a specific checkpoint, it MUST be a `.pth` file
3/ resuming a P OR an M model, you may only specify 1 of `load_path.p` OR `load_path.m`.
You may also leave BOTH at none, in which case `output_path / checkpoints / latest_ckpt.pth`
will be used
4/ resuming a P+M model, you may specify (`p` AND `m`) OR `pm` OR leave all at none,
in which case `output_path / checkpoints / latest_ckpt.pth` will be used to load from
a single checkpoint
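The four rules above can be sketched roughly as follows. This is only an illustration of the documented behaviour, not the code in the PR: `resolve_ckpt` and `resolve_load_paths` are hypothetical names, `tasks` is assumed to be one of `"p"`, `"m"` or `"pm"`, and rejecting `load_path.pm` when resuming a single P or M model is an assumption the rules do not spell out.

```python
from pathlib import Path


def resolve_ckpt(path):
    """Rules 1 and 2: a dir maps to its latest checkpoint, a file must be .pth."""
    path = Path(path)
    if path.is_dir():
        return path / "checkpoints" / "latest_ckpt.pth"
    if path.suffix != ".pth":
        raise ValueError(f"{path} is neither a directory nor a .pth file")
    return path


def resolve_load_paths(tasks, load_path, output_path):
    """Rules 3 and 4: map each model key to the checkpoint to resume from.

    `load_path` is a dict with keys "p", "m" and "pm" (None when unset).
    """
    given = {k for k, v in load_path.items() if v is not None}
    default = Path(output_path) / "checkpoints" / "latest_ckpt.pth"

    if tasks in ("p", "m"):
        # Rule 3: at most one of p / m (assumption: pm is not allowed here).
        allowed = [set(), {"p"}, {"m"}]
        fallback = {tasks: default}
    elif tasks == "pm":
        # Rule 4: (p AND m) OR pm OR nothing at all.
        allowed = [set(), {"p", "m"}, {"pm"}]
        fallback = {"pm": default}
    else:
        raise ValueError(f"unknown tasks value: {tasks!r}")

    if given not in allowed:
        raise ValueError(f"invalid load_path keys for tasks={tasks!r}: {sorted(given)}")
    if not given:
        return fallback
    return {k: resolve_ckpt(load_path[k]) for k in sorted(given)}
```

Raising early on an invalid combination is a deliberate choice in this sketch: a wrong argument fails before any checkpoint is read, rather than silently falling back to `output_path`.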
@tianyu-z do you think the code now handles everything we want to cover? Can you see any loophole in the logic?
Thanks a lot! I am checking now.
@vict0rsch I don't see any holes in your logic. It's pretty strong. :fire: Sorry, I was not aware that there are other things related to the self.output_path.
No worries, but next time just search for `self.output_path` to make sure it's safe for it not to point to a directory. You'll see it's used all over the place.
@tianyu-z I simplified the logic. If you agree with my changes let's merge :)
(I think that if we can limit parameters and keep a simple logic, we should do so)