Open se7esx opened 8 months ago
One possible solution is to set train.save_model to true in regressor_model.yaml.
Thanks, but I followed your setup and still can't save the checkpoint ...
Hi! I have run into exactly the same problem. I'm trying to solve it now and will report back if something works out. For now it seems that some options are duplicated across different config files (e.g. the model config is also written in experiments/<config_name>.yaml). The same goes for the training options. The config structure seems overcomplicated to me; maybe this is why our problem occurred.
UPD: I analyzed the logs, and in my case the model checkpoints (as well as other run data) were saved in the parent directory, which is determined in config.yaml. Before that, I had run main.py from the src directory, so the model outputs were saved in the root dir, and I wrongly assumed that the next outputs would show up in the root too.
Also, it turned out that I didn't change any save_model parameters, so it worked out of the box. @se7esx, does your run produce output that only lacks the checkpoints, or does it produce nothing at all?
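In case it helps anyone else, here is a minimal sketch of why the launch directory matters when the output path in a config is relative. The value "../output" and the /repo paths are made-up placeholders, not the actual setting in config.yaml, and resolve_output_dir is a hypothetical helper, not something from this repo.

```python
import os
from pathlib import Path


def resolve_output_dir(configured: str, launch_dir: str) -> Path:
    """Return the directory a configured output path points to
    when the script is started from launch_dir.

    Hypothetical helper for illustration only.
    """
    configured_path = Path(configured)
    # Absolute paths do not depend on where the script is launched.
    if configured_path.is_absolute():
        return configured_path
    # Relative paths are resolved against the launch directory;
    # normpath collapses ".." segments without touching the filesystem.
    return Path(os.path.normpath(Path(launch_dir) / configured_path))


if __name__ == "__main__":
    # Assumed placeholder values, not taken from the real config.
    for launch_dir in ("/repo", "/repo/src"):
        print(launch_dir, "->", resolve_output_dir("../output", launch_dir))
```

The same relative setting ends up in different places depending on whether the script is started from the repo root or from src, which would explain outputs appearing one level above where you expect them.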
I checked the default save path: a folder called output was created, but the model couldn't be saved no matter how I set the save parameter.
When I run train_qm9_regressor.py, I can't save the model: the model parameters are not saved to checkpoint_callback.dirpath. I used offline training.
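One way to check whether the checkpoints were written somewhere other than the expected dirpath is to search the project tree for checkpoint files. This is only a sketch: find_checkpoints is a hypothetical helper, and the extensions *.ckpt, *.pt and *.pth are an assumption about what the training script writes.

```python
from pathlib import Path


def find_checkpoints(root, patterns=("*.ckpt", "*.pt", "*.pth")):
    """Recursively collect files under root that look like checkpoints,
    newest first. The default patterns are an assumption, not taken
    from the repo.
    """
    root = Path(root)
    found = []
    for pattern in patterns:
        # rglob walks all subdirectories of root.
        found.extend(p for p in root.rglob(pattern) if p.is_file())
    # Most recently modified files first, so a fresh run shows up on top.
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Start from the parent of the current directory, since outputs
    # may land one level above where the script was launched.
    for path in find_checkpoints(Path.cwd().parent):
        print(path)
```

If this finds nothing after a full training run, the problem is more likely that saving is never triggered than that the files are in an unexpected folder.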