Open HenryYihengXu opened 4 years ago
Since the embeddings are stored in a pickle file, I assumed that setting "resume" to true would train for another num_epoch epochs starting from the embeddings in that pickle file. Is that right? It didn't work in my experiment.
I was plotting MRR against num_epoch on FB15k with the DistMult model. I set num_epoch to 10 in distmult_fb15k.yaml and ran "graphvite run distmult_fb15k.yaml". Then I kept num_epoch at 10, set resume to true, and ran it again. I expected MRR to increase, but it didn't. In fact, it dropped to a very small number. Is there anything wrong with how I'm using "resume"?
The current implementation doesn't record `num_epoch` in the pickle file. `resume=True` only prevents reinitialization of the embeddings.
By default, GraphVite uses a linear decay learning rate scheduler. This is probably why resume doesn't work in your case: the learning rate decays from `lr` to nearly zero during the first run, and then restarts from `lr` and decays to nearly zero again in the resumed run. You may override this behavior by setting `scheduler="constant"`.
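To make the restart effect concrete, here is a minimal sketch in plain Python. It is not GraphVite code: the function names are made up for illustration, and it assumes the decay is applied once per epoch and reaches zero at the end of a run, which may differ from what GraphVite does internally.

```python
def linear_decay_lr(initial_lr, step, total_steps):
    """Learning rate that falls linearly from initial_lr toward 0 (assumed schedule)."""
    return initial_lr * max(0.0, 1.0 - step / total_steps)


def resumed_schedule(initial_lr, epochs_per_run, num_runs):
    """Per-epoch learning rates when every resumed run restarts the decay."""
    return [
        linear_decay_lr(initial_lr, epoch, epochs_per_run)
        for _ in range(num_runs)
        for epoch in range(epochs_per_run)
    ]


def single_run_schedule(initial_lr, total_epochs):
    """Per-epoch learning rates for one uninterrupted run of total_epochs."""
    return [
        linear_decay_lr(initial_lr, epoch, total_epochs)
        for epoch in range(total_epochs)
    ]


if __name__ == "__main__":
    resumed = resumed_schedule(0.1, epochs_per_run=10, num_runs=2)
    single = single_run_schedule(0.1, total_epochs=20)
    # At epoch 10 the resumed run jumps back to the full learning rate,
    # while a single 20-epoch run has already decayed to half of it.
    print("resumed:", resumed[10], "single:", single[10])
```

Under these assumptions, the resumed run applies the full initial learning rate to embeddings that were already close to converged, which can move them far from where the first run left them.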
However, both theoretical and empirical results suggest that a decaying learning rate generally works better than a constant one. So the best practice is probably to maintain the learning rate decay yourself when you resume training, for example by lowering the learning rate of each resumed run.
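One way to maintain the decay by hand is to split training into chunks, run each chunk with the constant scheduler, and lower the learning rate from chunk to chunk so that the overall curve approximates a single linear decay. The sketch below only computes the per-chunk values. The helper name is hypothetical, the midpoint sampling is one possible choice, and how you pass each value to GraphVite (config file or otherwise) is not shown here.

```python
def chunk_learning_rates(initial_lr, total_epochs, epochs_per_chunk):
    """Return one constant learning rate per chunk.

    Each value is sampled at the chunk's midpoint from a linear decay
    that runs from initial_lr to 0 over total_epochs (assumed target schedule).
    """
    if total_epochs % epochs_per_chunk != 0:
        raise ValueError("total_epochs must be a multiple of epochs_per_chunk")
    rates = []
    for start in range(0, total_epochs, epochs_per_chunk):
        midpoint = start + epochs_per_chunk / 2
        rates.append(initial_lr * (1.0 - midpoint / total_epochs))
    return rates


if __name__ == "__main__":
    # 100 epochs in total, evaluated after every 10 epochs.
    for i, lr in enumerate(chunk_learning_rates(0.1, 100, 10)):
        print(f"chunk {i}: lr = {lr:.4f}")
```

Each chunk would then be a resumed run with `scheduler="constant"` and the corresponding learning rate, with evaluation between chunks. This is a piecewise-constant approximation, so results may not match a single uninterrupted run exactly.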
Could you explain how to use "resume" in config files? I want to reproduce the curve of F1 score against the number of epochs, and I don't want to run a separate experiment for each num_epoch. Is it possible to set num_epoch to 2000, but stop and evaluate after every 10 epochs, and then resume?