"Resume" in config files

DeepGraphLearning / graphvite

GraphVite: A General and High-performance Graph Embedding System

https://graphvite.io

Apache License 2.0


"Resume" in config files #43

Open HenryYihengXu opened 4 years ago

HenryYihengXu commented 4 years ago

Could you explain how to use "resume" in config files? I want to reproduce the curve of F1 scores with respect to the number of epochs, but I don't want to run a separate experiment for each num_epoch. Is it possible to set num_epoch to 2000, stop and evaluate after every 10 epochs, and then resume?

HenryYihengXu commented 4 years ago

Since the embeddings are stored in a pickle file, I suppose that if I set "resume" to true, it will train for another num_epoch epochs starting from the embeddings in that pickle file, right? But that didn't work in my experiment.

I was plotting the curve of MRR against num_epoch on fb15k using the DistMult model. I set num_epoch to 10 in distmult_fb15k.yaml and ran it with "graphvite run distmult_fb15k.yaml". Then I kept num_epoch at 10, set resume to true, and ran it again. I expected MRR to increase, but it didn't. In fact, it dropped to a very small number. Is there anything wrong with how I'm using "resume"?

KiddoZhu commented 4 years ago

The current implementation doesn't record num_epoch in the pickle file. resume=True only prevents reinitialization of embeddings.

By default, GraphVite uses a linear decay learning rate scheduler. This is probably why resume doesn't work in your case: the learning rate decays from lr to nearly zero during the first run, and then jumps back to lr and decays to nearly zero again in the resumed run. You may override this behavior by setting scheduler="constant".
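To make the restart effect concrete, here is a minimal sketch in plain Python (it does not call GraphVite). It assumes the decay is applied once per epoch and reaches zero at the end of a run; GraphVite's actual scheduler may update at a finer granularity. The function names and the base learning rate of 0.5 are made up for illustration.

```python
def linear_decay_lr(base_lr, step, total_steps):
    """Learning rate at `step` under a linear decay from base_lr toward 0."""
    return base_lr * (1.0 - step / total_steps)


def restarted_schedule(base_lr, epochs_per_run, num_runs):
    """Per-epoch learning rates when each resumed run restarts its own decay."""
    return [
        linear_decay_lr(base_lr, epoch, epochs_per_run)
        for _ in range(num_runs)
        for epoch in range(epochs_per_run)
    ]


def continuous_schedule(base_lr, epochs_per_run, num_runs):
    """Per-epoch learning rates for one uninterrupted run of the same length."""
    total = epochs_per_run * num_runs
    return [linear_decay_lr(base_lr, epoch, total) for epoch in range(total)]


# Two runs of 10 epochs each (hypothetical base learning rate).
restarted = restarted_schedule(0.5, 10, 2)
continuous = continuous_schedule(0.5, 10, 2)

# At the first epoch of the second run, the restarted schedule is back at
# the full base learning rate, while the continuous one is halfway down.
print(restarted[10], continuous[10])
```

Under these assumptions, the resumed run starts with a learning rate far larger than an uninterrupted run would use at that point, which can undo the progress made at the end of the previous run.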

However, both theoretical and empirical results show that using a learning rate decay is generally better than a constant learning rate. So the best practice is probably to manage the learning rate decay yourself when you resume training.
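One possible way to manage the decay yourself, sketched in plain Python: split the full run into short segments, and for each segment compute the learning rate that a single linear decay over all epochs would have at the start of that segment. Each segment could then be run with resume enabled, scheduler="constant", and that segment's learning rate, with an evaluation in between. This is only an illustration of the arithmetic. The helper name is hypothetical, the base learning rate of 0.5 is made up, and how the value is passed to GraphVite is not shown here.

```python
def segment_learning_rates(base_lr, total_epochs, segment_epochs):
    """Learning rate at the start of each segment of a global linear decay.

    Running every segment at its constant starting rate gives a
    piecewise-constant approximation of a linear decay over total_epochs.
    """
    if segment_epochs <= 0 or total_epochs % segment_epochs != 0:
        raise ValueError("total_epochs must be a positive multiple of segment_epochs")
    num_segments = total_epochs // segment_epochs
    return [
        base_lr * (1.0 - k * segment_epochs / total_epochs)
        for k in range(num_segments)
    ]


# 2000 epochs in total, evaluated every 10 epochs (values from the question
# above; the base learning rate is a made-up example).
rates = segment_learning_rates(0.5, 2000, 10)

for k, lr in enumerate(rates[:3]):
    print(f"segment {k}: epochs {k * 10}-{k * 10 + 9}, lr={lr:.5f}")
```

With 10-epoch segments out of 2000, the step between consecutive segment rates is small, so the approximation stays close to the continuous linear schedule.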