DeepGraphLearning / KnowledgeGraphEmbedding


Script for finding the best hyperparameters #8

Closed · dschaehi closed this issue 5 years ago

dschaehi commented 5 years ago

Hi Zhiqing, thanks for making your code available for reproducibility. I am just wondering whether you could also share the script that you use for tuning the hyperparameters. This would make your approach even more reproducible. Thanks.

Edward-Sun commented 5 years ago

Hi, as described in the paper, the ranges of the hyperparameters for the grid search are set as follows: embedding dimension k ∈ {125, 250, 500, 1000}, batch size b ∈ {512, 1024, 2048}, self-adversarial sampling temperature α ∈ {0.5, 1.0}, and fixed margin γ ∈ {3, 6, 9, 12, 18, 24, 30}.

I used a for loop in a bash script. The syntax is:

```bash
for VAR1 in var1 var2 var3; do
  for VAR2 in var4 var5 var6; do
    bash something.sh
  done
done
```

In practice, I train the model for 1/10 max_steps to search hyperparameters roughly, and then select 3 or 4 good candidates and try them on full max_steps.
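For concreteness, here is a minimal bash sketch of such a nested grid search over the ranges above, following the 1/10-max_steps strategy. The run.py flag names, the dataset path, and the fixed values (negative sample size, learning rate, max_steps) are assumptions based on this repository's training script, not values stated in this comment; check run.py and best_config.sh for the exact interface before using it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the rough grid search over the ranges reported in the paper.
# The run.py flag names and the fixed values below are assumptions about this
# repository's interface; verify them against run.py / best_config.sh.
MAX_STEPS=150000
SEARCH_STEPS=$((MAX_STEPS / 10))   # rough search on 1/10 of max_steps

for DIM in 125 250 500 1000; do
  for BATCH in 512 1024 2048; do
    for TEMP in 0.5 1.0; do
      for GAMMA in 3 6 9 12 18 24 30; do
        SAVE="models/RotatE_FB15k_d${DIM}_b${BATCH}_t${TEMP}_g${GAMMA}"
        CUDA_VISIBLE_DEVICES=0 python -u run.py --do_train --do_valid --cuda \
          --model RotatE --data_path data/FB15k \
          --hidden_dim "$DIM" --batch_size "$BATCH" \
          --negative_adversarial_sampling --adversarial_temperature "$TEMP" \
          --gamma "$GAMMA" --negative_sample_size 256 --learning_rate 0.0001 \
          --max_steps "$SEARCH_STEPS" --save_path "$SAVE" --double_entity_embedding
      done
    done
  done
done
```

The best 3 or 4 configurations by validation metrics can then be rerun with the full MAX_STEPS.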

I hope this information helps you reproduce our results :)

dschaehi commented 5 years ago

Thank you for your answer. But the ranges for the following parameters are not specified in the paper:

- NEGATIVE_SAMPLE_SIZE
- LEARNING_RATE
- MAX_STEPS

In best_config.sh it seems that you use different values for these three parameters. Can you say more about this? Perhaps it would be easier if you just uploaded the script you used for finding the best hyperparameters to the repository.

Edward-Sun commented 5 years ago

Sorry, I have left Mila, so I cannot find my script for searching hyperparameters. @KiddoZhu is still working on the KGE project in our group. Maybe he will have some insights.

From my personal experience, I have the following suggestions:

For NEGATIVE_SAMPLE_SIZE, larger is better (more accurate), so just set it as large as your GPU memory allows. When all other hyperparameters are fixed, there is a trade-off among HIDDEN_DIM, NEGATIVE_SAMPLE_SIZE, and BATCH_SIZE for GPU memory.

You can grid search LEARNING_RATE as well.

As for MAX_STEPS, I haven't observed any overfitting in our implementation of several popular KGE models, so you can train for as many steps as you wish. An easy way is to plot the loss/step curve and find the point where the loss stops dropping.
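As a hedged illustration, one way to get that curve is to pull the logged training loss out of the run's log file. The log path and line format below are assumptions about this repository's logging; adjust the pattern to whatever your log actually contains.

```bash
# Sketch: extract (step, loss) pairs from the training log to see where the loss flattens.
# Assumes lines like
#   "... Training average loss at step 10000: 0.4321"
# in models/<run>/train.log -- adjust the pattern and path to your actual setup.
grep "Training average loss at step" models/RotatE_FB15k_0/train.log \
  | awk '{step=$(NF-1); sub(":", "", step); print step "\t" $NF}' \
  > loss_curve.tsv
# loss_curve.tsv now holds step/loss pairs that can be plotted with gnuplot, matplotlib, etc.
```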

Thank you for your interest in our work!

dschaehi commented 5 years ago

Thank you for the suggestions!

No worries if you don't have the script anymore. Keeping such a script isn't actually common practice, but in my opinion it would be good to make it one. That way, the authors can show that they chose (a range of) hyperparameters that work best on the validation set rather than on the test set.

KiddoZhu commented 5 years ago

According to my results, the gap between the validation set and the test set on all datasets is small compared to the gap between different methods. I am not sure about the original experiments in the paper, but the results shouldn't differ very much whichever set Zhiqing used for tuning.

If you want to apply the RotatE model to your own datasets, you need to search LEARNING_RATE and GAMMA. It would be better if you also tuned NEGATIVE_SAMPLE_SIZE, but the default value is usually good. For MAX_STEPS, some value proportional to |E| (the number of entities) generally works.
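As a rough, hypothetical illustration of "proportional to |E|": assuming the dataset folder follows this repository's layout, where entities.dict lists one entity per line, you could derive MAX_STEPS like this (the dataset path and the factor of 10 are arbitrary placeholders, not recommended values):

```bash
# Hypothetical sketch: set MAX_STEPS proportional to the number of entities |E|.
# Assumes data/<dataset>/entities.dict lists one entity per line, as in this repo's
# data format; the factor of 10 is an arbitrary placeholder, not a tuned constant.
NUM_ENTITIES=$(wc -l < data/FB15k-237/entities.dict)
MAX_STEPS=$((NUM_ENTITIES * 10))
echo "Using MAX_STEPS=${MAX_STEPS} for |E|=${NUM_ENTITIES}"
```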

dschaehi commented 5 years ago

Thanks for the tips @KiddoZhu!