boyang-nlp closed this issue 6 years ago
Note that in our paper, ComplEx refers to ComplEx with traditional 1-1 scoring. To produce the results for ComplEx with 1-1 scoring, we used the software from my co-author: https://github.com/uclmr/inferbeddings
The following is an excerpt from an email that also inquires about parameter settings for ComplEx on FB15k-237:
thank you for your interest in our work. Pasquale (CC'd) worked on this with his code, which you can find here: https://github.com/uclmr/inferbeddings
Here are some example results; they are not the best we obtained, though. The batch size for Pasquale's code is the entire training set split into 10 pieces.
Here are also the hyperparameters for ComplEx. These results are actually a bit better than those we report in the paper, so we may have missed them at the time. We updated our paper accordingly.
$ ~/workspace/inferbeddings/tools/parse_results_mrr_filtered.sh *ComplEx*
192
Best MRR, Filt: ucl_fb15k-237_v1.embedding_size=200_epochs=1000_loss=pairwise_hinge_margin=5_model=ComplEx_similarity=dot.log
Test - Best Raw MRR: 0.149
Test - Best Filt MRR: 0.247
Test - Best Raw MR: 526.44904
Test - Best Filt MR: 338.9463
Test - Best Raw Hits@1: 8.077%
Test - Best Filt Hits@1: 15.797%
Test - Best Raw Hits@3: 15.489%
Test - Best Filt Hits@3: 27.485%
Test - Best Raw Hits@5: 20.791%
Test - Best Filt Hits@5: 33.58%
Test - Best Raw Hits@10: 29.437%
Test - Best Filt Hits@10: 42.83%
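For reference, the scoring and loss behind this run (ComplEx with 1-1 scoring and a pairwise hinge loss with margin 5, per the log filename) can be sketched as below. This is a minimal NumPy sketch under my own naming, not code from inferbeddings:

```python
import numpy as np

def complex_score(h_re, h_im, r_re, r_im, t_re, t_im):
    # ComplEx 1-1 score for a single triple: Re(<r, h, conj(t)>)
    # (Trouillon et al., 2016); each argument is one embedding half
    return float(np.sum(r_re * h_re * t_re)
                 + np.sum(r_re * h_im * t_im)
                 + np.sum(r_im * h_re * t_im)
                 - np.sum(r_im * h_im * t_re))

def pairwise_hinge(pos_score, neg_score, margin=5.0):
    # Pairwise hinge loss: push the positive triple's score above the
    # negative one's by at least `margin` (margin=5 as in the log above)
    return max(0.0, margin - pos_score + neg_score)
```

With embedding_size=200 in the log, each of the real and imaginary parts would be a 100-dimensional vector (assuming the size counts both halves; the log does not say).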
I am quoting the WN18RR parameters and results from another email:
thank you! Here are the parameter configuration and the final results for DistMult and ComplEx from Pasquale:
For ComplEx:
inferbeddings$ ./tools/parse_results_mrr_filtered.sh ~/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/*.log
384
Best MRR, Filt: /home/pminervi/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/ucl_wn18rr_v1.embedding_size=200_epochs=200_loss=hinge_margin=2_model=ComplEx_similarity=dot.log
Test - Best Raw MRR: 0.309
Test - Best Filt MRR: 0.444
Test - Best Raw MR: 5274.99697
Test - Best Filt MR: 5261.30121
Test - Best Raw Hits@1: 21.171%
Test - Best Filt Hits@1: 41.114%
Test - Best Raw Hits@3: 38.21%
Test - Best Filt Hits@3: 45.836%
Test - Best Raw Hits@5: 43.539%
Test - Best Filt Hits@5: 47.846%
Test - Best Raw Hits@10: 47.336%
Test - Best Filt Hits@10: 50.734%
These parameter settings should reproduce the results in the paper.
Hi, could you please share the detailed settings of DistMult on FB15k-237 and WN18RR?
The following are parameters for DistMult on WN18RR:
For DistMult:
inferbeddings$ ./tools/parse_results_mrr_filtered.sh ~/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/*DistMult*.log
192
Best MRR, Filt: /home/pminervi/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/ucl_wn18rr_v1.embedding_size=200_epochs=500_loss=hinge_margin=2_model=DistMult_similarity=dot.log
Test - Best Raw MRR: 0.301
Test - Best Filt MRR: 0.425
Test - Best Raw MR: 5124.58998
Test - Best Filt MR: 5110.78287
Test - Best Raw Hits@1: 20.453%
Test - Best Filt Hits@1: 38.864%
Test - Best Raw Hits@3: 37.077%
Test - Best Filt Hits@3: 43.874%
Test - Best Raw Hits@5: 41.895%
Test - Best Filt Hits@5: 45.82%
Test - Best Raw Hits@10: 46.011%
Test - Best Filt Hits@10: 49.059%
And these are the parameters on FB15k-237:
237_v1.embedding_size=200_epochs=1000_loss=hinge_margin=2_model=DistMult_similarity=dot.log
Test - Best Raw MRR: 0.159
Test - Best Filt MRR: 0.241
Test - Best Raw MR: 441.64219
Test - Best Filt MR: 254.15069
Test - Best Raw Hits@1: 9.096%
Test - Best Filt Hits@1: 15.523%
Test - Best Raw Hits@3: 16.598%
Test - Best Filt Hits@3: 26.27%
Test - Best Raw Hits@5: 21.599%
Test - Best Filt Hits@5: 32.486%
Test - Best Raw Hits@10: 30.104%
Test - Best Filt Hits@10: 41.896%
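For completeness, the DistMult scorer behind these runs is just a trilinear dot product over entity and relation embeddings. A minimal NumPy sketch (the function name is mine, not from inferbeddings):

```python
import numpy as np

def distmult_score(h, r, t):
    # DistMult score: <h, r, t> = sum_i h_i * r_i * t_i
    # Note it is symmetric in h and t, unlike ComplEx
    return float(np.sum(h * r * t))
```

Per the log filenames above, these runs used embedding_size=200 with a hinge loss of margin 2.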
Hi Tim, first of all thanks for the great repo! One additional question on the DistMult settings: Which dropout value did you use? Also, could you replicate these results with the code in your repo, or only with 1-1 scoring and the inferbeddings code?
For the inferbeddings code we did not use any dropout. We also did not use L2 regularization. However, we do re-normalize the embedding vectors to L2 norm <= 1 after each weight update, which has a regularizing effect. All of this is for 1-1 scoring.
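The re-normalization step described above amounts to projecting each embedding back onto the unit L2 ball after every update. A sketch of that projection (my own code, not from inferbeddings):

```python
import numpy as np

def renormalize(embeddings, max_norm=1.0):
    # Project each row of the embedding matrix back onto the L2 ball
    # of radius max_norm; rows already inside the ball are unchanged.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return embeddings * scale
```

Applied after each gradient step, this caps the embedding norms without adding a penalty term to the loss, which is why it acts as an implicit regularizer.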
For 1-N scoring, the results differ between datasets: for some it improves performance, for others it does not. I think for WN18RR it decreases performance, so there 1-1 scoring seems to do better (or is it the margin loss, or the re-normalization? We did not test this causally).
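To make the 1-1 vs 1-N distinction concrete: 1-1 scoring evaluates one (h, r, t) triple per forward pass, while 1-N scoring scores a (h, r) pair against every entity at once. A sketch using DistMult as the scorer (all names are mine; this is not the repo's implementation):

```python
import numpy as np

def score_1_to_1(h, r, t):
    # One triple in, one score out
    return float(np.sum(h * r * t))

def score_1_to_n(h, r, entity_matrix):
    # Score (h, r) against all entities with one matrix-vector product;
    # entity_matrix has shape (num_entities, dim)
    return entity_matrix @ (h * r)
```

The i-th entry of the 1-N output equals the 1-1 score of (h, r, entity_i), so the two differ in training dynamics (number of negatives per update, loss shape) rather than in the scoring function itself.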
Hope it helps. Let me know if you have any more questions.
Hi, I have failed to reproduce the results of ComplEx on WN18RR and FB15k-237 reported in the ConvE paper. I use my own implementation, which reproduces the results on FB15k and WN18 correctly. Could you please tell me the optimal parameter settings of ComplEx you used on these two datasets?