boyang-nlp closed this issue 6 years ago
Note that in our paper, ComplEx refers to ComplEx with traditional 1-1 scoring. To produce the results for ComplEx with 1-1 scoring, we used the software from my co-author: https://github.com/uclmr/inferbeddings
The following is an excerpt from an email that also inquires about parameter settings for ComplEx on FB15k-237:
thank you for your interest in our work. Pasquale (CC'd) worked on this with his code, which you can find here: https://github.com/uclmr/inferbeddings
Here are some example results; they are not the best we obtained, though. The batch size for Pasquale's code is the entire training set split into 10 pieces.
Here are also the hyperparameters for ComplEx. These results are actually a bit better than those we report in the paper, so we may have missed them at the time. We updated our paper accordingly.
$ ~/workspace/inferbeddings/tools/parse_results_mrr_filtered.sh *ComplEx*
192
Best MRR, Filt: ucl_fb15k-237_v1.embedding_size=200_epochs=1000_loss=pairwise_hinge_margin=5_model=ComplEx_similarity=dot.log
Test - Best Raw MRR: 0.149
Test - Best Filt MRR: 0.247
Test - Best Raw MR: 526.44904
Test - Best Filt MR: 338.9463
Test - Best Raw Hits@1: 8.077%
Test - Best Filt Hits@1: 15.797%
Test - Best Raw Hits@3: 15.489%
Test - Best Filt Hits@3: 27.485%
Test - Best Raw Hits@5: 20.791%
Test - Best Filt Hits@5: 33.58%
Test - Best Raw Hits@10: 29.437%
Test - Best Filt Hits@10: 42.83%
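For reference, the scoring and loss behind this run (ComplEx with 1-1 scoring and a pairwise hinge loss with margin 5, per the log filename) can be sketched as below. This is a minimal NumPy sketch under my own naming, not code from inferbeddings:

```python
import numpy as np

def complex_score(h_re, h_im, r_re, r_im, t_re, t_im):
    # ComplEx 1-1 score for a single triple: Re(<r, h, conj(t)>)
    # (Trouillon et al., 2016); each argument is one embedding half
    return float(np.sum(r_re * h_re * t_re)
                 + np.sum(r_re * h_im * t_im)
                 + np.sum(r_im * h_re * t_im)
                 - np.sum(r_im * h_im * t_re))

def pairwise_hinge(pos_score, neg_score, margin=5.0):
    # Pairwise hinge loss: push the positive triple's score above the
    # negative one's by at least `margin` (margin=5 as in the log above)
    return max(0.0, margin - pos_score + neg_score)
```

With embedding_size=200 in the log, each of the real and imaginary parts would be a 100-dimensional vector (assuming the size counts both halves; the log does not say).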
I am quoting the WN18RR parameters and results from another email:
thank you! Here are the parameter configuration and the final results for DistMult and ComplEx from Pasquale:
For ComplEx:
inferbeddings$ ./tools/parse_results_mrr_filtered.sh ~/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/*.log
384
Best MRR, Filt: /home/pminervi/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/ucl_wn18rr_v1.embedding_size=200_epochs=200_loss=hinge_margin=2_model=ComplEx_similarity=dot.log
Test - Best Raw MRR: 0.309
Test - Best Filt MRR: 0.444
Test - Best Raw MR: 5274.99697
Test - Best Filt MR: 5261.30121
Test - Best Raw Hits@1: 21.171%
Test - Best Filt Hits@1: 41.114%
Test - Best Raw Hits@3: 38.21%
Test - Best Filt Hits@3: 45.836%
Test - Best Raw Hits@5: 43.539%
Test - Best Filt Hits@5: 47.846%
Test - Best Raw Hits@10: 47.336%
Test - Best Filt Hits@10: 50.734%
These parameter settings should reproduce the results in the paper.
Hi, could you please share the detailed settings of DistMult on FB15k-237 and WN18RR?
The following are parameters for DistMult on WN18RR:
For DistMult:
inferbeddings$ ./tools/parse_results_mrr_filtered.sh ~/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/*DistMult*.log
192
Best MRR, Filt: /home/pminervi/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/ucl_wn18rr_v1.embedding_size=200_epochs=500_loss=hinge_margin=2_model=DistMult_similarity=dot.log
Test - Best Raw MRR: 0.301
Test - Best Filt MRR: 0.425
Test - Best Raw MR: 5124.58998
Test - Best Filt MR: 5110.78287
Test - Best Raw Hits@1: 20.453%
Test - Best Filt Hits@1: 38.864%
Test - Best Raw Hits@3: 37.077%
Test - Best Filt Hits@3: 43.874%
Test - Best Raw Hits@5: 41.895%
Test - Best Filt Hits@5: 45.82%
Test - Best Raw Hits@10: 46.011%
Test - Best Filt Hits@10: 49.059%
And these are the parameters on FB15k-237:
237_v1.embedding_size=200_epochs=1000_loss=hinge_margin=2_model=DistMult_similarity=dot.log
Test - Best Raw MRR: 0.159
Test - Best Filt MRR: 0.241
Test - Best Raw MR: 441.64219
Test - Best Filt MR: 254.15069
Test - Best Raw Hits@1: 9.096%
Test - Best Filt Hits@1: 15.523%
Test - Best Raw Hits@3: 16.598%
Test - Best Filt Hits@3: 26.27%
Test - Best Raw Hits@5: 21.599%
Test - Best Filt Hits@5: 32.486%
Test - Best Raw Hits@10: 30.104%
Test - Best Filt Hits@10: 41.896%
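For completeness, the DistMult scorer behind these runs is just a trilinear dot product over entity and relation embeddings. A minimal NumPy sketch (the function name is mine, not from inferbeddings):

```python
import numpy as np

def distmult_score(h, r, t):
    # DistMult score: <h, r, t> = sum_i h_i * r_i * t_i
    # Note it is symmetric in h and t, unlike ComplEx
    return float(np.sum(h * r * t))
```

Per the log filenames above, these runs used embedding_size=200 with a hinge loss of margin 2.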
Hi Tim, first of all thanks for the great repo! One additional question on the DistMult settings: Which dropout value did you use? Also, could you replicate these results with the code in your repo, or only with 1-1 scoring and the inferbeddings code?
For the inferbeddings code we did not use any dropout. We also did not use L2 regularization. However, we do re-normalize the embedding vectors to L2 norm <= 1 after each weight update, which has a regularizing effect. All of this is for 1-1 scoring.
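The re-normalization step described above amounts to projecting each embedding back onto the unit L2 ball after every update. A sketch of that projection (my own code, not from inferbeddings):

```python
import numpy as np

def renormalize(embeddings, max_norm=1.0):
    # Project each row of the embedding matrix back onto the L2 ball
    # of radius max_norm; rows already inside the ball are unchanged.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return embeddings * scale
```

Applied after each gradient step, this caps the embedding norms without adding a penalty term to the loss, which is why it acts as an implicit regularizer.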
For 1-N scoring, the results differ between datasets: for some it improves performance, for others it does not. I think for WN18RR it decreases performance, so there 1-1 scoring seems to do better (or is it the margin loss, or the re-normalization? We did not test this causally).
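To make the 1-1 vs 1-N distinction concrete: 1-1 scoring evaluates one (h, r, t) triple per forward pass, while 1-N scoring scores a (h, r) pair against every entity at once. A sketch using DistMult as the scorer (all names are mine; this is not the repo's implementation):

```python
import numpy as np

def score_1_to_1(h, r, t):
    # One triple in, one score out
    return float(np.sum(h * r * t))

def score_1_to_n(h, r, entity_matrix):
    # Score (h, r) against all entities with one matrix-vector product;
    # entity_matrix has shape (num_entities, dim)
    return entity_matrix @ (h * r)
```

The i-th entry of the 1-N output equals the 1-1 score of (h, r, entity_i), so the two differ in training dynamics (number of negatives per update, loss shape) rather than in the scoring function itself.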
Hope it helps. Let me know if you have any more questions.
Hi, I have failed to reproduce the results of ComplEx on WN18RR and FB15k-237 reported in the ConvE paper. I use my own implementation, which reproduces the results on FB15k and WN18 correctly. Could you please tell me the optimal parameter settings of ComplEx you used on these two datasets?