Can not reproduce results in the paper for WN18RR dataset

samehkamaleldin commented 6 years ago

I tried the generic command for reproducing the paper's results on the WN18RR dataset, but it did not reproduce the MRR reported in the paper; I only managed to get 0.42.

Which hyperparameters can reproduce the 0.46 MRR reported in the paper?

TimDettmers commented 6 years ago

There has been a major bug in the code-base found by Victoria Lin. I fixed the issue here: d830ddf968ad116ec7466ee18c122cb29e248415

I have not reproduced all of the results, but in general, they are the same or similar. What I have gotten so far on WN18RR is:

Current code: MRR: 0.43, Hits@10: 0.51, Hits@3: 0.44, Hits@1: 0.39

Paper: MRR: 0.46, Hits@10: 0.48, Hits@3: 0.43, Hits@1: 0.39

The difference (current code minus paper) is now: MRR -0.03, Hits@10 +0.03, Hits@3 +0.01, Hits@1 0.00
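As a quick check of the arithmetic, the differences can be recomputed from the two score sets quoted above. This is only a small sketch; the dictionary names are made up for illustration, and the values are copied from this comment.

```python
# Scores copied from the comment above.
current = {"MRR": 0.43, "Hits@10": 0.51, "Hits@3": 0.44, "Hits@1": 0.39}
paper = {"MRR": 0.46, "Hits@10": 0.48, "Hits@3": 0.43, "Hits@1": 0.39}

# Difference is current minus paper, rounded to two decimals
# to hide floating-point noise.
diff = {name: round(current[name] - paper[name], 2) for name in paper}

for name, delta in diff.items():
    print(f"{name}: {delta:+.2f}")
```

A negative value means the current code scores below the paper on that metric.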

I would say that these scores are even slightly better than what we got before, although the MRR is lower. I think I could replicate the paper's results if I searched for longer (the score increased steadily), but I currently have no time to do this. I updated the results in README.md with the current scores. If you get better scores, please let me know.

I got these scores with:

2018-04-20 10:28:19.678540 (INFO): Set parameter dropout to 0.3
2018-04-20 10:28:19.678680 (INFO): Set parameter input_dropout to 0.3
2018-04-20 10:28:19.678722 (INFO): Set parameter feature_map_dropout to 0.2
2018-04-20 10:28:19.678735 (INFO): Set parameter dataset to WN18RR
2018-04-20 10:28:19.678747 (INFO): Set parameter learning_rate to 0.003
2018-04-20 10:28:19.678758 (INFO): Set parameter learning_rate_decay to 0.995
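For anyone who wants to compare runs, the logged settings can be collected into a dictionary. The following is a minimal sketch, assuming the log keeps the "Set parameter NAME to VALUE" wording shown above; `parse_parameters` is a hypothetical helper written for this comment and is not part of the ConvE repository.

```python
import re

# Log excerpt copied from the comment above.
LOG = """\
2018-04-20 10:28:19.678540 (INFO): Set parameter dropout to 0.3
2018-04-20 10:28:19.678680 (INFO): Set parameter input_dropout to 0.3
2018-04-20 10:28:19.678722 (INFO): Set parameter feature_map_dropout to 0.2
2018-04-20 10:28:19.678735 (INFO): Set parameter dataset to WN18RR
2018-04-20 10:28:19.678747 (INFO): Set parameter learning_rate to 0.003
2018-04-20 10:28:19.678758 (INFO): Set parameter learning_rate_decay to 0.995
"""

# Matches the parameter name and its value anywhere in a line,
# so leading line numbers or timestamps do not matter.
PATTERN = re.compile(r"Set parameter (\w+) to (\S+)")


def parse_parameters(log_text):
    """Return {name: value} for every 'Set parameter X to Y' line."""
    params = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match is None:
            continue
        name, raw = match.groups()
        try:
            params[name] = float(raw)
        except ValueError:
            params[name] = raw  # non-numeric values such as the dataset name
    return params


params = parse_parameters(LOG)
print(params)
```

Diffing two such dictionaries is a simple way to see which settings changed between a run that matches the reported scores and one that does not.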

TimDettmers commented 6 years ago

I ran a WN18RR network with the same parameters for a bit longer and got these results:

MR: 4521, MRR: 0.434, Hits@10: 0.507, Hits@3: 0.448, Hits@1: 0.397
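For readers unfamiliar with these metrics, here is a minimal sketch of how MR, MRR and Hits@k are commonly computed from the rank of the correct entity for each test query. It uses the standard textbook definitions; the repository's evaluation code may differ in details such as filtering. The function name and the toy ranks are made up for illustration and are not WN18RR results.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and Hits@k from 1-based ranks of the correct entities."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank
    }
    for k in ks:
        # Fraction of queries whose correct entity is ranked in the top k.
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Toy ranks, invented for illustration only.
toy_ranks = [1, 2, 4, 10, 100]
print(ranking_metrics(toy_ranks))
```

Because MRR averages reciprocal ranks, it is dominated by how often the correct entity lands at the very top, while MR is pulled up by a few badly ranked queries. That is one way a model can show a large MR together with a reasonable MRR, or a higher Hits@10 together with a lower MRR.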

I will run a wider grid search and see if I can improve the MRR.

villmow commented 6 years ago

> There has been a major bug in the code-base found by Victoria Lin. I fixed the issue here: d830ddf

Could you explain what kind of bug?

TimDettmers commented 6 years ago

Please see #18 for a detailed explanation of the bug and ongoing re-evaluation of ConvE.

TimDettmers / ConvE

Can not reproduce results in the paper for WN18RR dataset #15