SmartDataAnalytics / LiteralE

Knowledge Graph Embeddings learned from the structure and literals of knowledge graphs
Apache License 2.0

Quantitative results #4

Open phucty opened 4 years ago

phucty commented 4 years ago

Thank you very much for taking the time to update the README (https://github.com/SmartDataAnalytics/LiteralE/issues/3); I can run your code now.

However, I ran into a problem: the final quantitative results (using literals) are lower than the original ones (e.g. ConvE - https://github.com/TimDettmers/ConvE). Additionally, the results in Table 3 of the LiteralE paper seem noticeably lower than those reported in other papers such as [1] [2]. Sorry for the inconvenience, but am I missing something?

The following are the results I got when running your scripts (detailed logs are in the attachment):

| File | Model | FB15k-237 MRR |
|------|-------|---------------|
| main_literal | ComplEx | 0.2699 |
| main_literal | DistMult | 0.3154 |
| main_literal | ConvE | 0.2980 |
| main_literal | DistMult_text | 0.3143 |

Phuc

logs_LiteralE.txt

[1] Rudolf Kadlec et al. Knowledge Base Completion: Baselines Strike Back. 2017.
[2] Timothée Lacroix et al. Canonical Tensor Decomposition for Knowledge Base Completion. 2018.

wiseodd commented 4 years ago

Hi again,

We use early stopping, so we usually don't look at the test results of the last epoch; this is mentioned in our paper. Instead, we look at the validation MRR just before it starts dropping significantly during training (a small fluctuation is fine, since training is stochastic). Once we've found that point, we look at the corresponding test MRR and report it.
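As a minimal sketch of that selection rule (this is not code from the LiteralE repo, and the log entries below are hypothetical), one could pick the reported test MRR like this:

```python
# Hypothetical evaluation log: validation and test MRR recorded at each
# evaluation epoch during training.
log = [
    {"epoch": 100, "val_mrr": 0.310, "test_mrr": 0.305},
    {"epoch": 200, "val_mrr": 0.318, "test_mrr": 0.312},
    {"epoch": 300, "val_mrr": 0.309, "test_mrr": 0.306},  # validation MRR starts to drop
]

# Take the epoch with the best validation MRR and report the test MRR
# measured at that same epoch (not the last epoch's test MRR).
best = max(log, key=lambda entry: entry["val_mrr"])
print(f"epoch {best['epoch']}: report test MRR = {best['test_mrr']:.3f}")
```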

Regarding the results being lower than in the ConvE paper: this might be due to the hyperparameter search they performed, and to the fact that the ConvE code in this repo is quite old (ConvE's authors may have made many changes since then). We believe this doesn't really matter, since we're not interested in getting state-of-the-art results; we're interested in how large an improvement incorporating literals can give.

By the way, just to make it clear: our code is written on top of the ConvE codebase without changing anything other than extending the models with LiteralE. So naturally, if there were issues in the ConvE code, we would be affected as well.
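As a rough illustration of what that extension means (a paraphrase of the LiteralE idea, not the repo's actual code; layer names and sizes here are assumptions), the entity embedding is replaced by a gated combination of the structural embedding and the entity's literal vector before it enters the unchanged scoring function:

```python
import torch
import torch.nn as nn

class LiteralGate(nn.Module):
    """Sketch of a LiteralE-style gate g(e, l): combines an entity embedding e
    with its numeric-literal vector l. Illustrative only."""
    def __init__(self, emb_dim: int, num_literals: int):
        super().__init__()
        self.gate = nn.Linear(emb_dim + num_literals, emb_dim)       # produces z
        self.transform = nn.Linear(emb_dim + num_literals, emb_dim)  # produces h

    def forward(self, e: torch.Tensor, lit: torch.Tensor) -> torch.Tensor:
        x = torch.cat([e, lit], dim=-1)
        z = torch.sigmoid(self.gate(x))     # how much literal information to let in
        h = torch.tanh(self.transform(x))   # literal-enriched candidate embedding
        return z * h + (1 - z) * e          # gated combination g(e, l)

# The base scoring model (DistMult, ComplEx, ConvE, ...) stays untouched;
# only the entity embeddings fed into it are replaced by g(e, l).
```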

phucty commented 4 years ago

Hi, thank you for your answer. I understand that your goal is to measure the degree of improvement gained by incorporating literals.

Regarding early stopping, could you please tell me what threshold you use for "goes lower significantly"? I rechecked the paper but could not find it.

Thank you very much. Phuc