TimDettmers / ConvE

Convolutional 2D Knowledge Graph Embeddings resources
MIT License

About WN18RR #6

Closed by guolingbing 6 years ago

guolingbing commented 6 years ago

It seems that some entities in the test set do not appear in the training set. So about 210 triples in the test set are meaningless?

TimDettmers commented 6 years ago

Good catch. I just checked this and it is true: 212 entities in the test set do not occur in the training set. Since the dataset has already been used in some other papers, I would not want to adjust it now. If everybody works with these unpredictable test cases, it should even out and scores will be comparable (albeit low). I will add a comment about this in the README. Thank you.
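For anyone who wants to reproduce this kind of check, here is a minimal sketch. It assumes each split is already loaded as a list of `(head, relation, tail)` tuples; the function names and the toy data are hypothetical and are not taken from this repo or from WN18RR.

```python
def entities(triples):
    """Return the set of all head and tail entities in a list of triples."""
    found = set()
    for head, _relation, tail in triples:
        found.add(head)
        found.add(tail)
    return found


def unseen_test_entities(train, test):
    """Entities that occur in the test split but never in the training split."""
    return entities(test) - entities(train)


def affected_triples(train, test):
    """Test triples whose head or tail is an entity unseen during training."""
    unseen = unseen_test_entities(train, test)
    return [t for t in test if t[0] in unseen or t[2] in unseen]


# Toy data, made up for illustration only.
train = [("a", "r1", "b"), ("b", "r2", "c")]
test = [("a", "r1", "c"), ("d", "r1", "a")]

print(sorted(unseen_test_entities(train, test)))
print(affected_triples(train, test))
```

Note that the number of unseen entities and the number of affected triples can differ, since one triple may contain two unseen entities and one unseen entity may appear in several triples.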

xptree commented 6 years ago

How do you deal with triplets whose entities do not appear in the training set during evaluation? Do you simply ignore them, or assign them a specific score, say 0? Thank you.

TimDettmers commented 6 years ago

Just treat them like any other triple. The model will probably not be able to rank them correctly (I would expect a random rank), but that is no issue as long as everybody evaluates those triples in the same way. Note that random ranks, although not much better, are still better than assigning zero scores. If you are having problems with the triples not being in the vocabulary (embedding matrix), then include the test set triples in the vocabulary; this is how I deal with the issue in this repo.
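As a rough illustration of the vocabulary idea, the sketch below builds one entity-to-index mapping over all splits, so every test entity gets a row in the embedding matrix even if it never receives a training update. It is not the code used in this repo; `build_vocab` and the toy splits are hypothetical.

```python
def build_vocab(*splits):
    """Map every entity seen in any split to a row index, in first-seen order."""
    vocab = {}
    for triples in splits:
        for head, _relation, tail in triples:
            for entity in (head, tail):
                if entity not in vocab:
                    vocab[entity] = len(vocab)
    return vocab


# Toy splits, made up for illustration only.
train = [("a", "r1", "b"), ("b", "r2", "c")]
valid = [("c", "r1", "a")]
test = [("d", "r1", "a")]

# Passing all splits means "d" gets an index although it is absent from train.
vocab = build_vocab(train, valid, test)
print(vocab)
```

The embedding row for such an entity keeps its initial values, which is why its rank at evaluation time is expected to be essentially random.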

xptree commented 6 years ago

Thanks!

It seems that the results still depend on how you assign random ranks (how you choose random seeds), although the dependence may be insignificant.

TimDettmers commented 6 years ago

Yes, I agree. It could induce bias, but it is unlikely I think. Thank you for this question, I think it will be helpful for others in the future.
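To get a feel for the seed dependence, here is a small simulation. It assumes that a triple with an untrained entity gets scores unrelated to the truth, so the true entity's rank is uniform over all candidates; that is a simplification, not a measurement on WN18RR. The candidate count of 1000 is arbitrary and both function names are hypothetical.

```python
import random


def random_rank(num_entities, seed):
    """Rank of the true entity when every candidate gets an independent random score."""
    rng = random.Random(seed)
    scores = [rng.random() for _ in range(num_entities)]
    true_score = scores[0]  # treat candidate 0 as the correct entity
    # Rank 1 is best; count the candidates that score strictly higher.
    return 1 + sum(1 for s in scores[1:] if s > true_score)


def expected_random_mrr(num_entities):
    """Expected reciprocal rank when the rank is uniform on 1..num_entities."""
    return sum(1.0 / k for k in range(1, num_entities + 1)) / num_entities


# The rank of a single triple varies a lot from seed to seed...
ranks = [random_rank(1000, seed) for seed in range(5)]
print(ranks)

# ...but its expected contribution to MRR is small and fixed.
print(expected_random_mrr(1000))
```

Under this assumption, an individual rank swings widely with the seed, while the average over a couple of hundred such triples should move the overall metrics only slightly.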