DeepGraphLearning / RNNLogic


A glitch in WN18RR data #13

Open navdeepkjohal opened 2 years ago

navdeepkjohal commented 2 years ago

Dear Authors,

I found a small glitch in the WN18RR data you updated. Although data/wn18rr/entity.dict lists 40943 entities, only 40559 of them actually appear in train.txt. That leaves 40943 - 40559 = 384 entities that never occur in train.txt and appear only in valid.txt and test.txt, so the model is doing zero-shot inference for these entities at validation/test time, which might have adversely affected the performance of your model. For instance, entity id 14501545 does not occur in train.txt although it is listed in the entities.dict file.

Apologies if I missed something, or my interpretation is wrong.

Best,
Navdeep
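
For anyone who wants to reproduce the count, here is a minimal sketch of the check described above. It is not taken from the RNNLogic code base. It assumes each line of the dictionary file has the form `<index> <entity>` and each line of train.txt has the form `<head> <relation> <tail>`, separated by whitespace. The function names are hypothetical, and because the dictionary file is called both entity.dict and entities.dict in this thread, the script tries both names.

```python
from pathlib import Path


def load_entities(dict_lines):
    """Collect entity ids from lines assumed to look like '<index> <entity>'."""
    entities = set()
    for line in dict_lines:
        parts = line.split()
        if len(parts) >= 2:
            # Keep ids as strings so leading zeros are preserved.
            entities.add(parts[1])
    return entities


def entities_in_triples(triple_lines):
    """Collect heads and tails from lines assumed to look like '<head> <relation> <tail>'."""
    seen = set()
    for line in triple_lines:
        parts = line.split()
        if len(parts) == 3:
            head, _relation, tail = parts
            seen.update((head, tail))
    return seen


def unseen_in_train(dict_lines, train_lines):
    """Entities listed in the dictionary that never occur in the training triples."""
    return load_entities(dict_lines) - entities_in_triples(train_lines)


if __name__ == "__main__":
    data_dir = Path("data/wn18rr")  # path as given in this issue
    # Assumption: the dictionary file is named one of these two ways.
    candidates = [data_dir / "entities.dict", data_dir / "entity.dict"]
    dict_path = next((p for p in candidates if p.exists()), None)
    train_path = data_dir / "train.txt"

    if dict_path is not None and train_path.exists():
        with dict_path.open() as d, train_path.open() as t:
            missing = unseen_in_train(d, t)
        print(f"{len(missing)} entities never appear in train.txt")
        print("examples:", sorted(missing)[:5])
```

If the counts reported above are right, running this from the repository root should report 384 missing entities, with 14501545 among them.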

mnqu commented 2 years ago

Thanks for the information!

We also noticed this: some entities appear only in the valid and test sets. However, all previous works used this dataset for evaluation, so we used it as well for a fair comparison, even though the dataset is not ideal.

Hope this helps.