DeepGraphLearning / KnowledgeGraphEmbedding

MIT License

A question about the data range of negative sampling #35

Closed renli1024 closed 4 years ago

renli1024 commented 4 years ago

Hi, first of all, thanks for the great work!

I noticed that during training you generate negative samples based on the train set only, so triples that appear only in the valid or test set will be treated as negatives by the model. These "false negative" samples will affect the model's performance at evaluation time. In my opinion, maybe the valid set should also be taken into account for negative sampling during training?

Thanks in advance for your explanation.

Edward-Sun commented 4 years ago

Hi Ren,

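If the valid set were used for negative sampling during training, that would be similar to using the valid set as positive samples, because excluding valid triples from the negatives tells the model which of them are true. This would make the comparison with previous work unfair, since previous work uses only the training set for positive samples.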

So we don't use the valid set even in negative sampling.
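
To make the trade-off concrete, here is a minimal sketch of filtered negative sampling on toy data. It is not the repository's actual sampler: the function name `sample_negative_tails`, the entity and relation ids, and the toy `train` and `valid` sets are all assumptions made up for illustration.

```python
import random


def sample_negative_tails(head, relation, known_triples, num_entities, k, rng):
    """Draw k corrupted tails for (head, relation).

    Any tail that would form a triple in known_triples is rejected, so
    known_triples acts as the filter set of "known positives".
    """
    negatives = []
    while len(negatives) < k:
        tail = rng.randrange(num_entities)
        if (head, relation, tail) not in known_triples:
            negatives.append(tail)
    return negatives


# Hypothetical toy data: entity ids 0..4 and a single relation with id 0.
train = {(0, 0, 1), (0, 0, 2)}
valid = {(0, 0, 3)}

rng = random.Random(0)

# Filter with the training set only: the held-out tail 3 is a legal negative.
train_only = sample_negative_tails(0, 0, train, 5, 200, rng)

# Filter with train + valid: tail 3 can never be drawn, which implicitly
# reveals to the model that (0, 0, 3) is a true triple.
train_plus_valid = sample_negative_tails(0, 0, train | valid, 5, 200, rng)

print("tail 3 sampled with train-only filter:", 3 in train_only)
print("tail 3 sampled with train+valid filter:", 3 in train_plus_valid)
```

With the train-only filter, the valid triple can show up as a "false negative", which is the effect raised in the question. With the combined filter it never does, but the sampler then depends on knowing the valid positives, which is the leakage described in the answer.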