DeepGraphLearning / KnowledgeGraphEmbedding

Why do you separate negative head samples and negative tail samples? #34

Closed · renli1024 closed this issue 4 years ago

renli1024 commented 4 years ago

Thanks first for the great work.

I noticed in the code that you implement two data iterators, train_dataloader_head and train_dataloader_tail, which generate negative head samples and negative tail samples respectively, and that during training the two iterators are fed into the model alternately. If my understanding is correct, each positive sample is therefore trained on twice, once with negative heads and once with negative tails. I would like to know why you do negative sampling this way, instead of training on the negative head and negative tail samples together and backpropagating each positive sample only once, which seems more intuitive to me.

Thanks a lot for your reply.
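For readers skimming the thread, the pattern described in the question roughly looks like the sketch below. The loader names are taken from the question; the generator and training loop are illustrative, not the repository's exact code.

```python
# Hypothetical sketch of the alternating pattern described in the question:
# one dataloader yields head-corrupted negatives, the other tail-corrupted
# negatives, and the trainer consumes them in round-robin order, so each
# positive triple is used twice per epoch (once per corruption direction).
def alternate(train_dataloader_head, train_dataloader_tail):
    """Yield batches alternately from the head-batch and tail-batch loaders."""
    for head_batch, tail_batch in zip(train_dataloader_head, train_dataloader_tail):
        yield head_batch   # negatives of the form (h', r, t)
        yield tail_batch   # negatives of the form (h, r, t')

# Usage (illustrative):
# for batch in alternate(train_dataloader_head, train_dataloader_tail):
#     loss = train_step(model, batch)   # one backward pass per direction
```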

Edward-Sun commented 4 years ago

Hi, thanks for the good question.

This separation is simply an easier way to implement negative sampling. What you propose is also a valid way to do negative sampling, but I haven't tried it, so the method you describe may well perform better than the current implementation. I believe it's worth a try.
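As a concrete illustration of the alternative raised in the question (both corruption directions in a single step, one backward pass per positive), a minimal sketch follows. The scoring function and loss are generic stand-ins, not the repository's API; the repo itself uses a self-adversarial negative sampling loss.

```python
# Hypothetical sketch: sample head and tail negatives for the same positive
# triple and backpropagate one combined loss, so each positive is updated once.
import torch
import torch.nn.functional as F

def combined_step(score_fn, optimizer, h, r, t, neg_h, neg_t):
    """h, r, t: (batch,) id tensors; neg_h, neg_t: (batch, k) id tensors.
    score_fn maps same-shaped (heads, relations, tails) id tensors to scores."""
    optimizer.zero_grad()
    pos = score_fn(h, r, t)                                        # (batch,)
    r_k, t_k, h_k = (x.unsqueeze(1).expand_as(neg_h) for x in (r, t, h))
    neg = torch.cat([score_fn(neg_h, r_k, t_k),                    # head-batch negatives
                     score_fn(h_k, r_k, neg_t)], dim=1)            # tail-batch negatives
    # plain binary logistic loss as a stand-in for the repo's self-adversarial loss
    loss = -F.logsigmoid(pos).mean() - F.logsigmoid(-neg).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```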

kiranramnath007 commented 4 years ago

Have you observed any differences when training with only one of them rather than the other? The Local Closed-World Assumption (LCWA) only holds for what you call the tail-batch way of generating false triples, so the head-batch way may not be a justified way of generating them. (Though perhaps the self-adversarial negative sampling weight attached to each sample takes care of that problem.)

Edward-Sun commented 4 years ago

Hi, thanks for the good question. We use both head-batch and tail-batch only because this is how the MRR is calculated. If the MRR were calculated only on the tail-batch (following the LCWA), then I think it would be reasonable to train the model only on the tail-batch.
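For context, a minimal sketch of ranking in both directions, which is how MRR is commonly reported on link prediction benchmarks. It assumes a generic score_fn and uses raw ranks for brevity, whereas the actual evaluation additionally filters out other known true triples.

```python
# Hypothetical sketch: MRR averaged over head-batch and tail-batch rankings
# (raw ranks; real evaluation also filters out other known true triples).
def mrr_both_directions(score_fn, test_triples, all_entities):
    """score_fn(h, r, t) -> plausibility score, higher is better."""
    reciprocal_ranks = []
    for h, r, t in test_triples:
        true_score = score_fn(h, r, t)
        # tail-batch: rank the true tail among all candidate tails
        tail_rank = 1 + sum(score_fn(h, r, e) > true_score for e in all_entities)
        # head-batch: rank the true head among all candidate heads
        head_rank = 1 + sum(score_fn(e, r, t) > true_score for e in all_entities)
        reciprocal_ranks += [1.0 / tail_rank, 1.0 / head_rank]
    return sum(reciprocal_ranks) / len(reciprocal_ranks)
```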

kiranramnath007 commented 4 years ago

Thanks for the interesting detail. However, your recent ACL 2020 publication on re-evaluating baselines seems to consider only tail-batch corrupted triples for ranking in its notation. Is that correct? And if so, did you use only tail-batch corruption during training for that paper?

Edward-Sun commented 4 years ago

No. Although we may only describe the tail-batch case in the notation, both head-batch and tail-batch are evaluated in that paper, because head-batch is just tail-batch with the relations reversed.
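To make that last point concrete: corrupting the head of (h, r, t) is the same as corrupting the tail of the reversed triple (t, r^{-1}, h). A minimal sketch of that view, assuming inverse relations are represented by shifting relation ids (the helper below is illustrative, not part of the repo):

```python
# Hypothetical sketch: with inverse relations added, every head-batch ranking
# task becomes a tail-batch ranking task on the reversed triple.
def add_inverse_relations(triples, num_relations):
    """Return the original triples plus reversed copies (t, r + num_relations, h),
    where the shifted id denotes the inverse relation r^{-1}."""
    augmented = list(triples)
    for h, r, t in triples:
        augmented.append((t, r + num_relations, h))
    return augmented
```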