Closed by JuliaGast 3 months ago
Negative sampling for evaluation, i.e. for computing test MRRs, is typically NOT done in static KG link prediction evaluation!
See, e.g.:
Start Small, Think Big: On Hyperparameter Optimization for Large-Scale Knowledge Graph Embeddings
and here ([11] from above):
Parallel Training of Knowledge Graph Embedding Models: A Comparison of Techniques
That paper compares different commonly used approaches to train KGE models, which differ mainly in the way negative examples are generated.
For training with negative samples, different strategies exist:
Since negative sampling is not done in static KG completion, I opt not to use it for the evaluation of TKG forecasting either; instead, I compute the MRR based on scores for all entities in the KG.
Our datasets do not contain more entities or significantly more test triples than static KG benchmarks, so introducing negative sampling for the evaluation of TKG models is not well motivated.
(https://openreview.net/pdf?id=BkxSmlBFvr)
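The 1-vs-all evaluation described above can be sketched as follows (function and variable names are my own; this assumes the model emits a score for every entity per query):

```python
import numpy as np

def mrr_full_ranking(scores: np.ndarray, true_idx: np.ndarray) -> float:
    """Compute MRR by ranking the true entity against ALL entities.

    scores:   (num_queries, num_entities) model scores for every candidate
    true_idx: (num_queries,) index of the ground-truth entity per query
    """
    true_scores = scores[np.arange(len(true_idx)), true_idx]
    # rank = 1 + number of entities scored strictly higher than the truth
    ranks = 1 + (scores > true_scores[:, None]).sum(axis=1)
    return float((1.0 / ranks).mean())

# toy example: 2 queries over 4 entities, truth scored highest in both
scores = np.array([[0.1, 0.9, 0.3, 0.2],
                   [0.5, 0.4, 0.8, 0.6]])
print(mrr_full_ranking(scores, np.array([1, 2])))  # → 1.0
```

Time-aware filtering (removing other true answers at the same timestamp before ranking) would be applied on top of this, but the ranking itself is always over the full entity set.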
(ultra)
What if we identified node types and computed the MRR only over candidate nodes of the same type as the true answer? Would that be a fair evaluation?
Not really: if the model predicts nodes of the wrong type (which happens), this would not be penalized at all. Also, for many datasets node types are not given.
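To make that objection concrete, a toy sketch (entity IDs, types, and scores are all made up) showing how type-constrained ranking hides a wrong-type top prediction that full ranking penalizes:

```python
import numpy as np

# 6 entities of two types; the model wrongly puts its highest score
# on an entity of the WRONG type (entity 3, type 1).
node_type = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.9, 0.1, 0.1])
true_entity = 2  # type 0

# Full ranking: the wrong-type prediction pushes the truth to rank 2.
full_rank = 1 + (scores > scores[true_entity]).sum()

# Type-constrained ranking: wrong-type candidates are masked out,
# so the model's mistake vanishes from the metric (rank 1).
mask = node_type == node_type[true_entity]
typed_rank = 1 + (scores[mask] > scores[true_entity]).sum()

print(full_rank, typed_rank)  # → 2 1
```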
Suggestions for how to potentially choose nodes for negative samples: a combination of:
What about negative sampling during training?
Update: After meeting with Michael on April 11th, we decided NOT to do negative sampling, i.e. to use the 1-vs-all strategy.
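For completeness, a minimal numpy sketch of what the 1-vs-all training objective looks like (all names, shapes, and the random embeddings are illustrative, not the repo's actual code): every entity is scored as a candidate object and a softmax cross-entropy is taken over the full entity set, so no negatives are ever sampled.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, dim, batch = 50, 16, 4

ent_emb = rng.normal(size=(num_entities, dim))
queries = rng.normal(size=(batch, dim))  # stand-in for encoded (s, r, t) queries
true_obj = np.array([3, 7, 7, 42])       # ground-truth object per query

# 1-vs-all: score EVERY entity as candidate object (no sampled negatives)
scores = queries @ ent_emb.T             # (batch, num_entities)

# numerically stable softmax cross-entropy over ALL entities
logz = scores.max(axis=1, keepdims=True)
log_probs = scores - logz - np.log(np.exp(scores - logz).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(batch), true_obj].mean()
```

In a real model the scores would come from the TKG scoring function and the loss would be backpropagated; the key point is just that the softmax normalizes over all entities.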