The convention in KG link prediction is to rank each test triplet (h, r, t) against all corruptions (?, r, t) and (h, r, ?). Evaluation therefore takes O(|V||T|d) computation, where |V| is the number of entities in the training set, |T| is the number of triplets in the test set, and d is the embedding dimension.
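For a rough sense of scale (hypothetical numbers, not taken from this issue): with |V| = 10^5 entities, |T| = 4 x 10^4 test triplets, and d = 512, the cost is |V||T|d ≈ 2 x 10^12 score computations, already a sizeable fraction of the per-hour budget below.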
Roughly, 4 GPUs can handle |V||T|d = 10^13 in an hour. You can subsample the test set by passing fast_mode=1000 to link prediction, which sets |T| to 1000. Generally, |T| = 1000 is sufficient for a stable evaluation.
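For reference, a minimal sketch of such an evaluation call, reusing the arguments from the original post below. app is assumed to be an already-trained knowledge graph application object; only fast_mode is new relative to that snippet.

# Sketch: subsampled link prediction evaluation.
# `app` is assumed to be a trained knowledge graph application,
# as in the original post below.
metrics = app.link_prediction(
    file_name='test_cleaned.tsv',        # test triplets to rank
    filter_files=['train_cleaned.tsv'],  # known triplets excluded from the ranking
    target='tail',                       # only corrupt the tail entity
    fast_mode=1000                       # evaluate on 1000 sampled test triplets
)
print(metrics)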
Thanks for the info, I think I will go for 1000 in the future! To get per-sample predictions, I guess I should use the entity_prediction() function?
Exactly.
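For anyone landing here later, a hypothetical sketch of that call. Only the function name is confirmed in this thread; file_name and target are assumptions by analogy with link_prediction above, so check the library docs for the exact signature.

# Hypothetical: per-sample tail predictions for each test triplet.
# Parameter names below are assumptions, not confirmed by this thread.
predictions = app.entity_prediction(
    file_name='test_cleaned.tsv',  # triplets to predict tails for
    target='tail'                  # predict the tail entity of each triplet
)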
Hi,
I have been running link prediction with a DistMult model using the following settings:
metrics = app.link_prediction(file_name='test_cleaned.tsv', filter_files=['train_cleaned.tsv'], target="tail")
There are ~52 million filter triplets and ~43,000 test triplets.
It has been running for over an hour and still has not finished. Could this be a problem with the prediction batch size?
Thanks, George