DeepGraphLearning / graphvite

GraphVite: A General and High-performance Graph Embedding System
https://graphvite.io
Apache License 2.0

Slow link prediction? #48

Closed ggerogiokas closed 4 years ago

ggerogiokas commented 4 years ago

Hi,

I have been running link prediction with a DistMult model using the following settings:

metrics = app.link_prediction(file_name='test_cleaned.tsv', filter_files=['train_cleaned.tsv'], target="tail")

There are ~52 million filter triplets and ~43,000 test triplets:

effective triplets: 43226 / 43226
effective filter triplets: 52388933 / 52388933
Memory is not enough for optimal prediction batch size. Use the maximal possible size instead.

It has been running for over an hour and still hasn't finished. Is that a problem with the prediction batch size?

Thanks, George

KiddoZhu commented 4 years ago

The convention in KG link prediction is to rank each test triplet (h, r, t) against all corrupted triplets (?, r, t) and (h, r, ?). So evaluation takes O(|V||T|d) computation, where |V| is the number of entities in the training set, |T| is the number of triplets in the test set, and d is the embedding dimension.

Roughly, 4 GPUs can handle |V||T|d = 10^13 in an hour. You can subsample the test set by setting fast_mode=1000 in link prediction, which caps |T| at 1000. Generally, |T| = 1000 is sufficient for a stable evaluation.
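For context, here is a back-of-envelope sketch of that cost together with the fast_mode call. The entity count and embedding dimension below are hypothetical placeholders, not values from this issue:

```python
# Rough evaluation-time estimate. num_entity and dim are hypothetical
# placeholders (not reported in this issue); num_test is from the log above.
num_entity = 5_000_000       # |V|: hypothetical number of training entities
num_test = 43226             # |T|: effective test triplets
dim = 512                    # d: hypothetical embedding dimension

work = num_entity * num_test * dim
hours = work / 1e13          # ~10^13 per hour on 4 GPUs, per the estimate above
print(f"estimated evaluation time: {hours:.1f} h")

# Subsample the test set to 1000 triplets for a faster, still stable evaluation
metrics = app.link_prediction(
    file_name='test_cleaned.tsv',
    filter_files=['train_cleaned.tsv'],
    target="tail",
    fast_mode=1000,
)
```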

ggerogiokas commented 4 years ago

Thanks for the info, I think I will go with 1000 in the future! To get per-sample predictions, I guess I should use the entity_prediction() function?

KiddoZhu commented 4 years ago

Exactly.
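For reference, a minimal sketch of how that call might look, assuming entity_prediction accepts a test file and an output file similarly to link_prediction; the parameter names here are assumptions, so check the application docs for the exact signature:

```python
# Hypothetical sketch: parameter names assumed to mirror link_prediction;
# consult the GraphVite application documentation for the exact signature.
app.entity_prediction(
    file_name='test_cleaned.tsv',   # triplets to predict tails for
    save_file='predictions.txt',    # where to write per-sample predictions
    target="tail",
)
```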