Feature request : link prediction script

amazon-science / tgl

Apache License 2.0

Feature request : link prediction script #2

Open moudheus opened 2 years ago

moudheus commented 2 years ago

Currently TGL only provides training scripts. I would like to predict the top K most likely future links after training.

Example:

For each source node in the training data, what are the top 100 destination nodes, given a future timestamp and a set of edge features?

Alternatively, I would appreciate documentation on how to achieve this on my own. I believe it is doable by generating the list of candidate edges as a test set and then extracting the scores from the evaluation function.
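A minimal sketch of the candidate-edge idea, in plain Python. The helper name `build_candidate_edges` and the `(src, dst, time)` row layout are hypothetical, chosen only for illustration. They are not part of TGL, and the rows would still need to be converted into whatever edge-list format the training scripts expect.

```python
from itertools import product


def build_candidate_edges(src_nodes, dst_nodes, timestamp):
    """Return one (src, dst, time) row for every source/destination pair.

    Hypothetical helper: all candidates share the same future timestamp.
    """
    return [(src, dst, timestamp) for src, dst in product(src_nodes, dst_nodes)]


# Two source nodes scored against three possible destinations.
candidates = build_candidate_edges(
    src_nodes=[0, 1], dst_nodes=[10, 11, 12], timestamp=1000.0
)
for row in candidates:
    print(row)
```

The number of candidates grows as (number of sources) x (number of destinations), so for large graphs the list would probably have to be generated and scored in chunks.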

Thank you.

tedzhouhk commented 2 years ago

Thank you for your interest in our work!

It's not hard to achieve this. Here are the two places you should look at:

root_nodes: Each batch should contain all nodes, since you want to rank the top destination nodes. If the number of nodes is too large, this may need to be done in minibatches.
EdgePredictor: You need to compute the edge probabilities for all destination nodes and return the top x nodes.
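The two points above can be sketched as a minibatched top-k search. This is an illustration in plain Python, not TGL code: `top_k_destinations` and `toy_score` are hypothetical names, and `toy_score` stands in for whatever the trained model and its edge predictor would return for a batch of destinations.

```python
import heapq


def top_k_destinations(score_fn, src, dst_nodes, k, batch_size=1024):
    """Score every destination for one source in minibatches, keep the best k.

    score_fn(src, batch) is assumed to return one score per destination.
    """
    best = []  # running list of (score, dst) pairs, at most k long
    for start in range(0, len(dst_nodes), batch_size):
        batch = dst_nodes[start:start + batch_size]
        scores = score_fn(src, batch)
        # Merge the new scores with the current best and keep the top k.
        best = heapq.nlargest(k, best + list(zip(scores, batch)))
    return [(dst, score) for score, dst in best]


def toy_score(src, batch):
    """Placeholder scorer: destinations with ids close to src score higher."""
    return [-abs(dst - src - 0.25) for dst in batch]


result = top_k_destinations(toy_score, src=5, dst_nodes=list(range(10)), k=3, batch_size=4)
print(result)
```

Only k pairs are held between batches, so memory stays bounded regardless of how many destination nodes there are. With PyTorch tensors, `torch.topk` could play the role that `heapq.nlargest` plays here.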

Lastly, I want to point out that under the current link prediction setup, the top 100 destination nodes at a given time would be the same for every source node, because the edge probability is calculated by adding two "edge scores", one from the source node and one from the destination node. Another setup for GNN edge prediction is to directly learn an embedding for each node pair, which might better serve your needs.
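A toy example of why an additive score gives the same ranking for every source. The numbers are made up, and the sigmoid is an assumption used only to turn the sum into a probability. The snippet models the description above, not the actual TGL predictor.

```python
import math


def edge_probability(src_score, dst_score):
    """Assumed form: a sigmoid applied to the sum of two per-node scores."""
    return 1.0 / (1.0 + math.exp(-(src_score + dst_score)))


# Made-up per-destination scores at some fixed time.
dst_scores = {10: 0.3, 11: -1.2, 12: 2.0}


def ranking(src_score):
    """Destinations sorted from most to least likely for a given source score."""
    return sorted(
        dst_scores,
        key=lambda dst: edge_probability(src_score, dst_scores[dst]),
        reverse=True,
    )


ranking_a = ranking(src_score=-0.5)
ranking_b = ranking(src_score=4.0)
print(ranking_a, ranking_b)
```

The source score shifts every candidate by the same amount and the sigmoid is monotonic, so the order of the destinations does not change. Only the probabilities themselves differ between sources.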

moudheus commented 2 years ago

Thank you, I will look into this!

moudheus commented 2 years ago

I implemented a predict script in the following way:

Before prediction, I run eval('train') on the trained model in order to update the memory.
During prediction, each batch contains one source node and all possible destination nodes.
Instead of implementing a new EdgePredictor, I reuse the existing one but only take its first output (this is not very efficient but should be correct).

Could you please have a look at my implementation and tell me if anything seems wrong?

You can find it here: https://github.com/moudheus/tgl/blob/main/predict.py

I would be willing to do a cleaner version and send a pull request if I am on the right track.

Thanks!

tedzhouhk commented 2 years ago

Thanks for your contribution! Unfortunately, I've been a little busy recently and will probably look into it next month.

tedzhouhk commented 2 years ago

I have also added (#4) a script that makes it possible to use any number of negative samples during inference.

moudheus commented 2 years ago

Thanks for the message, will look into it.