How to calculate similarity between the two graphs?

dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.

http://dgl.ai

Apache License 2.0


How to calculate similarity between the two graphs? #4210

Closed smith-co closed 2 years ago

smith-co commented 2 years ago

I need to learn a similarity measure between graphs using deep learning. I have many graph samples (~500k). On average, each graph has ~5000 nodes and ~4000 edges.

How can I compute similarity score between two graphs? I am thinking:

1. Convert the graphs into vectors using a Graph2Vec embedding.
2. Compare the vectors using a similarity measure such as cosine similarity.

I would really appreciate feedback on whether this is the right way to approach the problem.
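For illustration, the comparison step in this plan could look like the minimal sketch below. It assumes the graph embeddings have already been computed by some embedding model; the vectors `emb_g1` and `emb_g2` are made-up placeholders, not output from Graph2Vec.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero embeddings
    return dot / (norm_a * norm_b)


# Hypothetical graph embeddings (placeholders for real model output).
emb_g1 = [0.2, 0.1, 0.7]
emb_g2 = [0.4, 0.2, 1.4]  # same direction as emb_g1, different length

score = cosine_similarity(emb_g1, emb_g2)
```

Cosine similarity ignores vector length, so two embeddings that point in the same direction score close to 1 regardless of their norms.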

jermainewang commented 2 years ago

Your approach looks right to me but you may want to think about how to train the Graph2Vec model. Specifically, what are the supervision signals to use? Alternatively, you could also consider some non-learning approaches such as graph edit distance.
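To make the graph edit distance idea concrete, here is a brute-force sketch for tiny graphs. It assumes unlabeled, undirected graphs without self-loops and a unit cost for every node or edge insertion and deletion; the function name and argument layout are illustrative only. It tries every node mapping, so its running time grows factorially with the number of nodes. That makes it unusable for graphs with thousands of nodes, where an approximate method would be needed.

```python
from itertools import permutations


def graph_edit_distance(n1, edges1, n2, edges2):
    """Unit-cost edit distance between two small unlabeled undirected graphs.

    Nodes are numbered 0..n-1 and edges are (u, v) pairs without self-loops.
    """
    # Make graph 1 the smaller one; it is implicitly padded with isolated nodes.
    if n1 > n2:
        n1, edges1, n2, edges2 = n2, edges2, n1, edges1
    target = {frozenset(e) for e in edges2}

    best = None
    for perm in permutations(range(n2)):
        # Relabel the edges of graph 1 under this node mapping.
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges1}
        # Edges present in only one graph must be inserted or deleted.
        cost = len(mapped ^ target)
        if best is None or cost < best:
            best = cost
    # Each node missing from the smaller graph costs one insertion.
    return (n2 - n1) + best


triangle = (3, [(0, 1), (1, 2), (0, 2)])
path = (3, [(0, 1), (1, 2)])
distance = graph_edit_distance(*triangle, *path)  # one edge deletion
```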

smith-co commented 2 years ago

Thanks for your response @jermainewang. My dataset is labeled, and I know which pairs are positive and which are negative.

I have not fully understood what you mean by "what are the supervision signals to use". I would appreciate your feedback.

jermainewang commented 2 years ago

Then you could design a loss that pulls the graph embeddings of positive pairs closer together and pushes those of negative pairs farther apart. At a high level, the training algorithm should look like this:

For each pair of graphs g1 and g2:
1. Feed g1 and g2 to your GNN model to get graph-level embeddings e1 and e2.
2. Compute loss(e1, e2, label), where label is 1 if (g1, g2) is a positive pair and 0 if it is a negative pair.
3. Optimize the model and iterate.
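The steps above could be sketched roughly as follows. To keep the example self-contained, it uses plain PyTorch with dense adjacency matrices as a stand-in for a DGL model. The encoder `TinyGraphEncoder`, the helper `make_graph`, the toy graphs, and their pair labels are all hypothetical. PyTorch's cosine embedding loss is used as one possible choice of loss; it expects targets of +1 and -1, so the 1/0 labels are mapped accordingly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGraphEncoder(nn.Module):
    """One round of mean neighbor aggregation, then mean pooling over nodes."""

    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def forward(self, adj, feats):
        # adj: (n, n) dense adjacency with self-loops; feats: (n, in_dim)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = torch.relu(self.lin(adj @ feats / deg))  # aggregate, then transform
        return h.mean(dim=0)  # graph-level embedding of size hid_dim


def make_graph(edges, n):
    """Build (adjacency, node features) for an undirected graph on n nodes."""
    adj = torch.eye(n)  # self-loops so each node keeps its own feature
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    feats = adj.sum(dim=1, keepdim=True)  # degree (incl. self-loop) as feature
    return adj, feats


torch.manual_seed(0)
ring4 = make_graph([(0, 1), (1, 2), (2, 3), (3, 0)], 4)
ring5 = make_graph([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)], 5)
star4 = make_graph([(0, 1), (0, 2), (0, 3)], 4)
star5 = make_graph([(0, 1), (0, 2), (0, 3), (0, 4)], 5)

# Toy labels: 1 = positive (similar) pair, 0 = negative pair.
pairs = [(ring4, ring5, 1), (star4, star5, 1), (ring4, star4, 0), (ring5, star5, 0)]

model = TinyGraphEncoder(in_dim=1, hid_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(50):
    total = 0.0
    for g1, g2, label in pairs:
        e1 = model(*g1).unsqueeze(0)  # shape (1, hid_dim)
        e2 = model(*g2).unsqueeze(0)
        target = torch.tensor([1.0 if label == 1 else -1.0])
        loss = F.cosine_embedding_loss(e1, e2, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        total += loss.item()
    losses.append(total / len(pairs))

# After training, similarity between two graphs is the cosine of their embeddings.
with torch.no_grad():
    sim = F.cosine_similarity(model(*ring4), model(*ring5), dim=0).item()
```

The same encoder is applied to both graphs of a pair, so the model learns a single embedding space in which the similarity score can be computed for any two graphs.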

To speed up training, you could batch multiple graphs together. To get graph-level embeddings, DGL provides a number of global pooling layers. For the loss function, there are many options, e.g., ranking loss, margin loss, etc.
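As one example of a margin-style loss, a common contrastive formulation is loss = label * d^2 + (1 - label) * max(0, margin - d)^2, where d is the Euclidean distance between the two embeddings. The pure-Python sketch below only shows the arithmetic; the function name and the default margin are assumptions, and a real training setup would compute the same quantity on framework tensors so that gradients can flow.

```python
import math


def contrastive_loss(e1, e2, label, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    label is 1 for a positive pair and 0 for a negative pair.
    """
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(e1, e2)))
    positive_term = label * d ** 2  # pull positives together
    negative_term = (1 - label) * max(0.0, margin - d) ** 2  # push negatives apart
    return positive_term + negative_term


e1, e2 = [0.0, 0.0], [3.0, 4.0]  # Euclidean distance 5
loss_pos = contrastive_loss(e1, e2, label=1)              # penalized: far apart
loss_neg = contrastive_loss(e1, e2, label=0)              # no penalty: beyond margin
loss_neg_wide = contrastive_loss(e1, e2, label=0, margin=6.0)
```

Negative pairs stop contributing once they are farther apart than the margin, which keeps the model from pushing them apart indefinitely.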
