dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0
13.36k stars 3k forks source link

Reverse_types for as_edge_prediction_sampler not properly documentated #4453

Open as51340 opened 2 years ago

as51340 commented 2 years ago

📚 Documentation

https://docs.dgl.ai/en/0.8.x/generated/dgl.dataloading.as_edge_prediction_sampler.html

This tutorial doesn't mention that when passing reverse_etypes dict, it is assumed that edges of the reverse types have the same id. (e.g clicks and clicked-by) I found this info here: https://discuss.dgl.ai/t/exclude-eids-in-edgedataloader/1552

Rhett-Ying commented 2 years ago

I think what user needs to make sure is the key-value pair should be the same as corresponding value-key pair in reverse_etypes. This is intuitive. does it confuse you?

as51340 commented 2 years ago

Yes, but imagine that you are loading a dataset in which you already have bidirected edges. In that case, the reverse_types flag doesn't work because corresponding edges will not have the same id.

Rhett-Ying commented 2 years ago

could you share a minimum and runnable snippet to elaborate this?

as51340 commented 2 years ago

I think what user needs to make sure is the key-value pair should be the same as corresponding value-key pair in reverse_etypes. This is intuitive. does it confuse you?

No, I am fine with that but here https://discuss.dgl.ai/t/exclude-eids-in-edgedataloader/1552 it also says that edge -> and its reverse edge <- should have the same id(in a heterogeneous graph with different edge types) and I think this should be included in the documentation.

as51340 commented 2 years ago

could you share a minimum and runnable snippet to elaborate this

I don't have runnable but I can give you an example which I encountered. Let's say that you are loading a graph from a graph database in which nodes are connected with edges in both directions. The most logical way to convert a graph from graph databases to DGLGraph is to iterate over nodes and their out_edges or in_edges. It is really hard(or more to say computationally inefficient) to search for each edge you encounter, its reverse edge pair and hence construct a graph in which edge and reverse_edge have the same id.

Rhett-Ying commented 2 years ago

Let's add clarity into doc. It's great if you can help do this.