awslabs / dgl-ke

High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
https://dglke.dgl.ai/doc/
Apache License 2.0
1.25k stars 193 forks

Could DGL-KE be applied to undirected graph? #79

Open cfangplus opened 4 years ago

cfangplus commented 4 years ago

Hi,

This is a great project, and our team has recently been running some of the demo programs. I have a question. TransE, for example, is mostly used for heterogeneous graphs that contain different relations, but could TransE also be applied to an undirected graph? An undirected graph contains only one kind of relation, so I think it is a special case of a heterogeneous graph, and I believe TransE could be used here. But how does its performance compare with embedding algorithms designed only for undirected graphs (node2vec, for instance)? TransE would still create an embedding vector for the relation even though an undirected graph has only one kind of relation, and that relation embedding is essentially useless for an undirected graph. Thanks.

zheng-da commented 4 years ago

For DGL, an undirected graph is stored as a bi-directional graph: if there is an edge from A to B, there is also an edge from B to A. In DGL-KE, we use DGL to store knowledge graphs. If your graph is undirected, I think you can store it as a bi-directional graph, with the triplets (A, r, B) and (B, r', A) for every edge. I think you can create a separate relation for the reverse edge.
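The suggestion above can be sketched in plain Python. This is only an illustration, not part of DGL-KE: the function name `undirected_to_triplets` and the relation names `r` and `r_rev` are hypothetical, and the tab-separated output is just one plausible way to hand the triplets to a loader.

```python
def undirected_to_triplets(edges, relation="r", reverse_relation="r_rev"):
    """Turn undirected edges (a, b) into directed (head, relation, tail) triplets.

    Each undirected edge yields (a, relation, b) and (b, reverse_relation, a).
    Repeated edges, in either orientation, are only emitted once.
    The relation names are placeholders chosen for this sketch.
    """
    triplets = []
    seen = set()
    for a, b in edges:
        key = frozenset((a, b))  # same key for (a, b) and (b, a)
        if key in seen:
            continue
        seen.add(key)
        triplets.append((a, relation, b))
        if a != b:  # a self-loop has no distinct reverse edge
            triplets.append((b, reverse_relation, a))
    return triplets


edges = [("A", "B"), ("B", "C"), ("B", "A")]  # ("B", "A") repeats ("A", "B")
triplets = undirected_to_triplets(edges)

# Print one tab-separated triplet per line.
for head, rel, tail in triplets:
    print(f"{head}\t{rel}\t{tail}")
```

Passing the same name for `relation` and `reverse_relation` gives the single-relation variant instead, where both directions share one relation embedding.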

cfangplus commented 4 years ago

So the reason for storing the undirected graph as a bi-directional graph is that the graph then contains two types of relation, namely the original relation and its inverse, right?

zheng-da commented 4 years ago

I think so. But it is also up to the model. Some of the KGE models, such as DistMult, are symmetric. For the symmetric models, I guess one direction is enough.
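To illustrate the point about symmetric models, here is a minimal sketch using the standard textbook score functions for DistMult and TransE, written in plain Python rather than taken from the DGL-KE code. The vectors are made-up toy values and the function names are hypothetical.

```python
import math


def distmult_score(h, r, t):
    """DistMult: sum_k h_k * r_k * t_k. Swapping h and t leaves it unchanged."""
    return sum(hk * rk * tk for hk, rk, tk in zip(h, r, t))


def transe_score(h, r, t):
    """TransE (L2 variant): -||h + r - t||. Swapping h and t generally changes it."""
    return -math.sqrt(sum((hk + rk - tk) ** 2 for hk, rk, tk in zip(h, r, t)))


# Toy embeddings, chosen only for illustration.
h = [1.0, 2.0, 0.0]
r = [0.5, -1.0, 2.0]
t = [2.0, 1.0, 1.0]

print("DistMult (h, r, t):", distmult_score(h, r, t))
print("DistMult (t, r, h):", distmult_score(t, r, h))
print("TransE   (h, r, t):", transe_score(h, r, t))
print("TransE   (t, r, h):", transe_score(t, r, h))
```

Because DistMult scores (A, r, B) and (B, r, A) identically, adding the reverse triplet with the same relation gives the model no new information about direction. TransE does distinguish the two directions unless the relation vector is zero.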

cfangplus commented 4 years ago

We know that DGL-KE provides knowledge graph embedding algorithms such as TransE and many others. I see that PyG and other similar systems such as GraphVite have implementations of random-walk node embedding algorithms like Node2Vec, DeepWalk, LINE, etc. Does DGL-KE plan to develop this feature?

zheng-da commented 4 years ago

Yes, we plan to implement them. We have a prototype of DeepWalk and Metapath2vec, but haven't fully optimized them yet.

cfangplus commented 4 years ago

That's great. Now we have two choices of embedding model: node embedding models like DeepWalk and Node2Vec, and knowledge graph embedding models like TransE and DistMult. As we discussed in the first question of this issue, KGE models can be used on homogeneous graphs, where DeepWalk and Node2Vec are normally applied. But this does not mean that KGE models could replace those node embedding models, right? Otherwise, there would be no need to develop and optimize DeepWalk. I still don't know the performance difference between the two. Have you ever considered this problem? Since you have now developed both kinds of model, please provide some advice on how to choose between them when a DGL-KE user wants to do node embedding on a homogeneous graph. Thanks.

zheng-da commented 4 years ago

This is an interesting question. We have never compared the two kinds of algorithms. KGE is supervised by link prediction, while DeepWalk is trained on random-walk sequences. When we apply KGE to a homogeneous graph, it should be similar to simple matrix factorization. DeepWalk should outperform matrix factorization, such as SVD?
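One way to read the matrix-factorization remark, assuming DistMult as the KGE model and a single relation (neither is stated in the thread): let $E$ be the entity embedding matrix with rows $e_i$, let $r$ be the one relation vector, and let $A$ be the adjacency matrix of the graph. The score of every node pair then collects into a single matrix:

$$
S_{ij} = \sum_{k} E_{ik}\, r_k\, E_{jk}
\quad\Longrightarrow\quad
S = E \,\operatorname{diag}(r)\, E^{\top} \approx A
$$

Under these assumptions, training pushes $S$ to be large where $A$ has an edge and small elsewhere, which has the shape of a symmetric low-rank factorization of $A$. The analogy is structural only: KGE training typically uses sampled negatives with a ranking or logistic loss, not the least-squares objective that SVD solves.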