Shen-Lab / GraphCL

[NeurIPS 2020] "Graph Contrastive Learning with Augmentations" by Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, Yang Shen
MIT License

Efficiency of data augmentation on large graphs #3

Closed · junkangwu closed this 3 years ago

junkangwu commented 3 years ago

Hi, I'm interested in self-supervised learning on graphs, and GraphCL is an excellent work combining self-supervised learning with augmentation. However, I'm also curious about its usage on large graphs such as knowledge graphs. May I ask for your suggestions on augmentation operations for knowledge graphs? As I understand it, GraphCL needs 4 GNN encoders at the same time per epoch. Thanks a lot in advance~

yyou1996 commented 3 years ago

Hi @Wjk666,

Thanks for your interest. For the NodeDrop, EdgePert, and AttrMask augmentations, the complexity should be O(K), where K is the number of augmented elements (e.g., the number of nodes dropped in NodeDrop), so they should scale to larger graphs. For Subgraph the complexity might be higher since there is a while loop in it, so for very large graphs you might consider updating the implementation a bit.
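To make the augmentation concrete, here is a minimal NodeDrop-style sketch, assuming a PyG-style `[2, num_edges]` `edge_index` tensor. This is a simplified illustration rather than the repository's exact code; note that the edge-masking step still scans the edge list once, so the practical cost also depends on graph size:

```python
import torch

def node_drop(x, edge_index, drop_ratio=0.2):
    """Drop K = drop_ratio * N nodes at random and keep the induced subgraph.

    x:          [num_nodes, feat_dim] node feature matrix
    edge_index: [2, num_edges] COO edge list (PyG convention)
    """
    num_nodes = x.size(0)
    num_drop = int(num_nodes * drop_ratio)          # K nodes to drop

    # Sample K nodes uniformly at random and mark the survivors.
    perm = torch.randperm(num_nodes)
    keep_mask = torch.ones(num_nodes, dtype=torch.bool)
    keep_mask[perm[:num_drop]] = False

    # Keep only edges whose endpoints both survive (one scan over the edges).
    edge_mask = keep_mask[edge_index[0]] & keep_mask[edge_index[1]]
    edge_index = edge_index[:, edge_mask]

    # Relabel surviving nodes to a compact 0..N-K-1 index range.
    new_idx = torch.full((num_nodes,), -1, dtype=torch.long)
    new_idx[keep_mask] = torch.arange(int(keep_mask.sum()))
    return x[keep_mask], new_idx[edge_index]
```

Subgraph is different because it grows a node set iteratively (e.g., by random walk) inside a while loop until the target size is reached, which is why its cost can be higher on very large graphs.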

> Because in the way of GraphCL, it needs 4 GNN encoders at the same time per epoch.

Actually, GraphCL only requires 1 GNN encoder, which propagates twice (once for each of the 2 augmented views). Things might be somewhat different in your case.
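For concreteness, here is a minimal sketch of that training step. The `augment` and `encoder` callables in the usage comment are hypothetical placeholders, not the repository's exact API, and the loss below is a simplified NT-Xent where matched views sit on the diagonal:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """Simplified NT-Xent: row i of z1 and row i of z2 are two views of graph i."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature       # [batch, batch] similarity matrix
    labels = torch.arange(z1.size(0))     # positive pairs lie on the diagonal
    return F.cross_entropy(sim, labels)

# One shared encoder is propagated twice, once per augmented view:
#   view1, view2 = augment(batch), augment(batch)   # e.g. NodeDrop + EdgePert
#   z1 = encoder(view1)   # same parameters ...
#   z2 = encoder(view2)   # ... reused for the second view
#   loss = nt_xent(z1, z2)
```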

junkangwu commented 3 years ago

@yyou1996 Thanks a lot for your advice and suggestions. I will go check my code.