acbull / pyHGT

Code for "Heterogeneous Graph Transformer" (WWW'20), which is based on pytorch_geometric
MIT License

Implementation for HAN baseline #18

Open lingfanyu opened 4 years ago

lingfanyu commented 4 years ago

Hi,

In the paper, you mention that you re-implemented HAN on the OAG dataset. Could you share your HAN implementation as well?

From my understanding, HAN defines neighborhoods based on user-provided meta-paths, and it's unclear to me how to scale HAN to a large heterogeneous graph like OAG. Could you elaborate on that?

Thanks!

lingfanyu commented 4 years ago

Hi HGT author,

Is there any update on this issue?

Thanks!

acbull commented 4 years ago

Our implementation is a little different from the original HAN paper. We add the meta-path edges when we create the graph dataset and treat them as just another edge type. The model architecture is the same as in the original paper. Since we can still use sub-graph sampling to train on large-scale graphs, the model still scales to our dataset.

Because this pipeline differs slightly from the one used for our HGT model (as stated above), we didn't include HAN in this GitHub project. Sorry about that.
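For readers following the thread, the idea of "treating meta-paths as another type of edge" can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' code: the function `add_metapath_edges`, the dictionary layout, and the edge-type names are all hypothetical, and the real pipeline may differ.

```python
from collections import defaultdict


def add_metapath_edges(edges, first, second, new_type):
    """Compose two edge types into a new meta-path edge type.

    `edges` maps an edge-type name to a list of (src, dst) pairs.
    The layout and names are illustrative assumptions only.
    """
    # Index the second hop by its source node so the join is cheap.
    by_src = defaultdict(set)
    for mid, dst in edges[second]:
        by_src[mid].add(dst)

    # Follow first hop then second hop; drop self-loops.
    composed = set()
    for src, mid in edges[first]:
        for dst in by_src[mid]:
            if src != dst:
                composed.add((src, dst))

    # Return a copy with the meta-path stored as one more edge type.
    out = dict(edges)
    out[new_type] = sorted(composed)
    return out


edges = {
    "paper-author": [("p1", "a1"), ("p2", "a1"), ("p3", "a2")],
    "author-paper": [("a1", "p1"), ("a1", "p2"), ("a2", "p3")],
}
graph = add_metapath_edges(
    edges, "paper-author", "author-paper", "paper-author-paper"
)
print(graph["paper-author-paper"])  # [('p1', 'p2'), ('p2', 'p1')]
```

Once the composed edges exist as their own type, a sub-graph sampler can presumably treat them like any other relation, which is what would let the same sampling-based training apply.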

lingfanyu commented 4 years ago

Thanks for your reply! I have two follow-up questions. I understand they are more related to HAN than to HGT, and I really appreciate your time in helping me better understand the comparison between HGT and your baselines.

Can you also share the meta-paths you selected for the HAN baseline on the OAG dataset when you created the graph? OAG has about 15 different edge types (not including the reverse edge types), and I think it's hard to decide manually which meta-paths are meaningful for HAN.

Another problem with pre-building a graph in which each edge represents a meta-path-reachable neighbor is that it makes the graph much denser. For example, if a graph has 1 author node and 100 papers written by that author, then building a graph for the meta-path paper-author-paper creates a fully connected graph over those 100 papers. In other words, the graph becomes about 100 times denser. So I think this approach does not scale to a large dataset like OAG. Even if it's done offline as a preprocessing step, it can easily exhaust CPU memory when the graph is huge. Do you have any comments on how to address this issue?
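The density concern can be checked without materializing the composed graph. The sketch below is a hypothetical helper (`metapath_edge_count` is not part of pyHGT) that counts directed paper-author-paper pairs from author degrees alone; it is an upper bound, since two papers sharing several authors are counted once per shared author.

```python
from collections import Counter


def metapath_edge_count(paper_author_edges):
    """Upper bound on directed paper-author-paper edges (no self-loops).

    An author with d papers links each paper to the other d - 1,
    contributing d * (d - 1) directed pairs.
    """
    degree = Counter(author for _, author in paper_author_edges)
    return sum(d * (d - 1) for d in degree.values())


# The example from the comment: 1 author, 100 papers.
edges = [(f"p{i}", "a0") for i in range(100)]
original = len(edges)
composed = metapath_edge_count(edges)
print(original, composed, composed / original)  # 100 9900 99.0
```

Because the count grows quadratically in each author's number of papers, a few prolific authors would dominate the size of the pre-built graph.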