THUDM / GATNE

Source code and dataset for KDD 2019 paper "Representation Learning for Attributed Multiplex Heterogeneous Network"
MIT License
527 stars 141 forks

Chance to have KeyError in function `generate_neighbors` #98

Open Wang-Yu-Qing opened 3 years ago

Wang-Yu-Qing commented 3 years ago

In function `generate_neighbors`, the encoded id of every node in the graph is looked up in `vocab`. The `vocab` is built from the random walks.

However, some nodes may never appear in any random walk, which causes a KeyError at `ix = vocab[x].index` or `iy = vocab[y].index`. Imagine that the metapath walk schema is user-item-user, one user has 10 items, and the number of walks per node is 5. An item neighbor of this user may never be chosen during the walking process, and if this item has only this one user as a neighbor, no other walk can reach it. The item is then present in the graph but absent from `vocab`, so looking up its encoded id raises a KeyError.
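A minimal sketch of the failure described above, using a toy graph rather than the repository's code. The names `edges`, `walks`, and `walk_vocab` are hypothetical, and the vocab here is a plain dict of ints (an assumption for illustration) instead of objects with an `.index` attribute:

```python
from collections import Counter

# Toy graph: user u1 is linked to three items; i3 has no other neighbor.
edges = [("u1", "i1"), ("u1", "i2"), ("u1", "i3")]

# Hypothetical walks that happened never to visit i3.
walks = [["u1", "i1", "u1"], ["u1", "i2", "u1"]]

# Vocab built from the walks only (most frequent node gets the smallest id).
counts = Counter(node for walk in walks for node in walk)
walk_vocab = {node: idx for idx, (node, _) in enumerate(counts.most_common())}

# Nodes that exist in the graph but have no id in the walk-based vocab.
graph_nodes = {node for edge in edges for node in edge}
missing = sorted(graph_nodes - set(walk_vocab))

# Looking up every edge endpoint fails on the first missing node.
failed_node = None
try:
    for x, y in edges:
        ix, iy = walk_vocab[x], walk_vocab[y]
except KeyError as err:
    failed_node = err.args[0]

print(missing, failed_node)
```

Comparing the graph's node set against the vocab keys before the lookup, as `missing` does here, is a cheap way to confirm whether this is the cause of a given KeyError.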

I think the encoding should be built from the graph rather than from the random walks, and the encoding order should depend on the degree of each node.
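One possible shape for this proposal, as a sketch only: the helper `build_vocab_from_graph` is hypothetical and not part of the repository, it assumes the graph is available as a list of `(x, y)` edge pairs, and it returns plain integer ids rather than objects with an `.index` attribute.

```python
from collections import defaultdict

def build_vocab_from_graph(edges):
    """Map every node that appears in an edge to an integer id.

    Nodes are ordered by descending degree, so high-degree nodes
    get the smallest ids. Ties are broken by node name.
    """
    degree = defaultdict(int)
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    ordered = sorted(degree, key=lambda node: (-degree[node], node))
    return {node: idx for idx, node in enumerate(ordered)}

# Toy example: i2 and i3 are reachable only through u1.
edges = [("u1", "i1"), ("u1", "i2"), ("u1", "i3"), ("u2", "i1")]
vocab = build_vocab_from_graph(edges)

# Every edge endpoint now has an id, regardless of which walks were sampled.
encoded_edges = [(vocab[x], vocab[y]) for x, y in edges]
print(vocab)
print(encoded_edges)
```

Because the ids come from the edge list itself, the lookup cannot miss a node that is in the graph. Whether degree order matches what the rest of the training code expects from the walk-frequency order is not verified here.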