In the function `generate_neighbors`, the encoded node id is looked up in `vocab` for every node in the graph. The `vocab` is built from all random walks.
However, some nodes may never appear in any random walk, which causes a `KeyError` at `ix = vocab[x].index` or `iy = vocab[y].index`. Imagine that the metapath walk schema is user-item-user, one user has 10 items, and the number of walks per node is 5. An item neighbor of this user may then never be chosen during walking. If that item has this user as its only neighbor, it appears in the graph but not in `vocab`, and looking up its encoded id raises a `KeyError`.
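The scenario above can be reproduced with a small sketch. It is not the library's code. The walker `user_item_user_walks`, the toy graph, and the plain-dict `vocab` (standing in for the real vocab object with its `.index` attribute) are all hypothetical, and I assume walks start only from user nodes.

```python
import random


def build_graph(num_items=10):
    """Toy bipartite graph: one user connected to num_items items."""
    graph = {"u0": [f"i{k}" for k in range(num_items)]}
    for item in graph["u0"]:
        graph[item] = ["u0"]  # each item has this one user as its only neighbor
    return graph


def user_item_user_walks(graph, walks_per_node, rng):
    """Hypothetical walker: starts from user nodes only, follows user-item-user."""
    walks = []
    for node in graph:
        if not node.startswith("u"):
            continue
        for _ in range(walks_per_node):
            item = rng.choice(graph[node])
            user = rng.choice(graph[item])
            walks.append([node, item, user])
    return walks


graph = build_graph()
walks = user_item_user_walks(graph, walks_per_node=5, rng=random.Random(0))

# Build the vocab from the walks only (plain dict: node -> encoded id).
vocab = {}
for walk in walks:
    for node in walk:
        vocab.setdefault(node, len(vocab))

# Nodes that exist in the graph but were never visited by any walk.
missing = [node for node in graph if node not in vocab]

try:
    for node in graph:
        _ = vocab[node]  # same pattern as ix = vocab[x].index
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

With 5 walks and 10 items, at most 5 distinct items can be visited, so at least 5 items are always missing from `vocab` and the lookup over all graph nodes fails.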
I think the encoding should be built from the graph rather than from the random walks, so that every node gets an id. The encoding order should depend on the degree of the node.
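A minimal sketch of that suggestion, assuming the graph is available as an adjacency dict. The function `encode_nodes_by_degree` is hypothetical, and descending degree with ties broken by node name is my assumption about the ordering.

```python
def encode_nodes_by_degree(graph):
    """Assign an id to every node in the graph, highest degree first.

    graph: dict mapping node -> list of neighbor nodes.
    Ties are broken by node name so the encoding is deterministic.
    """
    ordered = sorted(graph, key=lambda node: (-len(graph[node]), node))
    return {node: idx for idx, node in enumerate(ordered)}


graph = {
    "u0": ["i0", "i1", "i2"],
    "u1": ["i0"],
    "i0": ["u0", "u1"],
    "i1": ["u0"],
    "i2": ["u0"],
}
node_ids = encode_nodes_by_degree(graph)
# Every node in the graph has an id, whether or not a walk visited it.
```

Because the ids come from the graph itself, a lookup such as `node_ids[x]` cannot fail for a node that is in the graph.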