what is the meaning of 'use_masks' and 'no_masks'? - Githubissues

cynricfu / MAGNN

Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding

what is the meaning of 'use_masks' and 'no_masks'? #28

Closed hhmy27 closed 2 years ago

hhmy27 commented 3 years ago

Thanks for open-sourcing the code.

I'm confused about the code in run_LastFM.py

https://github.com/cynricfu/MAGNN/blob/b8557f58ae04a7fe3e7a9fde64ea87c81b331efe/run_LastFM.py#L21-L23

What is the meaning of 'use_masks' and 'no_masks'?

They are passed as a parameter to parse_adjlist_LastFM: https://github.com/cynricfu/MAGNN/blob/b8557f58ae04a7fe3e7a9fde64ea87c81b331efe/utils/tools.py#L129

Looking forward to your reply.

cynricfu commented 3 years ago

use_masks addresses the data leakage problem introduced in this paper. When constructing the subgraph used to generate the embeddings of the target nodes (e.g., user A and artist B), we need to remove the edge between A and B so that the prediction label does not leak into the training input. use_masks therefore indicates which metapaths need this care, namely the UAU, UATAU, AUA, and AUUA metapaths in LastFM.

no_masks simply indicates that no masking is applied, i.e., no attempt is made to avoid this issue.
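
To make the idea concrete, here is a minimal sketch of the masking step. It is not the code in parse_adjlist_LastFM; the function name mask_leaking_instances, the (node_type, node_id) path representation, and the toy data are all assumptions made for illustration.

```python
def mask_leaking_instances(instances, user, artist, use_mask):
    """Drop metapath instances that traverse the target (user, artist) edge.

    Hypothetical helper: `instances` is a list of paths, each path being a
    sequence of (node_type, node_id) tuples such as ("U", 0) or ("A", 1).
    """
    if not use_mask:
        # The "no mask" case: keep every instance, leakage is not handled.
        return list(instances)

    target = {("U", user), ("A", artist)}
    kept = []
    for path in instances:
        # A path leaks if any consecutive pair of nodes is the target edge.
        leaks = any({a, b} == target for a, b in zip(path, path[1:]))
        if not leaks:
            kept.append(path)
    return kept


# Toy UAU instances starting from user 0; the pair to predict is (user 0, artist 1).
uau_instances = [
    [("U", 0), ("A", 1), ("U", 2)],  # goes through the target edge U0-A1
    [("U", 0), ("A", 3), ("U", 2)],  # does not touch the target edge
]

masked = mask_leaking_instances(uau_instances, user=0, artist=1, use_mask=True)
unmasked = mask_leaking_instances(uau_instances, user=0, artist=1, use_mask=False)
```

With the mask enabled, only the instance through artist 3 is kept, so the model never sees the edge it is asked to predict.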

CoraHub commented 2 years ago

Hi, I have a question about how train_val_test_idx.npz is generated in preprocess_LastFM.

cynricfu commented 2 years ago

Basically, train_val_test_idx.npz is generated by randomly sampling the user-artist pairs, with the rule that each user and each artist must have at least one incident edge present in the training set.
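
The rule above can be sketched roughly as follows. This is only one way to satisfy the constraint, not the actual preprocessing code; the function name split_pairs, its parameters, and the toy pair list are assumptions, and the key names used inside the real .npz file are not covered here.

```python
import numpy as np


def split_pairs(pairs, num_val, num_test, seed=0):
    """Randomly split (user, artist) pairs into train/val/test index arrays.

    Hypothetical sketch: every user and every artist keeps at least one
    incident edge in the training set.
    """
    rng = np.random.default_rng(seed)
    pairs = np.asarray(pairs)
    seen_users, seen_artists = set(), set()
    train, rest = [], []

    # Visit the pairs in random order. The first edge that touches a
    # not-yet-seen user or artist is forced into the training set.
    for idx in rng.permutation(len(pairs)):
        user, artist = pairs[idx].tolist()
        if user not in seen_users or artist not in seen_artists:
            train.append(int(idx))
            seen_users.add(user)
            seen_artists.add(artist)
        else:
            rest.append(int(idx))

    if len(rest) < num_val + num_test:
        raise ValueError("not enough free pairs for the requested val/test sizes")

    # The remaining pairs are already in random order, so slicing is a random sample.
    val = rest[:num_val]
    test = rest[num_val:num_val + num_test]
    train.extend(rest[num_val + num_test:])
    return (np.array(sorted(train), dtype=int),
            np.array(sorted(val), dtype=int),
            np.array(sorted(test), dtype=int))


# Toy data: 3 users x 3 artists, all 9 pairs present.
pairs = [(u, a) for u in range(3) for a in range(3)]
train_idx, val_idx, test_idx = split_pairs(pairs, num_val=2, num_test=2)
```

The resulting index arrays could then be written out with np.savez under whatever key names the training script expects.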