amazon-science / page-link-path-based-gnn-explanation

MIT No Attribution

Quantitative evaluation #4

Open ElioShyti opened 1 year ago

ElioShyti commented 1 year ago

Hi, I'm trying PaGE-Link on another dataset (movielens). I ran train_linkpred.py and pagelink.py without major problems, but I have some issues with eval_explanation.py, since it requires two extra data structures as input:

1) movielens_pred_pair_to_edge_labels 2) movielens_pred_pair_to_path_labels.

I'd like to know how the aug_citation_pred_pair_to_edge_labels and aug_citation_pred_pair_to_path_labels files in /datasets are constructed, so that I can replicate them with my data and run eval_explanation.py.

Also, I saw that the two structures are loaded by the load_dataset function in data_processing.py, but they are used only in eval_explanation.py. Since train_linkpred.py and pagelink.py load their data through the same function, I commented out the lines that load these structures in order to run those two scripts.
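An alternative to commenting out lines is to make label loading optional. The sketch below is hypothetical: the helper name `load_explanation_labels`, its parameters, and the assumption that the label files are pickled are illustrations only, not the repository's actual load_dataset interface.

```python
import pickle
from pathlib import Path


def load_explanation_labels(dataset_dir, dataset_name, load_labels=False):
    """Return (edge_labels, path_labels), or (None, None) when not requested.

    Hypothetical helper: file names follow the pattern mentioned in this
    thread, and the pickle format is an assumption.
    """
    if not load_labels:
        # Training and explanation generation do not need ground-truth labels.
        return None, None

    labels = []
    for suffix in ("pred_pair_to_edge_labels", "pred_pair_to_path_labels"):
        path = Path(dataset_dir) / f"{dataset_name}_{suffix}"
        with open(path, "rb") as f:
            labels.append(pickle.load(f))
    return labels[0], labels[1]
```

With a switch like this, only the evaluation script would pass `load_labels=True`, and the other two scripts would skip the file reads entirely.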

Finally, I'd like to know whether any src_nodes (author) and tgt_nodes (paper) in your graph are connected by both a 'likes' edge and a 'writes' edge. It seems to me that none are, but I wanted confirmation.

Thank you for the attention, ES

ShichangZh commented 1 year ago

Thank you for your interest in our work and sorry for the late reply. Please find my answers to your questions below.

  1. The aug_citation_pred_pair_to_edge_labels and aug_citation_pred_pair_to_path_labels files are the ground truth labels for evaluating the explanations. For a new dataset, you can obtain the corresponding labels by gathering domain knowledge, by having humans label them, or by using pre-defined rules to generate them (our case). The first two approaches are actually better if you are able to carry them out. In our case, we took the third approach and defined two rules to generate these labels, i.e., concise and informative (please refer to section 6.1 for more details). If you find this part useful, I will see whether I can add the corresponding code to the repo as soon as possible.

  2. For the load_dataset function, your suggestion is great and will help avoid unnecessary IO. I will definitely consider making such changes in our next update.

  3. For graph edges: no, we don't have both "likes" and "writes" between the same pair of author and paper, so your understanding is correct. This is because we do not want to predict an edge between two nodes that are already connected (even by a different edge type), so we explicitly avoid creating those "likes" edges for prediction. In practice, you can think of the "likes" relation as a recommendation of new papers for authors to read, so there is no need to recommend their own papers.
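As an illustration of point 3, here is a minimal sketch of filtering candidate "likes" pairs so that none duplicates an existing "writes" pair. The function name and the toy node IDs are hypothetical and are not taken from the repository.

```python
def filter_likes_candidates(candidate_pairs, writes_pairs):
    """Drop (author, paper) candidates that already have a 'writes' edge."""
    existing = set(writes_pairs)
    return [pair for pair in candidate_pairs if pair not in existing]


# Toy data: authors 0 and 1, papers 10, 11 and 12.
writes = [(0, 10), (1, 11)]
candidates = [(0, 10), (0, 11), (1, 11), (1, 12)]

likes = filter_likes_candidates(candidates, writes)
print(likes)  # [(0, 11), (1, 12)]
```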

ElioShyti commented 1 year ago

Thank you for your answer. I think it would be useful if you could provide an example of how you generated these files, to make the process easier to understand.
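For reference, a rough sketch of what rule-based label generation could look like. This is not the authors' code. It assumes that "concise" means keeping only shortest paths between the predicted pair and that "informative" means rejecting paths whose intermediate nodes exceed a degree threshold. The function names, the `max_degree` parameter, and the dict layout keyed by (src, tgt) are all hypothetical.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency dict from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def shortest_paths(adj, src, tgt):
    """Enumerate all shortest paths from src to tgt using BFS."""
    dist = {src: 0}
    parents = {src: []}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == tgt:
            break  # every parent of tgt has already been recorded
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parents[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                parents[v].append(u)  # another equally short way to reach v
    if tgt not in dist:
        return []

    def backtrack(node):
        if node == src:
            return [[src]]
        return [path + [node] for p in parents[node] for path in backtrack(p)]

    return backtrack(tgt)


def rule_based_labels(adj, src, tgt, max_degree):
    """Keep shortest paths (assumed 'concise') that avoid hubs (assumed 'informative')."""
    paths = [
        p for p in shortest_paths(adj, src, tgt)
        if all(len(adj[n]) <= max_degree for n in p[1:-1])
    ]
    edges = {tuple(sorted(e)) for p in paths for e in zip(p, p[1:])}
    return paths, edges


# Toy graph: p2 is a hub with five neighbours, p1 has only two.
adj = build_adj([
    ("a0", "p1"), ("a0", "p2"), ("p1", "p9"), ("p2", "p9"),
    ("a3", "p2"), ("a4", "p2"), ("a5", "p2"),
])
paths, edges = rule_based_labels(adj, "a0", "p9", max_degree=3)
pred_pair_to_path_labels = {("a0", "p9"): paths}
pred_pair_to_edge_labels = {("a0", "p9"): edges}
print(paths)  # [['a0', 'p1', 'p9']]
```

In the toy graph both a0-p1-p9 and a0-p2-p9 are shortest paths, but the one through the hub p2 is dropped, so the edge labels contain only the two edges of the remaining path.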