Closed: shhs29 closed this issue 1 year ago
Dear Shweta,
Yes. We have fixed this problem.
We have successfully reproduced the results on the synthetic datasets with the new code. As the real-world datasets use fixed splits, the other results are not affected.
Sincerely, Xiyuan Wang
Hi Xiyuan,
Thanks a lot for the quick reply. Are these new results available anywhere?
I had another question regarding the fixed dataset splits for the real-world datasets. I understand SubGNN uses a fixed split; however, is there a reason why it is kept fixed?
Thanks and Regards, Shweta Ann Jacob
Dear Shweta,
The new results and those in the GLASS paper agree within the error range.
I don't know the exact reason for the fixed split. Reproducibility is one possible motivation. Moreover, the split may have a specific meaning in real-world settings.
Sincerely, Xiyuan Wang
Hi Xiyuan,
Thanks a lot for your insight.
I was taking a look at the new change in GLASS for ensuring random splits. Currently, the seed affects the split function. However, shouldn't the seed also affect the load-dataset function, since that step decides the train, val, and test masks? As far as I understand, the current implementation does not change these masks. Please correct me if I am wrong.
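For concreteness, here is a minimal sketch of what I would expect, where the split is derived from the run seed (the function name and split fractions are my own illustration, not code from the repository):

```python
import torch

def make_masks(num_items: int, seed: int, train_frac: float = 0.5, val_frac: float = 0.25):
    # Derive the train/val/test split from the run seed via a dedicated generator.
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_items, generator=gen)
    n_train = int(train_frac * num_items)
    n_val = int(val_frac * num_items)
    train = torch.zeros(num_items, dtype=torch.bool)
    val = torch.zeros(num_items, dtype=torch.bool)
    test = torch.zeros(num_items, dtype=torch.bool)
    train[perm[:n_train]] = True
    val[perm[n_train:n_train + n_val]] = True
    test[perm[n_train + n_val:]] = True
    return train, val, test

# Different seeds now give different splits; masks built once, before
# seeding, would instead be identical across runs.
t0, _, _ = make_masks(10, seed=0)
t1, _, _ = make_masks(10, seed=1)
print(torch.equal(t0, t1))  # typically False
```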
Thanks in advance, Shweta Ann Jacob
Dear Shweta,
You are right. I now reload the dataset and generate new masks (see line 84 in GLASSTest.py). The new results still agree with those in the GLASS paper within the error range.
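Roughly, the idea is the sketch below; the loader here is a toy stand-in to show the effect of reloading, not the actual code at line 84 of GLASSTest.py:

```python
import torch

def load_dataset(num_items: int = 20, train_frac: float = 0.5):
    # Toy stand-in for the real loader: the split is drawn from the
    # current global RNG state, so reloading after re-seeding rebuilds it.
    perm = torch.randperm(num_items)
    n_train = int(train_frac * num_items)
    train_mask = torch.zeros(num_items, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    return {"train_mask": train_mask}

for seed in (0, 1):
    torch.manual_seed(seed)   # re-seed before loading
    dataset = load_dataset()  # reload: masks are regenerated under this seed
    print(seed, dataset["train_mask"].nonzero().flatten().tolist())
```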
Sincerely, Xiyuan Wang
Hi Xiyuan,
Thanks a lot for the update.
Closing this issue as it is resolved.
Hi,
In the GLASS paper, the experiments are repeated for 10 runs with random seeds, and the micro-F1 score for each dataset is the average of those 10 runs. I was wondering whether a different dataset split is created for each seed. As per my understanding of the code, a single dataset split is used across all seeds. Could you confirm whether my assumption is right?
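To make the question concrete, this is the pattern I believe the code follows (a toy sketch with placeholder training logic, not the actual GLASS code):

```python
import torch

num_items = 20
perm = torch.randperm(num_items)  # split drawn a single time, outside the loop
train_mask = torch.zeros(num_items, dtype=torch.bool)
train_mask[perm[:10]] = True

for seed in range(10):
    torch.manual_seed(seed)  # changes model init / sampling, not the masks
    # model training and micro-F1 evaluation would happen here (omitted)
    print(seed, train_mask.nonzero().flatten().tolist())  # same split each run
```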
Thanks in advance, Shweta Ann Jacob