Closed: shhs29 closed this issue 1 year ago
Dear Shweta,
Yes. We have fixed this problem.
We have successfully reproduced the results on the synthetic datasets with the new code. As the real-world datasets use fixed splits, the other results are not affected.
Sincerely, Xiyuan Wang
Hi Xiyuan,
Thanks a lot for the quick reply. Are these new results available anywhere?
I had another question regarding the fixed dataset splits for the real-world datasets. I understand SubGNN uses a fixed split; however, is there a reason why it is kept fixed?
Thanks and Regards, Shweta Ann Jacob
Dear Shweta,
The new results and those in the GLASS paper agree within the error range.
I don't know the exact reason for the fixed split. Reproducibility is one possible motivation. Moreover, the split may have a specific meaning in real-world settings.
Sincerely, Xiyuan Wang
Hi Xiyuan,
Thanks a lot for your insight.
I was taking a look at the new change in GLASS for ensuring random splits. Currently, the seed affects the split function. However, shouldn't the seed also affect the load-dataset function, since that step decides the train, val, and test masks? As far as I understand, the current implementation does not change these masks. Please correct me if I am wrong.
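For concreteness, here is a minimal sketch of what I would expect, where the split is derived from the run seed (the function name and split fractions are my own illustration, not code from the repository):

```python
import torch

def make_masks(num_items: int, seed: int, train_frac: float = 0.5, val_frac: float = 0.25):
    # Derive the train/val/test split from the run seed via a dedicated generator.
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_items, generator=gen)
    n_train = int(train_frac * num_items)
    n_val = int(val_frac * num_items)
    train = torch.zeros(num_items, dtype=torch.bool)
    val = torch.zeros(num_items, dtype=torch.bool)
    test = torch.zeros(num_items, dtype=torch.bool)
    train[perm[:n_train]] = True
    val[perm[n_train:n_train + n_val]] = True
    test[perm[n_train + n_val:]] = True
    return train, val, test

# Different seeds now give different splits; masks built once, before
# seeding, would instead be identical across runs.
t0, _, _ = make_masks(10, seed=0)
t1, _, _ = make_masks(10, seed=1)
print(torch.equal(t0, t1))  # typically False
```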
Thanks in advance, Shweta Ann Jacob
Dear Shweta,
You are right. I now reload the dataset and generate new masks (see line 84 in GLASSTest.py). The new results still agree with those in the GLASS paper within the error range.
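Roughly, the idea is the sketch below; the loader here is a toy stand-in to show the effect of reloading, not the actual code at line 84 of GLASSTest.py:

```python
import torch

def load_dataset(num_items: int = 20, train_frac: float = 0.5):
    # Toy stand-in for the real loader: the split is drawn from the
    # current global RNG state, so reloading after re-seeding rebuilds it.
    perm = torch.randperm(num_items)
    n_train = int(train_frac * num_items)
    train_mask = torch.zeros(num_items, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    return {"train_mask": train_mask}

for seed in (0, 1):
    torch.manual_seed(seed)   # re-seed before loading
    dataset = load_dataset()  # reload: masks are regenerated under this seed
    print(seed, dataset["train_mask"].nonzero().flatten().tolist())
```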
Sincerely, Xiyuan Wang
Hi Xiyuan,
Thanks a lot for the update.
Closing this issue as it is resolved.
Hi,
In the GLASS paper, the experiments are repeated for 10 runs with random seeds, and the micro-F1 score for each dataset is the average of those 10 runs. I was wondering whether a different dataset split is created for each seed. As per my understanding of the code, a single dataset split is used across all seeds. Could you confirm whether my assumption is right?
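To make the question concrete, this is the pattern I believe the code follows (a toy sketch with placeholder training logic, not the actual GLASS code):

```python
import torch

num_items = 20
perm = torch.randperm(num_items)  # split drawn a single time, outside the loop
train_mask = torch.zeros(num_items, dtype=torch.bool)
train_mask[perm[:10]] = True

for seed in range(10):
    torch.manual_seed(seed)  # changes model init / sampling, not the masks
    # model training and micro-F1 evaluation would happen here (omitted)
    print(seed, train_mask.nonzero().flatten().tolist())  # same split each run
```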
Thanks in advance, Shweta Ann Jacob