divelab / GOOD

GOOD: A Graph Out-of-Distribution Benchmark [NeurIPS 2022 Datasets and Benchmarks]
https://good.readthedocs.io/
GNU General Public License v3.0

Questions about the GOOD-motif dataset #19

Closed Artimislyy closed 9 months ago

Artimislyy commented 10 months ago

1. What is the difference between the id_val_dataset and val_dataset?
2. What does train_dataset.data.env_id mean?

CM-BF commented 9 months ago

Hi Yangyang,

Thank you for your questions.

  1. The id_val_dataset is the in-domain validation split of GOOD-Motif: it follows a distribution similar to that of the training set. In contrast, the val_dataset is the out-of-domain validation split: it is drawn from a different distribution than the training set, i.e., it uses different base graphs.
  2. The train_dataset.data.env_id holds the environment label of each sample in the dataset. It serves the same purpose as the environment partitions in Invariant Risk Minimization (IRM). For more information, please refer to our paper.
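
To make the role of the environment labels in point 2 concrete, here is a minimal sketch of how per-sample losses can be grouped by environment, which is the first step of IRM-style objectives. It is not code from GOOD: the function name `per_environment_risk` and the toy values are hypothetical, and plain Python lists stand in for `env_id`, which is assumed here to be one integer label per training graph.

```python
from collections import defaultdict


def per_environment_risk(env_ids, losses):
    """Average the per-sample losses within each environment.

    env_ids: one integer environment label per sample (assumed layout).
    losses:  one loss value per sample, in the same order.
    """
    if len(env_ids) != len(losses):
        raise ValueError("env_ids and losses must have the same length")
    buckets = defaultdict(list)
    for env, loss in zip(env_ids, losses):
        buckets[int(env)].append(float(loss))
    # One mean risk per environment, keyed by environment label.
    return {env: sum(vals) / len(vals) for env, vals in sorted(buckets.items())}


# Hypothetical toy values, not taken from GOOD-Motif.
env_ids = [0, 0, 1, 1, 2]
losses = [0.25, 0.75, 1.0, 3.0, 0.5]
risks = per_environment_risk(env_ids, losses)
print(risks)
```

An environment-aware method would then combine these per-environment risks, for example by penalizing how much they differ, rather than minimizing only the pooled average.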

Please let me know if you have any further questions! :smile:

Best regards, Shurui Gui