divelab / GOOD

GOOD: A Graph Out-of-Distribution Benchmark [NeurIPS 2022 Datasets and Benchmarks]
https://good.readthedocs.io/
GNU General Public License v3.0

:sparkles: GOOD: A Graph Out-of-Distribution Benchmark :sparkles:


Documentation | NeurIPS 2022 Paper | Preprint

This repository maintains and updates the GOOD benchmark, which was accepted to the NeurIPS 2022 Datasets and Benchmarks Track.

Datasets

We are planning to include more graph out-of-distribution datasets for your convenience.

Leaderboard [Feb 20th updates]

Overview

GOOD (Graph OOD) is a benchmarking library for graph out-of-distribution (OOD) algorithms. It is built on PyTorch and PyG and makes it easy to develop and benchmark OOD algorithms.

Currently, GOOD contains 11 datasets with 17 domain selections. Combining each domain selection with covariate shift, concept shift, and no shift yields 51 different splits. We provide performance results for 12 commonly used baseline methods (ERM, IRM, VREx, GroupDRO, Coral, DANN, MixupForGraph, DIR, GSAT, CIGA, EERM, SRGNN), including 6 graph-specific methods, each evaluated over 10 random runs.
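As a quick check of the split count, each of the 17 domain selections is paired with each of the three shift settings. The shift labels in this sketch are illustrative names, not necessarily the identifiers GOOD uses internally.

```python
# Illustrative arithmetic: each domain selection is combined with every shift setting.
NUM_DATASETS = 11
NUM_DOMAIN_SELECTIONS = 17  # summed over all 11 datasets
SHIFT_SETTINGS = ("covariate", "concept", "no_shift")  # illustrative labels only

num_splits = NUM_DOMAIN_SELECTIONS * len(SHIFT_SETTINGS)
print(f"{NUM_DATASETS} datasets, {NUM_DOMAIN_SELECTIONS} domain selections -> {num_splits} splits")
```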

The GOOD dataset summaries are shown in the following figure.

[Figure: GOOD dataset summaries]

Why GOOD?

Whether you are an experienced researcher of graph out-of-distribution problems or new to graph deep learning, here are several reasons to use GOOD as your toolkit for graph OOD research, study, and development.

Installation

Conda dependencies

GOOD depends on PyTorch (>=1.6.0), PyG (>=2.0), and RDKit (>=2020.09.5). For more details, see conda environment.

Note that we currently test with PyTorch (==1.10.1), PyG (==2.0.4), and RDKit (==2020.09.5), so we strongly encourage installing these versions.

Warning: Please install with CUDA >= 11.3 to avoid unexpected CUDA errors.
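To confirm that an environment matches the tested versions and the CUDA requirement above, a small check like the following may help. It is a sketch, not part of GOOD: it assumes the import names torch, torch_geometric, and rdkit each expose `__version__`, and it treats any difference from the tested version (ignoring local build suffixes such as `+cu113`) as a mismatch.

```python
"""Compare installed versions with the versions this README says GOOD is tested on.

Sketch only: the import names and the exact-match rule are assumptions, not GOOD code.
"""
import importlib

# Tested versions listed in the README.
TESTED_VERSIONS = {"torch": "1.10.1", "torch_geometric": "2.0.4", "rdkit": "2020.09.5"}
MIN_CUDA = (11, 3)


def strip_local(version: str) -> str:
    """Drop a local build suffix such as '+cu113' from a version string."""
    return version.split("+", 1)[0]


def cuda_ok(cuda_version) -> bool:
    """True if a CUDA version string like '11.3' is at least MIN_CUDA; None (CPU build) is False."""
    if cuda_version is None:
        return False
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) >= MIN_CUDA


def mismatches(installed: dict) -> dict:
    """Map each package that is missing or differs from its tested version to (installed, tested)."""
    return {
        name: (installed.get(name), tested)
        for name, tested in TESTED_VERSIONS.items()
        if installed.get(name) is None or strip_local(installed[name]) != tested
    }


if __name__ == "__main__":
    installed = {}
    for name in TESTED_VERSIONS:
        try:
            installed[name] = importlib.import_module(name).__version__
        except ImportError:
            installed[name] = None  # package not installed
    print("Version mismatches:", mismatches(installed) or "none")
    try:
        import torch
        print("CUDA >= 11.3:", cuda_ok(torch.version.cuda))
    except ImportError:
        print("PyTorch is not installed.")
```

A mismatch is not necessarily an error, since the stated minimums are lower than the tested versions; the check only flags where your setup departs from the tested one.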

Recommended installation examples:

Pip

Installation for project usage (recommended)

git clone https://github.com/divelab/GOOD.git && cd GOOD
pip install -e .
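After installation, the goodtg and goodtl CLIs described in this README should be available in the active environment. A quick way to check is to resolve them on your PATH; this sketch assumes the editable install registers them as console scripts.

```python
import shutil


def find_clis(names=("goodtg", "goodtl")) -> dict:
    """Map each CLI name to its resolved path on PATH, or None if it cannot be found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_clis().items():
        # A missing CLI usually means the install environment is not the active one.
        print(f"{name}: {path or 'NOT FOUND (is the install environment activated?)'}")
```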

Quick Tutorial

Run an algorithm

A good way to begin is to run an algorithm directly. For this, we provide the CLI goodtg (GOOD to go), which invokes the main function at GOOD.kernel.main:goodtg. Choose a config file in configs/GOOD_configs and start a task:

goodtg --config_path GOOD_configs/GOODCMNIST/color/concept/DANN.yaml
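The example path appears to follow a `<dataset>/<domain>/<shift>/<algorithm>.yaml` layout. Assuming that layout holds for other combinations (it is inferred from this single example, so check the actual configs/GOOD_configs tree), a small helper can build config paths and pass them to goodtg through `subprocess`. The helper names here are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath


def good_config_path(dataset: str, domain: str, shift: str, algorithm: str) -> str:
    """Build a config path, assuming the <dataset>/<domain>/<shift>/<algorithm>.yaml layout."""
    return str(PurePosixPath("GOOD_configs", dataset, domain, shift, f"{algorithm}.yaml"))


def run_goodtg(config_path: str) -> int:
    """Invoke the goodtg CLI on one config file and return its exit code."""
    return subprocess.run(["goodtg", "--config_path", config_path], check=False).returncode


if __name__ == "__main__":
    path = good_config_path("GOODCMNIST", "color", "concept", "DANN")
    print(path)  # GOOD_configs/GOODCMNIST/color/concept/DANN.yaml
```

Calling `run_goodtg(path)` with this path is equivalent to the command above.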

Hyperparameter sweeping

To perform automatic hyperparameter sweeping and job launching, you can use goodtl (GOOD to launch):

goodtl --sweep_root sweep_configs --launcher MultiLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT --allow_devices 0 1 2 3
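Conceptually, a sweep expands a search space into jobs (one per hyperparameter combination and round) and spreads them over the allowed devices. The sketch below illustrates that idea only; it does not reproduce goodtl or MultiLauncher, and the hyperparameter names and round-robin device assignment are hypothetical choices for illustration.

```python
"""Conceptual sketch of a hyperparameter sweep spread across GPUs (not GOOD's launcher)."""
from itertools import product

# Hypothetical search space; GOOD's real sweep settings live under sweep_configs.
SEARCH_SPACE = {"lr": [1e-3, 1e-4], "ood_param": [0.1, 1.0, 10.0]}
DEVICES = [0, 1, 2, 3]  # mirrors --allow_devices 0 1 2 3


def plan_jobs(search_space: dict, devices: list, rounds: list) -> list:
    """Return one job per (combination, round), assigning devices round-robin."""
    names = list(search_space)
    combos = [dict(zip(names, values)) for values in product(*search_space.values())]
    jobs = []
    for i, (params, rnd) in enumerate(product(combos, rounds)):
        jobs.append({"params": params, "round": rnd, "device": devices[i % len(devices)]})
    return jobs


if __name__ == "__main__":
    for job in plan_jobs(SEARCH_SPACE, DEVICES, rounds=[1, 2, 3]):
        print(job)
```

With 6 combinations and 3 rounds this plans 18 jobs; the job count grows multiplicatively with every hyperparameter added, which is why sweeps are usually run with few rounds.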

Sweeping result collection and config update

To harvest what you have grown (that is, collect the results of all the runs you have completed), use goodtl with the special launcher HarvestLauncher:

goodtl --sweep_root sweep_configs --final_root final_configs --launcher HarvestLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT
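The core of selecting the best configuration from a sweep can be sketched as picking the setting with the highest mean validation score across rounds. This is an illustration under stated assumptions, not HarvestLauncher's logic: it assumes results are already loaded into a dict of per-round validation scores and that higher is better (for error metrics you would minimize instead). The config labels and scores are hypothetical.

```python
"""Sketch: pick the hyperparameter setting with the best mean validation score."""
from statistics import mean


def select_best(results: dict) -> tuple:
    """Return (config_label, mean_score) for the config whose scores have the highest mean."""
    if not results:
        raise ValueError("no results to select from")
    label = max(results, key=lambda k: mean(results[k]))
    return label, mean(results[label])


if __name__ == "__main__":
    # Hypothetical validation scores from a 3-round sweep, for illustration only.
    sweep_results = {
        "lr=1e-3,ood_param=0.1": [0.61, 0.63, 0.60],
        "lr=1e-3,ood_param=1.0": [0.66, 0.64, 0.65],
        "lr=1e-4,ood_param=1.0": [0.62, 0.65, 0.63],
    }
    print(select_best(sweep_results))
```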

(Experimental function.)

The output numpy array:

Final runs

Running 10 rounds for every configuration during a hyperparameter sweep is often impractical, especially when the search space is large. Instead, you can sweep hyperparameters with 2 to 3 rounds, select the best hyperparameters, and then run all rounds with them. To do so, remove --sweep_root, set --config_root to the location where the updated best configs are saved, and set --allow_rounds.

goodtl --config_root final_configs --launcher MultiLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT --allow_devices 0 1 2 3 --allow_rounds 1 2 3 4 5 6 7 8 9 10

Note that results in this benchmark are valid only when at least 3 rounds of experiments have been run.
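The saving from sweeping with few rounds is easy to estimate. This sketch assumes a hypothetical search space of 40 configurations for one algorithm on one split, and counts the final 10 rounds as fresh runs of the single best configuration.

```python
def total_runs(num_configs: int, sweep_rounds: int, final_rounds: int = 10) -> int:
    """Runs for a sweep over all configs plus the final rounds of the best config."""
    return num_configs * sweep_rounds + final_rounds


# Hypothetical search space of 40 configurations.
full_sweep = total_runs(40, sweep_rounds=10, final_rounds=0)  # every config for all 10 rounds
short_sweep = total_runs(40, sweep_rounds=3)                  # 3-round sweep, then 10 final rounds
print(full_sweep, short_sweep)  # 400 130
```

Under these assumptions, the short sweep needs roughly a third of the runs while still meeting the 3-round minimum during selection.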

Final result collection

goodtl --config_root final_configs --launcher HarvestLauncher --allow_datasets GOODMotif --allow_domains basis --allow_shifts covariate --allow_algs GSAT --allow_rounds 1 2 3 4 5 6 7 8 9 10

Output: a Markdown-formatted table (also saved to the file /result_table.md).
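To show the kind of aggregation involved, the sketch below turns per-round scores into a Markdown table of mean ± standard deviation and refuses to report fewer than 3 rounds. The column layout is hypothetical and need not match GOOD's result_table.md, and it uses the sample standard deviation, which GOOD may compute differently. The scores in the usage line are made-up values.

```python
"""Sketch: summarize per-round scores as a Markdown table (layout is hypothetical)."""
from statistics import mean, stdev

MIN_ROUNDS = 3  # results are only considered valid after 3+ rounds


def markdown_table(scores: dict) -> str:
    """scores maps an algorithm name to its per-round scores; returns a Markdown table."""
    lines = ["| Algorithm | Rounds | Mean ± Std |", "| --- | --- | --- |"]
    for alg, values in scores.items():
        if len(values) < MIN_ROUNDS:
            raise ValueError(f"{alg}: need at least {MIN_ROUNDS} rounds, got {len(values)}")
        lines.append(f"| {alg} | {len(values)} | {mean(values):.2f} ± {stdev(values):.2f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    print(markdown_table({"GSAT": [70.0, 72.0, 74.0]}))
```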

You can customize your own launcher at GOOD/kernel/launchers/.

Add a new algorithm

Please follow this documentation to add a new algorithm.

All contributions are welcome! Please refer to contributing for adding your algorithm to GOOD.


Leaderboard

The initial leaderboard results are listed in the paper, and the validation of these results is described here.

Leaderboard 1.1.0 with updated datasets will be available here.

Citing GOOD

If you find this repository helpful, please cite our paper.

@inproceedings{
gui2022good,
title={{GOOD}: A Graph Out-of-Distribution Benchmark},
author={Shurui Gui and Xiner Li and Limei Wang and Shuiwang Ji},
booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2022},
url={https://openreview.net/forum?id=8hHg-zs_p-h}
}

License

The GOOD datasets are under the MIT license. The GOOD code is under the GPLv3 license.

Discussion

Please submit new issues or start a new discussion for any technical or other questions.

Contact

Please feel free to contact Shurui Gui, Xiner Li, or Shuiwang Ji!

Acknowledgements

We thank Jundong Li and Jing Ma for insightful discussions. This work was supported in part by National Science Foundation grants IIS-1955189, IIS-1908198, and IIS-1908220.