
CGLB

License: CC BY 4.0

Get Started | Dataset Usages | Pipeline Usages | Evaluation & Visualization Toolkit | Benchmarks | Acknowledgement

This is the official repository of the Continual Graph Learning Benchmark (CGLB), which was published in the Datasets and Benchmarks Track of NeurIPS 2022. We will keep maintaining this repository to facilitate the development of continual graph learning, and we welcome any comments on how to improve CGLB!

 @inproceedings{zhang2022cglb,
  title={CGLB: Benchmark Tasks for Continual Graph Learning},
  author={Zhang, Xikun and Song, Dongjin and Tao, Dacheng},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2022}
}

Get Started

This repository contains our implementation of CGLB, designed to run on GPU devices. To run it on Windows, set the argument --replace_illegal_char to True to avoid illegal characters in file names (details are in Pipeline Usages). To run the code, the following packages need to be installed:

For GCGL tasks, the following packages are also required:

For baselines with tunable hyper-parameters, our framework provides a convenient way to grid-search over candidate hyper-parameter combinations. For example, the baseline GEM has two tunable hyper-parameters, memory_strength and n_memories. If the candidates for memory_strength and n_memories are [0.05,0.5,5] and [10,100,1000], respectively, the following command evaluates every combination on the validation set and then tests the selected model on the testing set:

python train.py --dataset Arxiv-CL \
       --gem_args " 'memory_strength': [0.05,0.5,5]; 'n_memories': [10,100,1000] " \
       --method gem \
       --backbone GCN \
       --gpu 0 \
       --ILmode $IL \
       --inter-task-edges $inter \
       --epochs $n_epochs

Some rules to note:

1. The entire argument " 'memory_strength': [0.05,0.5,5]; 'n_memories': [10,100,1000] " must be wrapped in double quotes " ".
2. A semicolon ; separates different hyper-parameters.
3. Square brackets [ ] wrap the candidates of each hyper-parameter.
4. A comma , separates the candidates.

In the example above, all nine possible combinations (three choices for memory_strength times three choices for n_memories) will be used to set the hyper-parameters.
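To make the expansion concrete, here is a minimal Python sketch that turns such a string into the nine combinations. The function name expand_grid is hypothetical and this is not the parser used in train.py; it only assumes the quoting and separator rules listed above.

```python
import ast
import itertools


def expand_grid(arg: str) -> list:
    """Expand "'name': [c1,c2]; 'other': [c3]" into a list of hyper-parameter dicts.

    Hypothetical helper for illustration; train.py may parse the string differently.
    """
    candidates = {}
    for part in arg.split(";"):          # ';' separates hyper-parameters
        part = part.strip()
        if not part:
            continue
        # "'memory_strength': [0.05,0.5,5]" becomes {'memory_strength': [0.05, 0.5, 5]}
        candidates.update(ast.literal_eval("{" + part + "}"))
    names = list(candidates)
    # Cartesian product over the candidate lists: one dict per combination.
    return [dict(zip(names, values))
            for values in itertools.product(*(candidates[n] for n in names))]


grid = expand_grid(" 'memory_strength': [0.05,0.5,5]; 'n_memories': [10,100,1000] ")
print(len(grid))  # 9
print(grid[0])    # {'memory_strength': 0.05, 'n_memories': 10}
```

Each dict would then configure one training run, and the combination with the best validation score is the one evaluated on the testing set.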

Since the graphs in N-CGL can be too large to be processed in one batch on most devices, the --minibatch argument can be set to True to train on the large graphs in mini-batches.

 python train.py --dataset Arxiv-CL \
        --method bare \
        --backbone GCN \
        --gpu 0 \
        --ILmode taskIL \
        --inter-task-edges False \
        --minibatch True \
        --batch_size 2000 \
        --sample_nbs True \
        --n_nbs_sample 10,25

In the above example, besides setting --minibatch, the size of each mini-batch is specified through --batch_size. Moreover, some graphs are so dense that they run out of memory even with mini-batch training; this can be addressed with neighborhood sampling, enabled via --sample_nbs. The number of neighbors to sample at each hop is specified through --n_nbs_sample, with one value per hop. There are also other customizable arguments; the full list can be found in train.py.
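The idea behind per-hop neighborhood sampling can be sketched in plain Python as below. This is a hypothetical helper, not the sampler CGLB uses: it assumes the graph is an adjacency dict, samples uniformly without replacement, and reads the fan-outs from a string like the "10,25" passed to --n_nbs_sample.

```python
import random


def sample_neighborhood(adj, seeds, fanouts, seed=0):
    """Uniform multi-hop neighbor sampling (hypothetical helper, not CGLB's code).

    adj:     dict mapping each node to a list of its neighbors
    seeds:   nodes in the current mini-batch
    fanouts: max neighbors to sample per node at each hop, e.g. [10, 25]
    Returns one list of sampled (neighbor, node) edges per hop.
    """
    rng = random.Random(seed)
    frontier = list(seeds)
    hops = []
    for fanout in fanouts:
        edges, next_frontier = [], set()
        for node in frontier:
            nbrs = adj.get(node, [])
            # Keep all neighbors of low-degree nodes; subsample dense ones.
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for nbr in picked:
                edges.append((nbr, node))
                next_frontier.add(nbr)
        hops.append(edges)
        frontier = sorted(next_frontier)  # nodes to expand at the next hop
    return hops


# A dense toy graph: 50 fully connected nodes (49 neighbors each).
adj = {i: [j for j in range(50) if j != i] for i in range(50)}
fanouts = [int(k) for k in "10,25".split(",")]
hops = sample_neighborhood(adj, seeds=[0], fanouts=fanouts)
print(len(hops[0]), len(hops[1]))  # 10 250
```

Without sampling, two hops from node 0 would touch every edge of the dense graph; with the fan-outs the computation is capped at 10 + 10 × 25 edges regardless of node degree.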

When running the code on Windows, the error OSError: [Errno 22] Invalid argument may be raised. It can be avoided by setting the argument --replace_illegal_char to True, which replaces potentially illegal characters in file names with the underscore symbol _. For example,

 python train.py --dataset Arxiv-CL \
        --method bare \
        --backbone GCN \
        --gpu 0 \
        --ILmode taskIL \
        --inter-task-edges False \
        --minibatch False \
        --replace_illegal_char True 
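For illustration, the kind of substitution this option performs can be sketched as follows. The helper sanitize_filename and the exact character set are assumptions (the set below is the printable characters Windows reserves in file names); check train.py for what CGLB actually replaces.

```python
import re

# Printable characters that Windows forbids in file names
# (control characters and reserved names such as CON are not handled here).
_ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*]')


def sanitize_filename(filename: str) -> str:
    """Replace characters that are illegal in Windows file names with '_'.

    Hypothetical helper. Apply it to a single file name, not a full path:
    '/' and '\\' are path separators and would be replaced too.
    """
    return _ILLEGAL_CHARS.sub("_", filename)


print(sanitize_filename("gem_memory_strength:0.5|n_memories:10"))
# gem_memory_strength_0.5_n_memories_10
```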

Modifying the train-validation-test Splitting

The split can be specified through command-line arguments when running the experiments. In our pipeline, the corresponding argument is --ratio_valid_test, which takes the validation and testing ratios. For example,

python train.py --dataset Arxiv-CL \
        --method bare \
        --backbone GCN \
        --gpu 0 \
        --ILmode taskIL \
        --inter-task-edges False \
        --minibatch False \
        --ratio_valid_test 0.4 0.4

The example above sets the validation and testing ratios to 0.4 and 0.4, and the training ratio is automatically computed as 1 - 0.4 - 0.4 = 0.2.
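The arithmetic can be sketched as a simple random split of node indices. This is a hypothetical illustration: it does not claim CGLB splits in exactly this way (for instance, whether the split is done per class or per task is not shown here).

```python
import random


def split_indices(indices, ratio_valid, ratio_test, seed=0):
    """Randomly split indices into train/valid/test; train gets the remainder.

    Hypothetical helper illustrating --ratio_valid_test, not CGLB's implementation.
    """
    if not 0 <= ratio_valid + ratio_test < 1:
        raise ValueError("validation + testing ratios must be in [0, 1)")
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    n_valid = round(len(idx) * ratio_valid)
    n_test = round(len(idx) * ratio_test)
    valid = idx[:n_valid]
    test = idx[n_valid:n_valid + n_test]
    train = idx[n_valid + n_test:]       # remaining 1 - ratio_valid - ratio_test
    return train, valid, test


train, valid, test = split_indices(range(1000), 0.4, 0.4)
print(len(train), len(valid), len(test))  # 200 400 400
```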

Implementing New Methods

New continual graph learning methods can also be easily implemented in our highly modularized pipelines. The new method should be contained in a Python script under the directory CGLB/NCGL/Baselines. Suppose we are implementing a method named A; then a file CGLB/NCGL/Baselines/A_model.py containing its implementation should be created. The implementation is flexible as long as it follows the expected input format. Specifically, the Python class of the new method should contain an observe() method that trains the model on a single task and takes the task configuration and the data as input. Details on the input format can be found in any xxx_model.py file under the directory CGLB/NCGL/Baselines.
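The overall shape of such a class can be sketched as below, using a toy replay-style method. This is a framework-free sketch under assumptions: the class name NET, the constructor arguments, the observe() parameters and the backbone's train_step() are all hypothetical, so copy the real signatures from an existing xxx_model.py file.

```python
import random


class NET:
    """Hypothetical skeleton of a new method "A" (CGLB/NCGL/Baselines/A_model.py).

    Constructor arguments and the observe() signature are assumptions for
    illustration; the real ones are defined by the existing xxx_model.py files.
    """

    def __init__(self, model, args):
        self.model = model          # backbone (e.g. a GCN); assumed to expose train_step()
        self.args = args            # method hyper-parameters, e.g. {"n_memories": 10}
        self.memory = []            # state kept across tasks (here: replayed node ids)
        self.rng = random.Random(0)

    def observe(self, task_id, train_ids, labels):
        """Train on a single task of the sequence and return the training loss."""
        train_ids = list(train_ids)
        # Method-specific logic: mix stored nodes from earlier tasks into the batch.
        batch = train_ids + [node for node, _ in self.memory]
        loss = self.model.train_step(batch, labels)
        # Store a few nodes of the current task for use on later tasks.
        k = min(self.args.get("n_memories", 0), len(train_ids))
        self.memory += [(node, task_id) for node in self.rng.sample(train_ids, k)]
        return loss


class DummyBackbone:
    """Stand-in backbone for the usage example: the 'loss' is just the batch size."""

    def train_step(self, batch, labels):
        return float(len(batch))


net = NET(DummyBackbone(), {"n_memories": 2})
print(net.observe(0, range(10), labels=None))      # 10.0 (10 new nodes)
print(net.observe(1, range(10, 20), labels=None))  # 12.0 (10 new + 2 replayed)
```

The pipeline calls observe() once per task in the sequence, so any state a method needs across tasks (a replay buffer here, parameter importances for regularization-based methods) lives on the class instance.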

G-CGL

Below is an example of running the 'Bare model' baseline with a GCN backbone on the SIDER-tIL dataset under the task-IL scenario (--clsIL False).

 python train.py --dataset SIDER-tIL \
        --method Bare \
        --basemodel GCN \
        --gpu 0 \
        --clsIL False

Evaluation and Visualization Toolkit

We provide three protocols to evaluate the obtained results, as follows. With our pipeline, the results are uniformly stored in the form of a performance matrix, which can be fed directly into our evaluation toolkit.

1. Visualization of the Performance Matrix

This is the most thorough evaluation of a continual learning model, since it shows how the performance on each task changes while learning the entire task sequence. Suppose an experiment result is stored at the path "result_path"; the visualization can be generated with the following code. Note that the path should be quoted with double quotes " " rather than single quotes ' ', since the file name of an experimental result may itself contain single quotes. Examples are shown in the N-CGL under Class-IL part of Benchmarks.

 from CGLB.NCGL.visualize import show_performance_matrices
 show_performance_matrices("result_path")

2. Learning Curve

This shows the curve of the average performance (AP) as the model learns the task sequence. It contains less information than the performance matrix but shows the learning dynamics more directly and compactly. Suppose an experiment result is stored at the path result_path; the learning curve can be obtained with the following code. Examples are shown in the N-CGL under Class-IL part of Benchmarks.
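As a rough sketch of what the curve represents, the snippet below computes the AP after each task from a performance matrix. It assumes the matrix is a list of rows where perf[i][j] is the accuracy on task j after learning task i (only j <= i is used); the storage format read by show_learning_curve may differ, and the numbers are made up for illustration.

```python
def learning_curve(perf):
    """AP after each task i: mean accuracy over the tasks 0..i seen so far.

    Assumes perf[i][j] = accuracy on task j after learning task i.
    """
    return [sum(row[: i + 1]) / (i + 1) for i, row in enumerate(perf)]


# Toy 3-task performance matrix (entries above the diagonal are unused).
perf = [
    [0.90, 0.00, 0.00],
    [0.70, 0.80, 0.00],
    [0.60, 0.65, 0.85],
]
curve = learning_curve(perf)
print([round(ap, 3) for ap in curve])  # [0.9, 0.75, 0.7]
```

A curve that drops as tasks are added is the typical signature of forgetting under class-IL.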

 from CGLB.NCGL.visualize import show_learning_curve
 show_learning_curve("result_path")

3. Final AP and Final AF

Final AP and AF refer to the AP and the average forgetting (AF) after learning the entire task sequence, and are the most compact way to show the performance of a model. Suppose an experiment result is stored at the path result_path; the final AP and AF can be obtained with the following code.
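A minimal sketch of the two numbers, under the same assumed layout (perf[i][j] = accuracy on task j after learning task i). AF is computed here in the backward-transfer style used by GEM: the change from each earlier task's accuracy right after it was learned to its final accuracy, averaged, so forgetting shows up as a negative value. Please check the CGLB paper for the exact definition behind shown_final_APAF.

```python
def final_ap_af(perf):
    """Final AP and AF from a T x T performance matrix (assumed layout).

    AP = mean of the last row.
    AF = mean over tasks j < T-1 of perf[T-1][j] - perf[j][j] (negative = forgetting).
    """
    T = len(perf)
    last = perf[-1]
    ap = sum(last) / T
    if T == 1:
        return ap, 0.0  # no earlier task can be forgotten
    af = sum(last[j] - perf[j][j] for j in range(T - 1)) / (T - 1)
    return ap, af


perf = [
    [0.90, 0.00, 0.00],
    [0.70, 0.80, 0.00],
    [0.60, 0.65, 0.85],
]
ap, af = final_ap_af(perf)
print(round(ap, 3), round(af, 3))  # 0.7 -0.225
```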

 from CGLB.NCGL.visualize import shown_final_APAF
 shown_final_APAF("result_path")

The outputs, including standard deviations, are formatted in LaTeX, so they can be copied and pasted directly into a LaTeX table.
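For reference, a mean ± standard deviation over repeated runs can be rendered as a LaTeX table cell as follows. The helper latex_mean_std, the percentage scaling, the one-decimal precision and the use of the population standard deviation are all assumptions, not necessarily what shown_final_APAF outputs.

```python
import statistics


def latex_mean_std(values, scale=100.0, digits=1):
    """Format repeated-run results as a LaTeX '$mean\\pm std$' cell (hypothetical helper).

    scale=100 turns accuracies in [0, 1] into percentages; pstdev is the
    population standard deviation.
    """
    mean = statistics.mean(values) * scale
    std = statistics.pstdev(values) * scale
    return f"${mean:.{digits}f}\\pm{std:.{digits}f}$"


print(latex_mean_std([0.70, 0.72, 0.74]))  # $72.0\pm1.6$
```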

Benchmarks

This section shows the results we have currently obtained with different baselines. It will be kept updated with state-of-the-art results.

Dataset Statistics

The statistics of the N-CGL datasets are shown below.

The statistics of the G-CGL datasets are shown below.

N-CGL under Task-IL

The results on N-CGL under the task-IL setting without inter-task edges are shown below.

The results on N-CGL under the task-IL setting with inter-task edges are shown below.

N-CGL under Class-IL

The results on N-CGL under the class-IL setting with inter-task edges are shown below.

The learning dynamics under the class-IL setting better reflect the forgetting behavior of the models. Therefore, we also show the learning curves and the visualized performance matrices as a demonstration of our evaluation & visualization toolkit. The learning curves are obtained on all four datasets, and the performance matrices are visualized on the CoraFull-CL dataset, which has the largest number of tasks.

G-CGL under Task-IL&Class-IL

The results on G-CGL under both task-IL and class-IL are shown below.

Acknowledgement

The construction of CGLB also benefits from existing repositories on both continual learning and continual graph learning. Specifically, the pipeline for training the continual learning models draws on both GEM and TWP. The implementations of EWC and GEM are based on GEM. The implementations of MAS, LwF, and TWP draw on MAS and TWP, and the implementation of TWP is adapted from TWP. The construction of the datasets also benefits from several existing databases and libraries: the N-CGL datasets are built with the datasets and tools from OGB and DGL, and the G-CGL datasets with those from DGL and DGL-LifeSci. We sincerely thank the authors of these works for sharing their code and helping develop the community.