A Python library for graph reduction including condensation, coarsening, and sparsification.

GraphSlim

Documentation | Benchmark Paper | Benchmark Scripts | Survey Paper | Paper Collection | Web Interface

Online Demo

Features

GraphSlim is a PyTorch library for graph reduction. It takes a graph in PyG format as input and outputs a reduced graph that preserves the properties or performance of the original graph.

Guidance

Prepare Environments

Choose either requirements_torch1+.txt (torch 1.*) or requirements.txt (torch 2.*), depending on your setup. Then adjust the CUDA version of torch, torch-geometric, and torch-sparse in the chosen requirements file to match your system configuration.

Install from pip

# choose one version from https://data.pyg.org/whl/ based on your environment
pip install torch_scatter torch_sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install graphslim
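As a minimal sketch of how the `${TORCH}` and `${CUDA}` placeholders above are filled in, the helper below builds the wheel index URL from the two values. The function name `pyg_wheel_index` and the example values `2.1.0` and `cu121` are assumptions for illustration; check https://data.pyg.org/whl/ for the pair that matches your environment.

```python
def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Return the PyG wheel index URL for a torch version and CUDA tag.

    Mirrors the ${TORCH}+${CUDA} placeholders in the pip command above.
    """
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


# Example values are assumptions; pick the pair that matches your system.
url = pyg_wheel_index("2.1.0", "cu121")
print(f"pip install torch_scatter torch_sparse -f {url}")
```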

Examples

python examples/train_coreset.py
python examples/train_coarsen.py
python examples/train_gcond.py

See more examples in Benchmark Scripts.

Use As Project

cd graphslim
python train_all.py -xxx xx

Run python configs.py --help to get all command line options.

Options:
  -D, --dataset TEXT              [default: cora]
  -G, --gpu_id INTEGER            gpu id start from 0, -1 means cpu  [default:
                                  0]
  --setting [trans|ind]           transductive or inductive setting
  --split TEXT                    only support public split now, do not change
                                  it  [default: fixed]
  --run_reduction INTEGER         repeat times of reduction  [default: 3]
  --run_eval INTEGER              repeat times of final evaluations  [default:
                                  10]
  --run_inter_eval INTEGER        repeat times of intermediate evaluations
                                  [default: 5]
  --eval_interval INTEGER         [default: 100]
  -H, --hidden INTEGER            [default: 256]
  --eval_epochs, --ee INTEGER     [default: 300]
  --eval_model, --em [GCN|GAT|SGC|APPNP|Cheby|GraphSage|GAT|SGFormer]
                                  [default: GCN]
  --condense_model [GCN|GAT|SGC|APPNP|Cheby|GraphSage|GAT]
                                  [default: SGC]
  -E, --epochs INTEGER            number of reduction epochs  [default: 1000]
  --lr FLOAT                      [default: 0.01]
  --weight_decay, --wd INTEGER    [default: 0]
  --pre_norm BOOLEAN              pre-normalize features, forced true for
                                  arxiv, flickr and reddit  [default: True]
  --outer_loop INTEGER            [default: 10]
  --inner_loop INTEGER            [default: 1]
  -R, --reduction_rate FLOAT      -1 means use representative reduction rate;
                                  reduction rate of training set, defined as
                                  (number of nodes in small graph)/(number of
                                  nodes in original graph)  [default: -1.0]
  -S, --seed INTEGER              Random seed  [default: 1]
  --nlayers INTEGER               number of GNN layers of condensed model
                                  [default: 2]
  -V, --verbose
  --init [variation_neighborhoods|variation_edges|variation_cliques|heavy_edge|algebraic_JC|affinity_GS|kron|vng|clustering|averaging|cent_d|cent_p|kcenter|herding|random]
                                  features initialization methods
  -M, --method [variation_neighborhoods|variation_edges|variation_cliques|heavy_edge|algebraic_JC|affinity_GS|kron|vng|clustering|averaging|gcond|doscond|gcondx|doscondx|sfgc|msgc|disco|sgdd|gcsntk|geom|cent_d|cent_p|kcenter|herding|random]
                                  [default: kcenter]
  --activation [sigmoid|tanh|relu|linear|softplus|leakyrelu|relu6|elu]
                                  activation function when do NAS  [default:
                                  relu]
  -A, --attack [random_adj|metattack|random_feat]
                                  corruption method
  -P, --ptb_r FLOAT               perturbation rate for corruptions  [default:
                                  0.25]
  --aggpreprocess                 use aggregation for coreset methods
  --dis_metric TEXT               distance metric for all condensation
                                  methods,ours means metric used in GCond
                                  paper  [default: ours]
  --lr_adj FLOAT                  [default: 0.0001]
  --lr_feat FLOAT                 [default: 0.0001]
  --threshold INTEGER             sparsification threshold before evaluation
                                  [default: 0]
  --dropout FLOAT                 [default: 0.0]
  --ntrans INTEGER                number of transformations in SGC and APPNP
                                  [default: 1]
  --with_bn
  --no_buff                       skip the buffer generation and use existing
                                  in geom,sfgc
  --batch_adj INTEGER             batch size for msgc  [default: 1]
  --alpha FLOAT                   for appnp  [default: 0.1]
  --mx_size INTEGER               for gcsntk methods, avoid SVD error
                                  [default: 100]
  --save_path, --sp TEXT          save path for synthetic graph  [default:
                                  ../checkpoints]
  -W, --eval_whole                if run on whole graph
  --help                          Show this message and exit.
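To illustrate how the `-xxx xx` placeholder might be filled in, the sketch below assembles an argument list from a few of the options listed above. It assumes that train_all.py accepts the same options that configs.py reports; the helper name `build_command` is hypothetical, and `trans` is used for `--setting` only because no default is shown for it.

```python
import shlex


def build_command(dataset="cora", method="kcenter", reduction_rate=-1.0,
                  setting="trans", gpu_id=0, seed=1):
    """Assemble an argument list for train_all.py from a few listed options.

    Defaults follow the help text above, except `setting`, for which no
    default is shown; "trans" is an assumption of this sketch.
    """
    return [
        "python", "train_all.py",
        "--dataset", dataset,
        "--method", method,
        "--reduction_rate", str(reduction_rate),
        "--setting", setting,
        "--gpu_id", str(gpu_id),
        "--seed", str(seed),
    ]


# Build a GCond run at a reduction rate of 0.5 and print it as a shell line.
cmd = build_command(method="gcond", reduction_rate=0.5)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="graphslim")` to launch the run from the project directory.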

Use As Package

from graphslim.dataset import *
from graphslim.evaluation import *
from graphslim.condensation import GCond
from graphslim.config import cli

args = cli(standalone_mode=False)
# customize args here
args.reduction_rate = 0.5
args.device = 'cuda:0'
# add more args.<main_args/dataset_args> here
graph = get_dataset('cora', args=args)
# To reproduce the benchmark, use our args and graph class
# To use your own args and graph format, ensure the args and graph class have the required attributes
# create an agent of one reduction algorithm
# add more args.<agent_args> here
agent = GCond(setting='trans', data=graph, args=args)
# reduce the graph 
reduced_graph = agent.reduce(graph, verbose=True)
# create an evaluator
# add more args.<evaluator_args> here
evaluator = Evaluator(args)
# evaluate the reduced graph on a GNN model
res_mean, res_std = evaluator.evaluate(reduced_graph, model_type='GCN')
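The example above sets `args.reduction_rate = 0.5`. According to the option help text, the rate is defined as (number of nodes in small graph)/(number of nodes in original graph), so the size of the reduced graph follows from a single multiplication. The sketch below works through that arithmetic; the function name `reduced_node_count` and the rounding rule are assumptions, and the library may round differently.

```python
def reduced_node_count(num_original_nodes: int, reduction_rate: float) -> int:
    """Estimate the number of nodes in the reduced graph.

    Uses the definition from the help text:
    rate = (nodes in small graph) / (nodes in original graph).
    """
    if not 0.0 < reduction_rate <= 1.0:
        # -1 selects the library's representative rate, which is not modeled here.
        raise ValueError("expected a reduction rate in (0, 1]")
    # Truncating to an integer with a floor of one node is an assumption.
    return max(1, int(num_original_nodes * reduction_rate))


# A hypothetical graph with 1000 nodes, reduced at rate 0.5.
print(reduced_node_count(1000, 0.5))
```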

All parameters fall into the following groups:

<main_args>: dataset, method, setting, reduction_rate, seed, aggpreprocess, eval_whole, run_reduction
<attack_args>: attack, ptb_r
<dataset_args>: pre_norm, save_path, split, threshold
<agent_args>: init, eval_interval, eval_epochs, eval_model, condense_model, epochs, lr, weight_decay, outer_loop, inner_loop, nlayers, method, activation, dropout, ntrans, with_bn, no_buff, batch_adj, alpha, mx_size, dis_metric, lr_adj, lr_feat
<evaluator_args>: final_eval_model, eval_epochs, lr, weight_decay
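The grouping above can be sketched as a plain mapping from group name to parameter names. The helper below is not part of GraphSlim: `PARAM_GROUPS` and `split_args` are hypothetical names, `<agent_args>` is omitted for brevity, and a `SimpleNamespace` stands in for the real args object.

```python
from types import SimpleNamespace

# Group names and members copied from the list above (agent_args omitted).
PARAM_GROUPS = {
    "main_args": ["dataset", "method", "setting", "reduction_rate", "seed",
                  "aggpreprocess", "eval_whole", "run_reduction"],
    "attack_args": ["attack", "ptb_r"],
    "dataset_args": ["pre_norm", "save_path", "split", "threshold"],
    "evaluator_args": ["final_eval_model", "eval_epochs", "lr", "weight_decay"],
}


def split_args(args, groups=PARAM_GROUPS):
    """Return {group: {name: value}} for the attributes present on args."""
    return {
        group: {name: getattr(args, name) for name in names if hasattr(args, name)}
        for group, names in groups.items()
    }


# Stand-in args object; the real one comes from graphslim.config.cli.
args = SimpleNamespace(dataset="cora", reduction_rate=0.5, ptb_r=0.25, lr=0.01)
grouped = split_args(args)
print(grouped["main_args"])
```

A view like this makes it easier to see which values are being customized for the dataset, the attack, or the evaluator before they are passed on.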

See more details in Documentation.

Customization

Web Interface

Our web application is deployed online using Streamlit. It can also be launched locally by running:

cd interface
python -m streamlit run vis_graphslim.py

to activate the interface. Install the dependencies listed in interface/requirements.txt beforehand.

TODO

Limitations

Acknowledgement

Some of the algorithms are adapted from the paper authors' implementations and from other packages.

SCAL

Sparsification

GCOND

GCSNTK

SFGC

GEOM

DeepRobust