
GraphTCM

This is a PyTorch implementation of the paper Exploring Correlations of Self-Supervised Tasks for Graphs, which has been accepted at ICML 2024. We quantitatively characterize the correlations between different graph self-supervised tasks and use our proposed GraphTCM to obtain more effective graph self-supervised representations.

Installation

We used the following packages under Python 3.10.

pytorch 2.1.1
torch-geometric 2.4.0
matplotlib 3.5.0
pandas 2.1.3
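
To sanity-check the environment, a minimal snippet like the following (assuming the packages above are installed) should run without errors:

import torch
import torch_geometric

print(torch.__version__)            # expected: 2.1.1
print(torch_geometric.__version__)  # expected: 2.4.0
print(torch.cuda.is_available())    # True if a CUDA build and a GPU are available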

Base Tasks

Existing graph self-supervised methods can be categorized into four primary categories: feature-based (FB), structure-based (SB), auxiliary property-based (APB), and contrast-based (CB). To comprehensively understand the complex relationships among graph self-supervised tasks, we choose two representative methods from each category for detailed analysis.

We provide the representations obtained from training using these eight self-supervised methods across various datasets, located in the directory emb/.
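
The exact file names under emb/ depend on the dataset and method; as an illustrative sketch (the path below is a placeholder), each pickled representation can be loaded with standard tooling:

import pickle
import torch

# Placeholder path: point this at an actual file under emb/.
emb_path = "emb/Cora/feature_based.pkl"

with open(emb_path, "rb") as f:
    emb = pickle.load(f)

emb = torch.as_tensor(emb, dtype=torch.float)
print(emb.shape)  # (num_nodes, hidden_dim)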

Correlation Value

Given two self-supervised tasks $t_1, t_2 \in \mathcal{T}$ and a graph $\mathcal{G}: (\mathbf{A}, \mathbf{X})$, we define the correlation value $\text{Cor}(t_1, t_2)$ between them; please refer to the paper for the formal definition.

We provide the correlation values for various self-supervised tasks across different datasets in train_GraphTCM.py.
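
Purely as an illustration of the underlying idea, and not the paper's exact formula for $\text{Cor}(t_1, t_2)$, the following toy sketch measures how well the representations of one task linearly reconstruct those of another:

import torch

def linear_fit_error(h1: torch.Tensor, h2: torch.Tensor) -> float:
    # Toy proxy (not the paper's Cor): relative residual of the least-squares map h1 @ W ≈ h2.
    W = torch.linalg.lstsq(h1, h2).solution
    residual = h1 @ W - h2
    return (residual.norm() / h2.norm()).item()

# Random stand-ins for two tasks' node representations.
h_t1 = torch.randn(2708, 256)
h_t2 = torch.randn(2708, 256)
print(linear_fit_error(h_t1, h_t2))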

Training GraphTCM

Please run train_GraphTCM.py to train a GraphTCM model on a specific dataset.

usage: train_GraphTCM.py [-h] [--hidden_dim HIDDEN_DIM] [--pooling POOLING] [--device_num DEVICE_NUM] [--epoch_num EPOCH_NUM] [--lr LR] [--seed SEED] [--valid_rate VALID_RATE] [--dataset DATASET]

PyTorch implementation for building the correlation.

options:
  -h, --help                        show this help message and exit
  --hidden_dim HIDDEN_DIM           hidden dimension
  --pooling POOLING                 pooling type
  --device_num DEVICE_NUM           device number
  --epoch_num EPOCH_NUM             epoch number
  --lr LR                           learning rate
  --seed SEED                       random seed
  --valid_rate VALID_RATE           validation rate
  --dataset DATASET                 dataset
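
If you need to drive these options from your own Python code, they map onto a standard argparse interface along these lines (a sketch reconstructed from the help text above, not the script's actual source; defaults are omitted):

import argparse

parser = argparse.ArgumentParser(description="PyTorch implementation for building the correlation.")
parser.add_argument("--hidden_dim", type=int, help="hidden dimension")
parser.add_argument("--pooling", type=str, help="pooling type")
parser.add_argument("--device_num", type=int, help="device number")
parser.add_argument("--epoch_num", type=int, help="epoch number")
parser.add_argument("--lr", type=float, help="learning rate")
parser.add_argument("--seed", type=int, help="random seed")
parser.add_argument("--valid_rate", type=float, help="validation rate")
parser.add_argument("--dataset", type=str, help="dataset")
args = parser.parse_args()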

Training Representations

After training a GraphTCM model, please run train_emb.py to obtain more effective self-supervised representations. To facilitate further experiments, we also provide the trained representations based on GraphTCM in the emb/ directory, all named GraphTCM.pkl.

usage: train_emb.py [-h] [--hidden_dim HIDDEN_DIM] [--device_num DEVICE_NUM] [--epoch_num EPOCH_NUM] [--lr LR] [--seed SEED] [--dataset DATASET] [--path PATH] [--target TARGET] [--train_method TRAIN_METHOD]

PyTorch implementation for training the representations.

options:
  -h, --help                        show this help message and exit
  --hidden_dim HIDDEN_DIM           hidden dimension
  --device_num DEVICE_NUM           device number
  --epoch_num EPOCH_NUM             epoch number
  --lr LR                           learning rate
  --seed SEED                       random seed
  --dataset DATASET                 dataset
  --path PATH                       path for the trained GraphTCM model
  --target TARGET                   training target (ones or zeros)
  --train_method TRAIN_METHOD       training method

Downstream Adaptations

We have provided scripts with hyper-parameter settings to reproduce the experimental results presented in our paper. Please run run.sh under downstream/ to obtain the downstream results across various datasets.

cd downstream/
sh run.sh
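
As a rough illustration of how the trained representations can be plugged into a downstream task (a sketch with a placeholder dataset and path, not the exact protocol used in run.sh), a linear probe for node classification looks like this:

import pickle
import torch
from torch_geometric.datasets import Planetoid

# Placeholder dataset and path: adjust to the actual layout of emb/.
dataset = Planetoid(root="data", name="Cora")
data = dataset[0]

with open("emb/Cora/GraphTCM.pkl", "rb") as f:
    emb = torch.as_tensor(pickle.load(f), dtype=torch.float)

# Linear probe: a single linear layer trained on the frozen representations.
probe = torch.nn.Linear(emb.size(1), dataset.num_classes)
optimizer = torch.optim.Adam(probe.parameters(), lr=0.01, weight_decay=5e-4)

for epoch in range(200):
    probe.train()
    optimizer.zero_grad()
    out = probe(emb[data.train_mask])
    loss = torch.nn.functional.cross_entropy(out, data.y[data.train_mask])
    loss.backward()
    optimizer.step()

probe.eval()
pred = probe(emb).argmax(dim=-1)
acc = (pred[data.test_mask] == data.y[data.test_mask]).float().mean().item()
print(f"Test accuracy: {acc:.4f}")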

Citation

You can cite our paper with the following BibTeX entry.

@inproceedings{Fang2024ExploringCO,
  title={Exploring Correlations of Self-supervised Tasks for Graphs},
  author={Taoran Fang and Wei Zhou and Yifei Sun and Kaiqiao Han and Lvbin Ma and Yang Yang},
  booktitle={International Conference on Machine Learning},
  year={2024}
}