This is a PyTorch implementation of the paper *Exploring Correlations of Self-Supervised Tasks for Graphs*, accepted at ICML 2024. We quantitatively characterize the correlations between different graph self-supervised tasks and obtain more effective graph self-supervised representations with our proposed GraphTCM.
We used the following packages under Python 3.10:

- pytorch 2.1.1
- torch-geometric 2.4.0
- matplotlib 3.5.0
- pandas 2.1.3
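If you are setting up the environment from scratch, an installation along these lines should work; the exact PyPI package names and any CUDA-specific wheels (e.g., optional torch-geometric companion packages) depend on your system, so treat this as a sketch rather than a verified install command:

```sh
pip install torch==2.1.1 torch_geometric==2.4.0 matplotlib==3.5.0 pandas==2.1.3
```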
Existing graph self-supervised methods can be grouped into four primary categories: feature-based (FB), structure-based (SB), auxiliary property-based (APB), and contrast-based (CB). To comprehensively understand the complex relationships among graph self-supervised tasks, we chose two representative methods from each category for detailed analysis.
We provide the representations obtained by training with these eight self-supervised methods across various datasets, located in the `emb/` directory.
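As a quick sanity check, the stored representations can be loaded with `pickle`; the dataset and method names in the path below are hypothetical, so adjust them to whatever actually appears under `emb/`:

```python
import pickle

# Hypothetical path: the actual layout and file names under emb/ may differ.
with open("emb/Cora/GraphCL.pkl", "rb") as f:
    emb = pickle.load(f)

# Typically a tensor or array of shape [num_nodes, hidden_dim].
print(type(emb), getattr(emb, "shape", None))
```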
Given two self-supervised tasks $t_1, t_2 \in \mathcal{T}$ and a graph $\mathcal{G}: (\mathbf{A}, \mathbf{X})$, we define a correlation value $\text{Cor}(t_1, t_2)$ between the two tasks; please refer to the paper for the formal definition.
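For intuition only, here is a minimal sketch of one generic way to compare two tasks' representation matrices, linear CKA. This is a stand-in illustration and is **not** the paper's $\text{Cor}(t_1, t_2)$ definition:

```python
import torch

def linear_cka(h1: torch.Tensor, h2: torch.Tensor) -> float:
    """Linear CKA similarity between two representation matrices.

    h1, h2: [num_nodes, dim] embeddings produced by two self-supervised
    tasks. NOTE: a generic similarity measure for illustration only; it
    is NOT the Cor(t1, t2) defined in the paper.
    """
    # Center each representation matrix feature-wise.
    h1 = h1 - h1.mean(dim=0, keepdim=True)
    h2 = h2 - h2.mean(dim=0, keepdim=True)
    # ||h1^T h2||_F^2 / (||h1^T h1||_F * ||h2^T h2||_F)
    cross = torch.linalg.norm(h1.T @ h2) ** 2
    norm1 = torch.linalg.norm(h1.T @ h1)
    norm2 = torch.linalg.norm(h2.T @ h2)
    return (cross / (norm1 * norm2)).item()

# Example: sim = linear_cka(emb_t1, emb_t2)  # 1.0 = identical up to rotation/scale
```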
We provide the correlation values for various self-supervised tasks across different datasets in `train_GraphTCM.py`.
Please run `train_GraphTCM.py` to train a GraphTCM model on a specific dataset.
```
usage: train_GraphTCM.py [-h] [--hidden_dim HIDDEN_DIM] [--pooling POOLING] [--device_num DEVICE_NUM] [--epoch_num EPOCH_NUM] [--lr LR] [--seed SEED] [--valid_rate VALID_RATE] [--dataset DATASET]

PyTorch implementation for building the correlation.

options:
  -h, --help            show this help message and exit
  --hidden_dim HIDDEN_DIM
                        hidden dimension
  --pooling POOLING     pooling type
  --device_num DEVICE_NUM
                        device number
  --epoch_num EPOCH_NUM
                        epoch number
  --lr LR               learning rate
  --seed SEED           random seed
  --valid_rate VALID_RATE
                        validation rate
  --dataset DATASET     dataset
```
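For example, a hypothetical invocation (the dataset name and hyper-parameter values below are illustrative, not the settings used in the paper):

```sh
python train_GraphTCM.py --dataset Cora --hidden_dim 256 --lr 1e-3 --epoch_num 100 --seed 42
```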
After training a GraphTCM model, please run `train_emb.py` to obtain more effective self-supervised representations. To facilitate further experiments, we also provide the trained representations based on GraphTCM in the `emb/` directory, each named `GraphTCM.pkl`.
```
usage: train_emb.py [-h] [--hidden_dim HIDDEN_DIM] [--device_num DEVICE_NUM] [--epoch_num EPOCH_NUM] [--lr LR] [--seed SEED] [--dataset DATASET] [--path PATH] [--target TARGET] [--train_method TRAIN_METHOD]

PyTorch implementation for training the representations.

options:
  -h, --help            show this help message and exit
  --hidden_dim HIDDEN_DIM
                        hidden dimension
  --device_num DEVICE_NUM
                        device number
  --epoch_num EPOCH_NUM
                        epoch number
  --lr LR               learning rate
  --seed SEED           random seed
  --dataset DATASET     dataset
  --path PATH           path for the trained GraphTCM model
  --target TARGET       training target (ones or zeros)
  --train_method TRAIN_METHOD
                        training method
```
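For example, a hypothetical invocation (the dataset name is illustrative, and the model path is a placeholder for wherever your trained GraphTCM checkpoint was saved; `--train_method` is omitted here since its accepted values are not listed above):

```sh
python train_emb.py --dataset Cora --path <path-to-trained-GraphTCM-model> --target ones
```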
We have provided scripts with hyper-parameter settings to reproduce the experimental results presented in our paper. Please run `run.sh` under `downstream/` to obtain the downstream results across various datasets.
```sh
cd downstream/
sh run.sh
```
You can cite our paper with the following BibTeX entry:
```bibtex
@inproceedings{Fang2024ExploringCO,
  title     = {Exploring Correlations of Self-supervised Tasks for Graphs},
  author    = {Taoran Fang and Wei Zhou and Yifei Sun and Kaiqiao Han and Lvbin Ma and Yang Yang},
  booktitle = {International Conference on Machine Learning},
  year      = {2024}
}
```