Augmentation-Free Self-Supervised Learning on Graphs

The official source code for Augmentation-Free Self-Supervised Learning on Graphs paper, accepted at AAAI 2022.

Overview

Inspired by the recent success of self-supervised methods applied on images, self-supervised learning on graph structured data has seen rapid growth especially centered on augmentation-based contrastive methods. However, we argue that without carefully designed augmentation techniques, augmentations on graphs may behave arbitrarily in that the underlying semantics of graphs can drastically change. As a consequence, the performance of existing augmentation-based methods is highly dependent on the choice of augmentation scheme, i.e., hyperparameters associated with augmentations. In this paper, we propose a novel augmentation-free self-supervised learning framework for graphs, named AFGRL. Specifically, we generate an alternative view of a graph by discovering nodes that share the local structural information and the global semantics with the graph. Extensive experiments towards various node-level tasks, i.e., node classification, clustering, and similarity search on various real-world datasets demonstrate the superiority of AFGRL.

Augmentations on images keep the underlying semantics, whereas augmentations on graphs may unexpectedly change the semantics.

Requirements

Python version: 3.7.10
Pytorch version: 1.8.1
torch-geometric version: 1.7.0
faiss: 1.7.0

Hyperparameters

Following Options can be passed to main.py

--dataset: Name of the dataset. Supported names are: wikics, cs, computers, photo, and physics. Default is wikics.
usage example :--dataset wikics

--task: Name of the task. Supported names are: node, clustering, similarity. Default is node.
usage example :--task node

--layers: The number of units of each layer of the GNN. Default is [256]
usage example :--layers 256

--pred_hid: The number of hidden units of predictor. Default is [512]
usage example :--pred_hid 512

--topk: The number of neighbors for nearest neighborhood search. Default is 4.
usage example :--topk 4

--num_centroids: The number of centroids for K-means Clustering . Default is 100.
usage example :--num_centroids 100

--num_kmeans: The number of iterations for K-means Clustering . Default is 5.
usage example :--num_kmeans 5

How to Run

You can run the model with following options

To run node classification (reproduce Table 2 in paper)
```
sh run_node_classification.sh
```
To run node clustering (reproduce Table 3 in paper)
```
sh run_node_clustering.sh
```
To run similarity search (reproduce Table 4 in paper)
```
sh run_similarity_search.sh
```

or you can run the file with above mentioned hyperparameters

python main.py --embedder AFGRL --dataset wikics --task node --layers [1024] --pred_hid 2048 --lr 0.001 --topk 8

Cite (Bibtex)

If you find AFGRL useful in your research, please cite the following paper:
- Lee, Namkyeong, Junseok Lee, and Chanyoung Park. "Augmentation-Free Self-Supervised Learning on Graphs." AAAI 2022.
- Bibtex
```
@article{lee2021augmentation,
title={Augmentation-Free Self-Supervised Learning on Graphs},
author={Lee, Namkyeong and Lee, Junseok and Park, Chanyoung},
booktitle={AAAI},
year={2022}
}
```