Accepted by ACL 2024 Findings
CACL is a Community-Aware Heterogeneous Graph Contrastive Learning framework, which we apply to social media bot detection.
The implementation of CACL is based mainly on PyTorch and the PyTorch Geometric API.
Reproducing this work involves the following steps:
1. Data preprocessing: run predata_{dataset name}.py.
2. Pretrain: run pretrain.py.
3. Train: run train.py to train the full model. Hyperparameters are defined in super_parament_initial() in utils.py.
4. Test: after training the model with train.py, the test results are shown and saved.
This work uses three datasets. You may need to contact the author to get access to some of them.
Other datasets that may be helpful: [web path]
The following example uses the Cresci-15 dataset with the HGT backbone.
After preprocessing the dataset with predata_cresci15.py, you can:
1. Pretrain the community-aware module:
python3 pretrain.py --dataset cresci15 --basic_model HGT
2. Train CACL with HGT:
python3 train.py --dataset cresci15 --basic_model HGT --max_error_times 5
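The stages above can also be chained from a single script. The following is a minimal sketch and is not part of the CACL repository: the helper names build_pipeline and run_pipeline are hypothetical, and it assumes that predata_cresci15.py takes no arguments and that train.py accepts the same --dataset, --basic_model, and --max_error_times flags shown above. By default it only prints the commands instead of running them.

```python
import shlex
import subprocess


def build_pipeline(dataset, backbone, max_error_times=5):
    """Return the preprocessing, pretraining, and training commands, in order.

    Script names and flags follow this README; whether train.py accepts
    exactly these flags is an assumption.
    """
    common = ["--dataset", dataset, "--basic_model", backbone]
    return [
        ["python3", f"predata_{dataset}.py"],
        ["python3", "pretrain.py", *common],
        ["python3", "train.py", *common, "--max_error_times", str(max_error_times)],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True aborts the pipeline if a stage exits with an error.
            subprocess.run(cmd, check=True)


# Dry run: lists the three commands without executing anything.
run_pipeline(build_pipeline("cresci15", "HGT"))
```

Calling run_pipeline(..., dry_run=False) from the repository root would execute the stages one after another and stop at the first failure.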
Key hyperparameter options:
- basic_model: the CACL framework supports three backbones as the convolutional layer: GAT, SAGE, and HGT.
- num_layer: the number of layers in the convolutional network; 2 by default.
- lr_warmup_epochs: during the initial warm-up phase of training, the weight of the contrastive loss is increased so that the model quickly approaches the optimal point.
- max_error_times: controls early stopping, which is based on the validation set.
- cluster: the clustering method used for community detection; several methods are implemented, and randomwalk is the default.
Details of the other optional hyperparameters can be found in the function super_parament_initial() in utils.py.
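To make the roles of lr_warmup_epochs and max_error_times more concrete, here is a minimal sketch of one way such options could behave. It is an illustration under assumptions, not the code in train.py: the names contrastive_weight and EarlyStopper, the boost factor, and the rule "stop after max_error_times consecutive epochs without validation improvement" are all hypothetical, and the validation scores are made-up values used only to exercise the logic.

```python
def contrastive_weight(epoch, warmup_epochs, base=1.0, boost=2.0):
    """Weight of the contrastive loss: boosted during warm-up, base afterwards.

    The step schedule and the boost factor are assumptions for illustration.
    """
    return base * boost if epoch < warmup_epochs else base


class EarlyStopper:
    """Stop once the validation score fails to improve max_error_times times in a row."""

    def __init__(self, max_error_times):
        self.max_error_times = max_error_times
        self.best = float("-inf")
        self.errors = 0

    def step(self, val_score):
        """Record one validation score; return True when training should stop."""
        if val_score > self.best:
            self.best = val_score
            self.errors = 0  # an improvement resets the counter
        else:
            self.errors += 1
        return self.errors >= self.max_error_times


# Made-up validation scores, one per epoch.
stopper = EarlyStopper(max_error_times=2)
for epoch, score in enumerate([0.70, 0.75, 0.74, 0.73, 0.80]):
    weight = contrastive_weight(epoch, warmup_epochs=2)
    print(f"epoch {epoch}: contrastive weight {weight}, val score {score}")
    if stopper.step(score):
        print(f"early stop at epoch {epoch}")
        break
```

With these dummy scores the loop stops at epoch 3, after two consecutive epochs without improvement, so the final score is never reached. A larger max_error_times tolerates longer plateaus at the cost of more training time.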
Please consider citing the following paper when using our code for your application.
@inproceedings{CACL2024,
author = {Sirry Chen and
Shuo Feng and
Songsong Liang and
Chen{-}Chen Zong and
Jing Li and
Piji Li},
editor = {Lun{-}Wei Ku and
Andre Martins and
Vivek Srikumar},
title = {{CACL:} Community-Aware Heterogeneous Graph Contrastive Learning for
Social Media Bot Detection},
booktitle = {Findings of the Association for Computational Linguistics, {ACL} 2024,
Bangkok, Thailand and virtual meeting, August 11-16, 2024},
pages = {10349--10360},
publisher = {Association for Computational Linguistics},
year = {2024},
url = {https://aclanthology.org/2024.findings-acl.617},
timestamp = {Tue, 27 Aug 2024 17:38:11 +0200},
biburl = {https://dblp.org/rec/conf/acl/ChenFLZLL24.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}