A PyTorch + torch-geometric implementation of CGNP, as described in the paper: Shuheng Fang, Kangfei Zhao, Guanghua Li, Jeffrey Xu Yu. [Community Search: A Meta-Learning Approach]
python 3.8
networkx
numpy
scipy
scikit-learn
torch 1.7.1
torch-geometric 1.7
Create and activate the conda environment, then install the remaining requirements, by running:
conda env create -f cgnp.yaml
conda activate cgnp
pip install -r requirements.txt
Running on the Facebook dataset
python main.py \
--data_set facebook \
--gnn_type GAT \
--meta_method cnp \
--data_dir [your/own/directory/containing/facebook/dataset (e.g. /home/shfang/data/facebook/facebook)] \
--pool_type avg
All parameters and their default values are defined in get_args.py.
name | type | description |
---|---|---|
num_layers | int | number of GNN layers |
gnn_type | string | type of GNN layer (GCN, GAT, SAGE) |
pool_type | string | type of the commutative operation (att, mean, sum) |
meta_method | string | meta-learning algorithm to use (e.g. cnp) |
epochs | int | number of training epochs |
query_node_num | int | total number of query nodes for one task |
num_shots | int | number of query nodes of support set for one task |
subgraph_size | int | size of each subgraph sampled from a large graph |
data_set | string | name of the dataset (e.g. facebook) |
task_num | int | number of training tasks |
test_task_num | int | number of testing tasks |
label_mode | string | community mode: shared community or disjoint community |
num_pos | float | maximum proportion of positive instances for each query node |
num_neg | float | maximum proportion of negative instances for each query node |
learning_rate | float | learning rate |
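
The options in the table map naturally onto an argparse parser. The following is a minimal sketch of how such a parser might look for a subset of the options. The function name `build_parser` and every default value below are placeholders chosen for illustration (assumptions), not the values used in get_args.py, which remains the authoritative source.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring some options from the table above.

    NOTE: all default values are placeholders (assumptions), not the
    defaults defined in get_args.py.
    """
    parser = argparse.ArgumentParser(description="CGNP options (sketch)")
    parser.add_argument("--num_layers", type=int, default=3,
                        help="number of GNN layers")
    parser.add_argument("--gnn_type", type=str, default="GAT",
                        choices=["GCN", "GAT", "SAGE"],
                        help="type of GNN layer")
    parser.add_argument("--meta_method", type=str, default="cnp",
                        help="meta-learning algorithm")
    parser.add_argument("--epochs", type=int, default=100,
                        help="number of training epochs")
    parser.add_argument("--query_node_num", type=int, default=10,
                        help="total number of query nodes for one task")
    parser.add_argument("--num_shots", type=int, default=1,
                        help="number of support-set query nodes for one task")
    parser.add_argument("--data_set", type=str, default="facebook",
                        help="dataset name")
    parser.add_argument("--learning_rate", type=float, default=1e-3,
                        help="learning rate")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is
# reproducible when run on its own.
args = build_parser().parse_args(["--data_set", "facebook", "--gnn_type", "GAT"])
print(args.data_set, args.gnn_type, args.num_shots)
```

Restricting `gnn_type` with `choices` makes a mistyped layer name fail at parse time rather than deep inside model construction.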
main.py # project entry point
util.py # task generation for the different datasets, and evaluation
QueryDataset.py # extract queries from subgraphs and generate the support/query sets for the meta-learning algorithm
get_args.py # parameter settings
/meta
cnp.py # train and test for CGNP
/model
FwLayer.py
Model.py # model for CGNP
Layer.py # GNN layers and other layers
Loss.py
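
QueryDataset.py is described as generating a support set and a query set for each task. As a rough illustration of how `query_node_num` and `num_shots` relate, the sketch below takes one task's query nodes, puts `num_shots` of them in the support set, and leaves the rest for the query set. The function name `split_task`, the `seed` argument, and the random shuffling are assumptions made for illustration; they are not taken from the repository's code.

```python
import random


def split_task(query_nodes, num_shots, seed=0):
    """Split one task's query nodes into (support, query) lists.

    Hypothetical helper: the first `num_shots` shuffled nodes become the
    support set, the remaining nodes become the query set.
    """
    nodes = list(query_nodes)
    if num_shots > len(nodes):
        raise ValueError("num_shots cannot exceed the number of query nodes")
    random.Random(seed).shuffle(nodes)  # local RNG, global state untouched
    return nodes[:num_shots], nodes[num_shots:]


# Example: query_node_num = 10 and num_shots = 3.
support, query = split_task(range(10), num_shots=3)
print(len(support), len(query))  # 3 7
```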
To use your own dataset, place the data graph, node features, and ground-truth communities in '/data/disjoint(shared)/DATASET_NAME/graph_dgl.pkl', '/data/disjoint(shared)/DATASET_NAME/features.npy', and '/data/disjoint(shared)/DATASET_NAME/label.pkl', respectively, where disjoint(shared) stands for either disjoint or shared, depending on the community mode. For preprocessing, which divides the graph into two parts, train.csv and test.csv, refer to citation_preprocess.py.
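
Before launching training on a custom dataset, it can help to check that the three files are where the code expects them. The sketch below builds the expected paths and reports any that are missing. It assumes that disjoint(shared) means one of the two directory names `disjoint` or `shared`, and that the data root is a directory named `data`; the helper names `dataset_files` and `missing_files` are hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the layout described above.
EXPECTED_FILES = ("graph_dgl.pkl", "features.npy", "label.pkl")


def dataset_files(root, label_mode, dataset_name):
    """Return the expected file paths for a custom dataset (hypothetical helper)."""
    if label_mode not in ("disjoint", "shared"):
        raise ValueError("label_mode must be 'disjoint' or 'shared'")
    base = Path(root) / label_mode / dataset_name
    return [base / name for name in EXPECTED_FILES]


def missing_files(root, label_mode, dataset_name):
    """Return the expected files that do not exist on disk yet."""
    return [p for p in dataset_files(root, label_mode, dataset_name)
            if not p.exists()]


# "data" as the root directory is an assumption; adjust to your setup.
for path in dataset_files("data", "disjoint", "DATASET_NAME"):
    print(path)
```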
The input graph and feature format for Arxiv/Cora/Citeseer follows G-Meta, which also provides the Arxiv dataset. The Reddit/Cora/Citeseer datasets are from torch-geometric. DBLP can be downloaded from [SNAP](https://snap.stanford.edu/data/com-DBLP.html), and Facebook from [SNAP](https://snap.stanford.edu/data/ego-Facebook.html).
Open an issue or send an email to shfang@se.cuhk.edu.hk if you have any problems.
@inproceedings{fang2023community,
title={Community search: a meta-learning approach},
author={Fang, Shuheng and Zhao, Kangfei and Li, Guanghua and Yu, Jeffrey Xu},
booktitle={2023 IEEE 39th International Conference on Data Engineering (ICDE)},
pages={2358--2371},
year={2023},
organization={IEEE}
}