Need more details for reproducing the results of FDcompCN dataset in paper.

Split-GNN / SplitGNN


Need more details for reproducing the results of FDcompCN dataset in paper. #3

Closed: ambarion closed this issue 11 months ago

ambarion commented 11 months ago

Thanks for your excellent work. I ran into some problems when following your hyperparameters on FDCompCN: the metrics cannot reach the level reported in the paper. Could you share some training details behind the results in the paper? Here is my 'comp.yaml':

dataset: 'comp'
seed: 30
epoch: 1000
early_stop: 100
lr: 0.001
weight_decay: 0.00005
cuda: 0
log: True
data_path: './SplitGNN-master/data/'
result_path: './SplitGNN-master/result/'
# model parameters
gamma: 0.8
C: 3
K: 2
intra_dim: 8
n_class: 2
dropout: 0.1

ambarion commented 11 months ago

I also noticed that FDCompCN was constructed by sampling (between 2020 and 2023) 5,317 publicly listed Chinese companies. The nodes (listed Chinese companies) that have customer-supplier relationships in 2020-2022 (no 2023 data for now) in the CSMAR database are considerably fewer than in the set in your paper (e.g. only 1991). Is the customer-supplier data taken from CSMAR tables such as SC_NetworkRelationsIndex, SC_TopFiveSaleInfo, or SC_TopFivePurchaseInfo.xlsx?

I am also a little confused about the source of the investment relationships.

Could you give some details about how you processed the CSMAR database? Looking forward to your help.

blackboxo commented 11 months ago

Because the dataset is small, the algorithms' performance may be unstable. For reference, these are the average results of each algorithm over 10 runs.

Method	AUC	GMean	F1-macro	Recall
XGBoost	0.5906±0.0166	0.5895±0.0168	0.4793±0.0218	0.5830±0.0337
MLP	0.5924±0.0058	0.4957±0.0041	0.5035±0.0060	0.3290±0.0077
GCN	0.5724±0.0091	0.5361±0.0217	0.4725±0.0252	0.4643±0.0773
GAT	0.6041±0.0022	0.5483±0.0265	0.5004±0.0142	0.4433±0.0796
GPRGNN	0.6411±0.0204	0.4674±0.0992	0.4385±0.1457	0.5325±0.2990
FAGCN	0.6213±0.0083	0.5241±0.0431	0.4469±0.0809	0.5799±0.2404
H2GCN	0.5451±0.0344	0.4590±0.0127	0.5077±0.0177	0.2679±0.0108
CARE-GNN	0.6639±0.0108	0.5894±0.0144	0.5072±0.0395	0.5357±0.0898
RioGNN	0.6233±0.0167	0.5365±0.0308	0.4850±0.0649	0.4876±0.1780
H2-FDetector	0.5823±0.0515	0.5093±0.0491	0.4351±0.1008	0.5474±0.1947
BWGNN	0.5936±0.0172	0.4923±0.0435	0.4546±0.0808	0.4902±0.2473
SplitGNN	0.6724±0.0121	0.6273±0.0126	0.5147±0.0285	0.6179±0.0811
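
Each entry in the table has the form "mean±spread" over 10 runs. The following is a minimal sketch of that aggregation step, not the authors' evaluation script: the helper names `summarize_runs` and `format_metric` and the sample AUC values are hypothetical, and it assumes the sample standard deviation, since the thread does not say how the spread was computed.

```python
import statistics


def summarize_runs(values):
    """Return (mean, std) for one metric collected over repeated runs.

    Assumption: sample standard deviation (n - 1 denominator). The thread
    does not state which variant the reported numbers use.
    """
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return statistics.mean(values), statistics.stdev(values)


def format_metric(values, digits=4):
    """Format a list of per-run scores as 'mean±std', as in the table."""
    mean, std = summarize_runs(values)
    return f"{mean:.{digits}f}±{std:.{digits}f}"


# Hypothetical AUC values from 10 runs with different seeds (made-up numbers,
# not results from the paper or this thread).
auc_runs = [0.66, 0.68, 0.67, 0.69, 0.65, 0.67, 0.68, 0.66, 0.67, 0.69]
print(format_metric(auc_runs))
```

When comparing a single run against the table, a score that falls within roughly one spread of the reported mean is consistent with the instability mentioned above.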

blackboxo commented 11 months ago

Thank you for your attention. The data are generated from tables including SC_TopFivePurchaseInfo, SC_TopFivesaleInfo and SC_OutInvestmentInfo. Please check the customer-supplier relationships in 2018-2021. Thank you for pointing out the mistake.
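
The year window matters for the node count discussed above, because only companies with at least one relationship inside the window end up connected. The sketch below illustrates that filtering step under stated assumptions: the record keys `company`, `partner`, and `year` are hypothetical (the real CSMAR column names will differ), the helper names are made up, and treating edges as undirected is an assumption rather than the authors' confirmed preprocessing.

```python
def build_edges(records, start_year=2018, end_year=2021):
    """Collect company-company edges from relationship records.

    Each record is a dict with the hypothetical keys 'company', 'partner',
    and 'year'. Edges are stored as sorted pairs, i.e. treated as
    undirected (an assumption), and self-loops are skipped.
    """
    edges = set()
    for rec in records:
        if start_year <= int(rec["year"]) <= end_year:
            a, b = rec["company"], rec["partner"]
            if a != b:
                edges.add((min(a, b), max(a, b)))
    return edges


def connected_nodes(edges):
    """Return the set of companies that appear in at least one edge."""
    return {node for edge in edges for node in edge}


# Made-up records for illustration only.
records = [
    {"company": "A", "partner": "B", "year": 2018},
    {"company": "B", "partner": "A", "year": 2019},  # same pair, deduplicated
    {"company": "C", "partner": "D", "year": 2022},  # outside 2018-2021
]

wide = build_edges(records, 2018, 2021)
narrow = build_edges(records, 2020, 2022)
print(len(connected_nodes(wide)), len(connected_nodes(narrow)))
```

Running the same filter with different windows on the real tables is a quick way to see how much of the gap in node counts comes from the choice of years alone.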

ambarion commented 11 months ago

Great! Thanks!