ChandlerBang / GCond

[ICLR'22] [KDD'22] [IJCAI'24] Implementation of "Graph Condensation for Graph Neural Networks"
https://www.cs.emory.edu/~wjin30/files/GCond.pdf

Poor performance of GCond on Amazon dataset #18

Open atpugludrim opened 2 months ago

atpugludrim commented 2 months ago

Hi, I am running GCond on the Amazon dataset. The setting is as follows: I have added appropriate data loading, train-val-test split creation script in your code. And now I run it to generate say 73 nodes out of ~7000 in Photos dataset, or ~130 nodes out of ~13000 nodes of Computers dataset (around 1% of the size). Then, I use these datasets to train a simple GCN (2-layered, with ReLU between the two layers, no batchnorm, no dropout, optimized with standard parameters of Adam, the standard stuff). This model fails to perform well. This model only predicts one single class. The performance is ~24% for Photos and ~37% for Computers, while the full train set performance is ~90% for both. And random gets to about ~66-72% for both.

So, is there a quick fix, like finding optimal inner-outer loop parameters or tuning the SGC parameters?

P.S.: I noticed that the gradient matching loss printed by your code at epoch 0 is very high for these datasets (~97) and barely decreases, while for datasets like Cora and Citeseer it starts at ~1-3 and quickly falls below 1.

ChandlerBang commented 2 months ago

Hi, thanks for the question. I haven't tested amazon-photo or amazon-computers before. In my experience, optimizing with `dis=mse` tends to be more stable than `dis=ours`, so you may try optimizing the MSE loss first, typically with longer training, e.g., `python train_gcond_transduct.py --dataset cora --nlayers=2 --lr_feat=1e-3 --lr_adj=1e-3 --r=0.5 --sgc=0 --dis=mse --gpu_id=2 --one_step=1 --epochs=5000`.