MIRALab-USTC / GraphAKD

Code for the paper "Compressing Deep Graph Neural Networks via Adversarial Knowledge Distillation" by Huarui He, Jie Wang, Zhanqiu Zhang, and Feng Wu, SIGKDD 2022.
MIT License

Cannot reproduce the KD results #3

Closed. fengh16 closed this issue 1 year ago

fengh16 commented 1 year ago

Dear authors,

Thank you for your excellent work. However, I am having trouble reproducing your reported results for the baseline KD [21]. I get 77.63% for KD on Cora rather than the reported 83.2%. I suspect some of my settings are wrong, but I cannot figure out which. I would appreciate any advice on how to reproduce the KD results.

As far as I know, the traditional logit-based knowledge distillation method adds an extra distillation loss term to the label loss. I added the following code in node-level/stu-gcn/train.py:

elif args.role == 'KD':
    # Soften both student and teacher logits with a temperature,
    # then mix the supervised label loss with the KL term.
    real_distill_temp = 10
    alpha = 0.5
    loss = label_loss * alpha + nn.KLDivLoss()(
        F.log_softmax(logits / real_distill_temp, dim=-1),
        F.softmax(tea_logits / real_distill_temp, dim=-1),
    ) * (1 - alpha)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

and made a few other minor supporting changes.

However, the resulting test accuracy on Cora is only 77.63%.

Namespace(dataset='cora', dropout=0.5, gpu=0, lr=0.001, n_epochs=600, n_hidden=64, n_layers=1, weight_decay=0.0005, self_loop=True, seed=2031, role='KD', data_dir='../../datasets', kd_dir='../../distilled', d_critic=1, g_critic=1, n_runs=10)
Param count: 96633
Test accuracy on cora: 77.50%

Runned 10 times
Val Accs: [0.766, 0.788, 0.786, 0.776, 0.752, 0.77, 0.78, 0.778, 0.774, 0.772]
Test Accs: [0.771, 0.787, 0.781, 0.77, 0.766, 0.784, 0.778, 0.767, 0.784, 0.775]
Average val accuracy: 0.7742 ± 0.00981631295344643
Average test accuracy on cora: 0.7763 ± 0.007211795892841123
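
For reference (this note is not from the original post), the snippet above departs from the standard logit-based KD objective of Hinton et al. in two details: PyTorch's nn.KLDivLoss defaults to reduction='mean', which averages over the class dimension as well as over nodes, so the distillation term is scaled down by the number of classes, and the usual temperature-squared factor on the soft term is missing. Below is a minimal sketch of the standard formulation, reusing logits, tea_logits, and label_loss as they appear in train.py and introducing a hypothetical helper hinton_kd_loss:

import torch.nn.functional as F

def hinton_kd_loss(student_logits, teacher_logits, T):
    # KL(teacher || student) on temperature-softened distributions,
    # averaged per node ('batchmean') and scaled by T ** 2.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction='batchmean',
    ) * (T ** 2)

# alpha trades off the hard-label loss against the distillation loss:
# loss = alpha * label_loss + (1 - alpha) * hinton_kd_loss(logits, tea_logits, T=10)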
hhr114 commented 1 year ago

Hi, thanks for your interest in our work. Try the following code:

elif args.role == 'KD':
    # Supervised loss on the labeled training nodes.
    label_loss = loss_fcn(logits[train_mask], labels[train_mask])
    alpha = 0.7
    # Weighted sum of the label loss and the distillation loss.
    loss = (1 - alpha) * label_loss + alpha * kd_ce_loss(logits, tea_logits, temperature=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

where the function kd_ce_loss is defined as

def kd_ce_loss(logits_S, logits_T, temperature=1.0):
    # Soften both sets of logits with the temperature.
    beta_logits_T = logits_T / temperature
    beta_logits_S = logits_S / temperature
    p_T = F.softmax(beta_logits_T, dim=-1)
    # Soft cross-entropy between the teacher distribution and the student
    # log-probabilities, averaged over nodes and scaled by temperature ** 2.
    loss = -(p_T * F.log_softmax(beta_logits_S, dim=-1)).sum(dim=-1).mean()
    return loss * (temperature ** 2)

Then, I get the following results.

Runned 10 times
Val Accs: [0.804, 0.81, 0.814, 0.792, 0.792, 0.81, 0.812, 0.804, 0.802, 0.806]
Test Accs: [0.815, 0.812, 0.819, 0.811, 0.816, 0.822, 0.826, 0.821, 0.815, 0.806]
Average val accuracy: 0.8046 ± 0.00726911273815449
Average test accuracy on cora: 0.8163 ± 0.005586591089385334

The results may be slightly different on different devices.
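
As a side note not from the thread: kd_ce_loss above is the soft cross-entropy between the teacher and student distributions. It differs from the KL divergence only by the teacher's entropy, which is constant with respect to the student, so both give identical gradients. A quick sanity check of that identity, with kd_ce_loss as defined above in scope and made-up logits:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
s = torch.randn(8, 7)  # fake student logits: 8 nodes, 7 classes
t = torch.randn(8, 7)  # fake teacher logits

ce = kd_ce_loss(s, t, temperature=1.0)  # soft cross-entropy
kl = F.kl_div(F.log_softmax(s, dim=-1), F.softmax(t, dim=-1),
              reduction='batchmean')    # KL(teacher || student)
ent = -(F.softmax(t, dim=-1) * F.log_softmax(t, dim=-1)).sum(dim=-1).mean()
print(torch.allclose(ce, kl + ent, atol=1e-5))  # expected: True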

fengh16 commented 1 year ago

Thank you for your reply!