Closed: claying closed this issue 6 months ago.

Dear authors,
Thank you for your interesting work! I tried to reproduce your results on the pokec dataset by running bash pokec.sh, but failed. I have attached my training log. Do you have any idea what might be going wrong?

Hi,
Thanks for your interest! Every script in our repo has been checked and should reproduce our results. I just ran bash pokec.sh again and it reached 86.06% test accuracy after 1287 epochs. Our training log is shown below:
Namespace(dataset='pokec', data_dir='./data/', device=0, seed=42, cpu=False, local_epochs=2000, global_epochs=0, batch_size=550000, runs=1, metric='acc', method='poly', hidden_channels=256, local_layers=7, global_layers=2, num_heads=1, beta=0.9, pre_ln=False, post_bn=True, local_attn=False, lr=0.0005, weight_decay=0.0, in_dropout=0.0, dropout=0.2, global_dropout=0.2, display_step=1, eval_step=9, eval_epoch=1000, save_model=False, model_dir='./model/', save_result=False)
pokec
dataset pokec | num nodes 1632803 | num edge 30622564 | num node feats 65 | num classes 2
MODEL: Polynormer(
(h_lins): ModuleList(
(0): Linear(in_features=65, out_features=256, bias=True)
(1-6): 6 x Linear(in_features=256, out_features=256, bias=True)
)
(local_convs): ModuleList(
(0): GCNConv(65, 256)
(1-6): 6 x GCNConv(256, 256)
)
(lins): ModuleList(
(0): Linear(in_features=65, out_features=256, bias=True)
(1-6): 6 x Linear(in_features=256, out_features=256, bias=True)
)
(lns): ModuleList(
(0-6): 7 x LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(post_bns): ModuleList(
(0-6): 7 x BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(lin_in): Linear(in_features=65, out_features=256, bias=True)
(ln): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(global_attn): GlobalAttn(
(h_lins): ModuleList(
(0-1): 2 x Linear(in_features=256, out_features=256, bias=True)
)
(k_lins): ModuleList(
(0-1): 2 x Linear(in_features=256, out_features=256, bias=True)
)
(v_lins): ModuleList(
(0-1): 2 x Linear(in_features=256, out_features=256, bias=True)
)
(lns): ModuleList(
(0-1): 2 x LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(lin_out): Linear(in_features=256, out_features=256, bias=True)
)
(pred_local): Linear(in_features=256, out_features=2, bias=True)
(pred_global): Linear(in_features=256, out_features=2, bias=True)
)
Epoch: 1008, Loss: 0.4173, Train: 84.55%, Valid: 84.16%, Test: 84.22%, Best Valid: 84.16%, Best Test: 84.22%
Epoch: 1017, Loss: 0.4160, Train: 85.87%, Valid: 85.40%, Test: 85.44%, Best Valid: 85.40%, Best Test: 85.44%
Epoch: 1026, Loss: 0.4164, Train: 86.19%, Valid: 85.77%, Test: 85.79%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1035, Loss: 0.4159, Train: 85.12%, Valid: 84.77%, Test: 84.76%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1044, Loss: 0.4152, Train: 85.74%, Valid: 85.35%, Test: 85.36%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1053, Loss: 0.4155, Train: 84.24%, Valid: 83.78%, Test: 83.77%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1062, Loss: 0.4154, Train: 85.99%, Valid: 85.52%, Test: 85.58%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1071, Loss: 0.4155, Train: 85.85%, Valid: 85.41%, Test: 85.40%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1080, Loss: 0.4148, Train: 85.74%, Valid: 85.29%, Test: 85.37%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1089, Loss: 0.4152, Train: 84.59%, Valid: 84.15%, Test: 84.08%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1098, Loss: 0.4140, Train: 85.62%, Valid: 85.13%, Test: 85.10%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1107, Loss: 0.4138, Train: 85.80%, Valid: 85.33%, Test: 85.31%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1116, Loss: 0.4138, Train: 85.09%, Valid: 84.60%, Test: 84.57%, Best Valid: 85.77%, Best Test: 85.79%
Epoch: 1125, Loss: 0.4139, Train: 86.39%, Valid: 85.92%, Test: 85.96%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1134, Loss: 0.4131, Train: 85.99%, Valid: 85.55%, Test: 85.55%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1143, Loss: 0.4133, Train: 85.77%, Valid: 85.29%, Test: 85.26%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1152, Loss: 0.4135, Train: 85.32%, Valid: 84.77%, Test: 84.75%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1161, Loss: 0.4130, Train: 85.93%, Valid: 85.42%, Test: 85.42%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1170, Loss: 0.4133, Train: 86.28%, Valid: 85.77%, Test: 85.77%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1179, Loss: 0.4119, Train: 85.91%, Valid: 85.38%, Test: 85.37%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1188, Loss: 0.4125, Train: 82.09%, Valid: 81.59%, Test: 81.58%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1197, Loss: 0.4123, Train: 86.12%, Valid: 85.59%, Test: 85.58%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1206, Loss: 0.4116, Train: 86.13%, Valid: 85.55%, Test: 85.59%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1215, Loss: 0.4117, Train: 86.16%, Valid: 85.64%, Test: 85.59%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1224, Loss: 0.4113, Train: 86.21%, Valid: 85.64%, Test: 85.65%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1233, Loss: 0.4108, Train: 85.44%, Valid: 84.87%, Test: 84.85%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1242, Loss: 0.4104, Train: 86.04%, Valid: 85.48%, Test: 85.44%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1251, Loss: 0.4109, Train: 85.82%, Valid: 85.24%, Test: 85.23%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1260, Loss: 0.4118, Train: 86.04%, Valid: 85.45%, Test: 85.42%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1269, Loss: 0.4108, Train: 85.72%, Valid: 85.12%, Test: 85.11%, Best Valid: 85.92%, Best Test: 85.96%
Epoch: 1278, Loss: 0.4106, Train: 86.60%, Valid: 85.95%, Test: 85.95%, Best Valid: 85.95%, Best Test: 85.95%
Epoch: 1287, Loss: 0.4106, Train: 86.68%, Valid: 86.00%, Test: 86.06%, Best Valid: 86.00%, Best Test: 86.06%
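For readers parsing the model printout above: the (0) / (1-6) split in each ModuleList is the usual pattern of one input-to-hidden layer followed by hidden-to-hidden layers. A generic sketch that reproduces the printed shapes (not the repo's actual code) would look like:

```python
# Generic construction matching the printed local_convs shapes:
# (0): GCNConv(65, 256), (1-6): 6 x GCNConv(256, 256).
import torch.nn as nn
from torch_geometric.nn import GCNConv

in_channels, hidden_channels, local_layers = 65, 256, 7  # values from the Namespace above
local_convs = nn.ModuleList(
    [GCNConv(in_channels, hidden_channels)]
    + [GCNConv(hidden_channels, hidden_channels) for _ in range(local_layers - 1)]
)
```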
According to your training log, the model seems to be converging slowly and has not fully converged even after 2000 epochs, which we have never observed on our side. This is definitely something we need to investigate. Could you try a different random seed and see whether you still get the same result? By the way, can you reproduce the results on ogbn-products?
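For context, the random seed is exposed as the seed field in the Namespace printed above, so a different value can simply be passed on the command line. Internally, training scripts of this kind usually pin randomness with a helper along these lines (a generic sketch, not necessarily the repo's exact code):

```python
# Typical seeding helper: pins the Python, NumPy, and (CUDA) PyTorch RNGs.
import random

import numpy as np
import torch

def fix_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
```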
Hi @Chenhui1016,
Thank you for your fast response! I still obtained the same accuracy with a different random seed. I also failed to reproduce the results on ogbn-products (I only reached a test accuracy of 61.47%). However, I managed to reproduce the results on smaller datasets such as roman-empire and amazon-ratings. In addition, I noticed that global_epochs is set to 0 in pokec.sh rather than 500 as listed in your paper.
The major difference in my setup is that I used an H100 GPU (80 GB) rather than an A6000. I don't know whether that could cause such a large gap in performance. Could you test your code on an H100 or A100 (probably with a smaller batch size)?
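As a side note, whether a smaller batch size is needed depends mainly on available device memory (the run above uses batch_size=550000). The memory on the card in use can be checked with a generic call like:

```python
# Report the name and total memory of the first visible GPU.
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB")
```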
The global_epochs setting shouldn't be the root cause, and I don't believe different GPUs should make such a big difference either. Have you created a new conda environment and installed all required packages (with their specified versions) following our instructions? Could you please let me know your torch_geometric version?
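For anyone hitting the same question, the relevant versions can be printed with a short generic snippet (nothing repo-specific assumed):

```python
# Print the PyTorch / CUDA-build / PyG versions and the detected GPU.
import torch
import torch_geometric

print("torch:", torch.__version__)
print("torch CUDA build:", torch.version.cuda)
print("torch_geometric:", torch_geometric.__version__)
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```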
There was a compatibility issue with CUDA 11.7 on the H100, so I used PyTorch built with CUDA 11.8. Otherwise, I used the same version of PyG (2.3) as in your README.
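On the H100 point: as far as I know, PyTorch's CUDA 11.8 binaries were the first to ship sm_90 (Hopper) kernels, which would explain the trouble with the CUDA 11.7 build. Whether the installed binary actually supports the local GPU can be confirmed with a generic check like:

```python
# Compare the GPU's compute capability (H100 reports sm_90) against the
# architectures the installed PyTorch binary was compiled for.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"device arch: sm_{major}{minor}")
print("binary supports:", torch.cuda.get_arch_list())
```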
I created a new environment and reinstalled everything, and now I obtain the same results as yours. I will check the differences between my old and new environments and get back to you if I figure out the reason. Thanks for your help!