dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

Low accuracy for OGBn-Arxiv dataset #7431

Open · Yash685 opened 1 month ago

Yash685 commented 1 month ago

❓ Questions and Help

Hello Team, I am conducting experiments on DGL with the ogbn-arxiv dataset. However, my test accuracy is 0.5547, which is significantly lower than the results reported on the OGB leaderboard. Could you please provide guidance on achieving better performance?

DGL version: 1.2

Experiment details (a sketch of this setup follows the list):

Dataset: ogbn-arxiv
Sampler: NeighborSampler
Fanout: [10, 10, 10]
Batch size: 1000
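
A minimal sketch of this configuration, assuming DGL 1.x's `dgl.dataloading` API and OGB's DGL loader (variable names are illustrative, not the exact script used):

```python
from dgl.dataloading import DataLoader, NeighborSampler
from ogb.nodeproppred import DglNodePropPredDataset

# Load ogbn-arxiv through OGB's DGL wrapper.
dataset = DglNodePropPredDataset("ogbn-arxiv")
graph, labels = dataset[0]
train_idx = dataset.get_idx_split()["train"]

# Fanout [10, 10, 10]: sample up to 10 neighbors per node at each of 3 hops.
sampler = NeighborSampler([10, 10, 10])
dataloader = DataLoader(
    graph,
    train_idx,
    sampler,
    batch_size=1000,
    shuffle=True,
    drop_last=False,
)

for input_nodes, output_nodes, blocks in dataloader:
    # blocks[0] is the outermost hop; a 3-layer GraphSAGE model would
    # consume the features of input_nodes and predict output_nodes here.
    pass
```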

Hardware details:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  64
  On-line CPU(s) list:   0-63
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7313 16-Core Processor
    CPU family:          25
    Model:               1
    Thread(s) per core:  2
    Core(s) per socket:  16
    Socket(s):           2
    Stepping:            1
    Frequency boost:     enabled
    CPU(s) scaling MHz:  45%
    CPU max MHz:         3729.4919
    CPU min MHz:         1500.0000
    BogoMIPS:            6000.04
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   1 MiB (32 instances)
  L1i:                   1 MiB (32 instances)
  L2:                    16 MiB (32 instances)
  L3:                    256 MiB (8 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-15,32-47
  NUMA node1 CPU(s):     16-31,48-63

Code used to measure accuracy: arxiv_accuracy.txt (attached)

Output snapshot: (screenshot attached, taken 2024-05-27)

BarclayII commented 1 month ago

Were you running on one of our examples? https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/ogbn-arxiv

Yash685 commented 1 month ago

Thank you for the response. Since my analysis is based on the GraphSAGE model, I reused the code from https://github.com/dmlc/dgl/blob/1.1.x/examples/pytorch/graphsage/node_classification.py and passed ogbn-arxiv as the dataset.
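
Concretely, the swap was along these lines (a sketch based on that example's loader, which defaults to ogbn-products; `AsNodePredDataset` is the wrapper the script uses):

```python
from dgl.data import AsNodePredDataset
from ogb.nodeoproppred import DglNodePropPredDataset  # typo? see note below

# The example loads ogbn-products by default; only the dataset name changes.
dataset = AsNodePredDataset(DglNodePropPredDataset("ogbn-arxiv"))
g = dataset[0]  # DGLGraph with features/labels attached as node data
```

(The correct import path is `from ogb.nodeproppred import DglNodePropPredDataset`.)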

BarclayII commented 1 month ago

The two examples may use different hyperparameters. Could you try the ogbn-arxiv example and see whether the problem persists?

UtkrishtP commented 3 weeks ago

@BarclayII We added reverse edges to the graph and the accuracy improved. Why is adding reverse edges not enabled by default for ogbn-arxiv and papers?
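
For context, a minimal sketch of one way to do this with DGL's `dgl.add_reverse_edges` utility (the exact call we used may differ):

```python
import dgl

# ogbn-arxiv is a directed citation graph, so by default a paper only
# receives messages along one edge direction. Adding reverse edges lets
# information flow both ways during neighbor sampling and aggregation.
graph = dgl.add_reverse_edges(graph)  # `graph` as loaded earlier in the thread
```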

Rhett-Ying commented 3 weeks ago

We should not change the original dataset by default. Adding reverse edges is the user's choice, not part of the dataset.

UtkrishtP commented 3 weeks ago

@Rhett-Ying Thanks for getting back to us. Can you suggest how to achieve the leaderboard accuracy on the original dataset? We have tried the defaults and various other hyperparameters (batch size, fanouts, dropout, learning rate) for GraphSAGE training, without any improvement in accuracy.
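
One sampler-side variation not listed above is full-neighbor sampling, which removes fanout from the search space entirely. A sketch, assuming DGL's `MultiLayerFullNeighborSampler` and the `graph`/`train_idx` objects from the earlier snippet (whether this closes the gap is untested here):

```python
from dgl.dataloading import DataLoader, MultiLayerFullNeighborSampler

# Keep every neighbor at each of the 3 hops: no sampling variance,
# at the cost of larger mini-batches.
sampler = MultiLayerFullNeighborSampler(3)
dataloader = DataLoader(graph, train_idx, sampler, batch_size=1000, shuffle=True)
```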