[GraphBolt] Add experimental `ItemSet/Dict4` and `ItemSampler4`

Skeleton003 commented 5 months ago

Description

benchmark:

num_ids: 24, num_workers: 0, drop_last: False, drop_uneven_inputs: False
Old: 5.26561164855957
New: 4.075196266174316

num_ids: 24, num_workers: 0, drop_last: False, drop_uneven_inputs: True
Old: 5.038467884063721
New: 5.061769247055054

num_ids: 24, num_workers: 0, drop_last: True, drop_uneven_inputs: False
Old: 5.08016300201416
New: 5.055493116378784

num_ids: 24, num_workers: 0, drop_last: True, drop_uneven_inputs: True
Old: 5.044290542602539
New: 5.022970676422119

num_ids: 24, num_workers: 2, drop_last: False, drop_uneven_inputs: False
Old: 7.418801546096802
New: 6.484843492507935

num_ids: 24, num_workers: 2, drop_last: False, drop_uneven_inputs: True
Old: 7.407760143280029
New: 7.527584791183472

num_ids: 24, num_workers: 2, drop_last: True, drop_uneven_inputs: False
Old: 6.492152690887451
New: 6.431138277053833

num_ids: 24, num_workers: 2, drop_last: True, drop_uneven_inputs: True
Old: 6.491805791854858
New: 7.4210569858551025

num_ids: 30, num_workers: 0, drop_last: False, drop_uneven_inputs: False
Old: 4.09150767326355
New: 5.011434316635132

num_ids: 30, num_workers: 0, drop_last: False, drop_uneven_inputs: True
Old: 5.040276288986206
New: 4.068592071533203

num_ids: 30, num_workers: 0, drop_last: True, drop_uneven_inputs: False
Old: 4.038927793502808
New: 4.038530349731445

num_ids: 30, num_workers: 0, drop_last: True, drop_uneven_inputs: True
Old: 5.019740343093872
New: 5.0285563468933105

num_ids: 30, num_workers: 2, drop_last: False, drop_uneven_inputs: False
Old: 7.428295612335205
New: 6.409729242324829

num_ids: 30, num_workers: 2, drop_last: False, drop_uneven_inputs: True
Old: 7.421130657196045
New: 7.533393383026123

num_ids: 30, num_workers: 2, drop_last: True, drop_uneven_inputs: False
Old: 7.41476035118103
New: 6.400209188461304

num_ids: 30, num_workers: 2, drop_last: True, drop_uneven_inputs: True
Old: 6.40072774887085
New: 6.447648048400879

num_ids: 32, num_workers: 0, drop_last: False, drop_uneven_inputs: False
Old: 4.057007789611816
New: 5.063795328140259

num_ids: 32, num_workers: 0, drop_last: False, drop_uneven_inputs: True
Old: 5.035150051116943
New: 5.006322145462036

num_ids: 32, num_workers: 0, drop_last: True, drop_uneven_inputs: False
Old: 5.089540958404541
New: 5.047980546951294

num_ids: 32, num_workers: 0, drop_last: True, drop_uneven_inputs: True
Old: 4.040552854537964
New: 5.0497941970825195

num_ids: 32, num_workers: 2, drop_last: False, drop_uneven_inputs: False
Old: 7.43973970413208
New: 7.493116855621338

num_ids: 32, num_workers: 2, drop_last: False, drop_uneven_inputs: True
Old: 7.553787469863892
New: 7.6020872592926025

num_ids: 32, num_workers: 2, drop_last: True, drop_uneven_inputs: False
Old: 6.490302085876465
New: 7.487463474273682

num_ids: 32, num_workers: 2, drop_last: True, drop_uneven_inputs: True
Old: 7.364883661270142
New: 7.4597368240356445

num_ids: 34, num_workers: 0, drop_last: False, drop_uneven_inputs: False
Old: 4.082199811935425
New: 4.053929328918457

num_ids: 34, num_workers: 0, drop_last: False, drop_uneven_inputs: True
Old: 4.063207149505615
New: 5.091043710708618

num_ids: 34, num_workers: 0, drop_last: True, drop_uneven_inputs: False
Old: 4.999620676040649
New: 5.112699031829834

num_ids: 34, num_workers: 0, drop_last: True, drop_uneven_inputs: True
Old: 5.024035930633545
New: 4.051522493362427

num_ids: 34, num_workers: 2, drop_last: False, drop_uneven_inputs: False
Old: 7.471214771270752
New: 6.554701328277588

num_ids: 34, num_workers: 2, drop_last: False, drop_uneven_inputs: True
Old: 6.449496269226074
New: 6.529990196228027

num_ids: 34, num_workers: 2, drop_last: True, drop_uneven_inputs: False
Old: 7.431456804275513
New: 7.41823673248291

num_ids: 34, num_workers: 2, drop_last: True, drop_uneven_inputs: True
Old: 6.479130506515503
New: 6.368876695632935

num_ids: 36, num_workers: 0, drop_last: False, drop_uneven_inputs: False
Old: 5.009013652801514
New: 5.050375461578369

num_ids: 36, num_workers: 0, drop_last: False, drop_uneven_inputs: True
Old: 4.0677573680877686
New: 4.125107288360596

num_ids: 36, num_workers: 0, drop_last: True, drop_uneven_inputs: False
Old: 5.023468971252441
New: 5.105181455612183

num_ids: 36, num_workers: 0, drop_last: True, drop_uneven_inputs: True
Old: 5.063021421432495
New: 5.089923143386841

num_ids: 36, num_workers: 2, drop_last: False, drop_uneven_inputs: False
Old: 7.424851179122925
New: 7.432251453399658

num_ids: 36, num_workers: 2, drop_last: False, drop_uneven_inputs: True
Old: 7.543227672576904
New: 7.601431131362915

num_ids: 36, num_workers: 2, drop_last: True, drop_uneven_inputs: False
Old: 6.457719326019287
New: 6.451065540313721

num_ids: 36, num_workers: 2, drop_last: True, drop_uneven_inputs: True
Old: 6.568817377090454
New: 6.491897344589233
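
A rough way to read the numbers above is to compute the relative change of New against Old per configuration. The sketch below is not part of the PR: `parse_benchmark` and `relative_change` are hypothetical helper names, the sample holds three entries copied from the output above, and the timing unit is not stated in the post.

```python
import re
from statistics import mean

# Three entries copied verbatim from the benchmark output above.
SAMPLE = """\
num_ids: 24, num_workers: 0, drop_last: False, drop_uneven_inputs: False
Old: 5.26561164855957
New: 4.075196266174316

num_ids: 24, num_workers: 2, drop_last: True, drop_uneven_inputs: True
Old: 6.491805791854858
New: 7.4210569858551025

num_ids: 30, num_workers: 0, drop_last: True, drop_uneven_inputs: False
Old: 4.038927793502808
New: 4.038530349731445
"""


def parse_benchmark(text):
    """Return a list of (config, old_time, new_time) tuples."""
    pattern = re.compile(
        r"^(num_ids:.*)\nOld:\s*([\d.]+)\nNew:\s*([\d.]+)", re.MULTILINE
    )
    return [
        (config.strip(), float(old), float(new))
        for config, old, new in pattern.findall(text)
    ]


def relative_change(old_time, new_time):
    """Fractional change of New vs Old; negative means New is faster."""
    return (new_time - old_time) / old_time


rows = parse_benchmark(SAMPLE)
changes = [relative_change(old, new) for _, old, new in rows]

for (config, _, _), change in zip(rows, changes):
    print(f"{config}: {change:+.1%}")
print(f"mean change over {len(rows)} configs: {mean(changes):+.1%}")
```

Pasting the full output into `SAMPLE` would give the per-configuration spread for all entries, which is easier to judge than scanning the raw pairs.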

Checklist

Please feel free to remove inapplicable items for your PR.

[ ] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
[ ] I've leveraged the tools to beautify the Python and C++ code.
[ ] The PR is complete and small. Read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests and documentation can be exempted).
[ ] All changes have test coverage
[ ] Code is well-documented
[ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
[ ] Related issue is referenced in this PR
[ ] If the PR is for a new model/paper, I've updated the example index here.

Changes

dgl-bot commented 5 months ago

To trigger regression tests:

@dgl-bot run [instance-type] [which tests] [compare-with-branch]; For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

dgl-bot commented 5 months ago

Commit ID: 78b9e9057614436e99443cb46ac0ace3243e37ca

Build ID: 1

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

dgl-bot commented 5 months ago

Commit ID: 63544180d27e1f6707b93597fa5d6c2bd465c6dc

Build ID: 2

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 5 months ago

Commit ID: 5a5c786bcf9525eb0753ec6913ce1c7e270e0a30

Build ID: 3

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented 5 months ago

Commit ID: 22180a5053a84344aceea9926ea4e80b83ff7cfb

Build ID: 4

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 5 months ago

Commit ID: a2ca65173ae263f16b8b2182b782e040cd08c080

Build ID: 5

Status: ❌ CI test failed in Stage [Torch CPU (Win64) Unit test].

Report path: link

Full logs path: link

dgl-bot commented 5 months ago

Commit ID: 9d2e81a4480acdb79c59233c2efef9893e857d96

Build ID: 6

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

Skeleton003 commented 5 months ago

@Rhett-Ying The benchmark shows that the performance variation is acceptable. I'm trying to find a way for all replicas to obtain a random seed from the main process instead of requiring the user to set it manually, but that is a separate topic. For now, I think we can merge this PR first.
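
To illustrate why every replica needs the same seed, here is a minimal pure-Python sketch. It is not the GraphBolt implementation: `replica_indices` is a hypothetical helper, and the strided split across replicas is an assumption made only for this example. In a real multi-process job the seed would have to be shared through some communication call (for instance `torch.distributed.broadcast`); the mechanism used by the PR is not shown in this thread.

```python
import random


def replica_indices(num_items, seed, rank, world_size, shuffle=True):
    """Indices one replica would process, assuming a strided split (hypothetical)."""
    indices = list(range(num_items))
    if shuffle:
        # A private RNG seeded identically on every replica yields the
        # same permutation everywhere.
        random.Random(seed).shuffle(indices)
    # Each replica takes every world_size-th element, starting at its rank.
    return indices[rank::world_size]


world_size = 4
num_items = 24

# Shared seed: the replicas slice one common permutation, so their parts
# are disjoint and together cover every item exactly once.
shared = [
    replica_indices(num_items, seed=42, rank=r, world_size=world_size)
    for r in range(world_size)
]
covered = sorted(i for part in shared for i in part)
print("shared seed covers all items:", covered == list(range(num_items)))

# Different seed per replica: each replica slices a different permutation,
# so some items are usually repeated while others are never visited.
mismatched = [
    replica_indices(num_items, seed=r, rank=r, world_size=world_size)
    for r in range(world_size)
]
distinct = {i for part in mismatched for i in part}
print("distinct items with per-replica seeds:", len(distinct), "of", num_items)
```

With a shared seed the coverage check holds by construction. With per-replica seeds nothing guarantees it, which is why asking each user to set the seed manually is fragile.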

Rhett-Ying commented 5 months ago

num_ids: 36, num_workers: 2

Is num_ids the total number of items in the ItemSet or ItemSetDict? If so, it's too small for the benchmark to be persuasive.

Skeleton003 commented 5 months ago

benchmark on /dgl/examples/multigpu/graphbolt/node_classification.py:

ogbn-products

Old:

$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3
Training with 4 gpus.
The dataset is already preprocessed.
Training...
48it [00:02, 16.06it/s]
Validating...
10it [00:00, 21.67it/s]
Epoch 00000 | Average Loss 2.3267 | Accuracy 0.7917 | Time 3.5637
48it [00:02, 21.37it/s]
Validating...
10it [00:00, 24.19it/s]
Epoch 00001 | Average Loss 0.9559 | Accuracy 0.8437 | Time 2.7528
48it [00:02, 21.33it/s]
Validating...
10it [00:00, 24.10it/s]
Epoch 00002 | Average Loss 0.7238 | Accuracy 0.8602 | Time 2.7597
48it [00:02, 21.33it/s]
Validating...
10it [00:00, 24.51it/s]
Epoch 00003 | Average Loss 0.6163 | Accuracy 0.8706 | Time 2.7502
48it [00:02, 21.45it/s]
Validating...
10it [00:00, 24.45it/s]
Epoch 00004 | Average Loss 0.5578 | Accuracy 0.8762 | Time 2.7404
48it [00:02, 20.19it/s]
Validating...
10it [00:00, 24.57it/s]
Epoch 00005 | Average Loss 0.5176 | Accuracy 0.8819 | Time 2.8776
48it [00:02, 21.50it/s]
Validating...
10it [00:00, 24.13it/s]
Epoch 00006 | Average Loss 0.4883 | Accuracy 0.8855 | Time 2.7396
48it [00:02, 21.42it/s]
Validating...
10it [00:00, 24.41it/s]
Epoch 00007 | Average Loss 0.4667 | Accuracy 0.8881 | Time 2.7437
48it [00:02, 21.31it/s]
Validating...
10it [00:00, 24.19it/s]
Epoch 00008 | Average Loss 0.4477 | Accuracy 0.8889 | Time 2.7596
48it [00:02, 21.46it/s]
Validating...
10it [00:00, 24.29it/s]
Epoch 00009 | Average Loss 0.4343 | Accuracy 0.8920 | Time 2.7416
Testing...
541it [00:19, 27.95it/s]
Test Accuracy 0.7348

New:

$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3
Training with 4 gpus.
The dataset is already preprocessed.
Training...
48it [00:03, 15.84it/s]
Validating...
10it [00:00, 22.02it/s]
Epoch 00000 | Average Loss 2.3048 | Accuracy 0.7777 | Time 3.5975
48it [00:02, 21.28it/s]
Validating...
10it [00:00, 25.05it/s]
Epoch 00001 | Average Loss 0.9804 | Accuracy 0.8388 | Time 2.7448
48it [00:02, 21.31it/s]
Validating...
10it [00:00, 24.98it/s]
Epoch 00002 | Average Loss 0.7427 | Accuracy 0.8587 | Time 2.7464
48it [00:02, 21.43it/s]
Validating...
10it [00:00, 25.03it/s]
Epoch 00003 | Average Loss 0.6308 | Accuracy 0.8696 | Time 2.7333
48it [00:02, 21.40it/s]
Validating...
10it [00:00, 25.19it/s]
Epoch 00004 | Average Loss 0.5623 | Accuracy 0.8785 | Time 2.7332
48it [00:02, 20.29it/s]
Validating...
10it [00:00, 24.69it/s]
Epoch 00005 | Average Loss 0.5228 | Accuracy 0.8815 | Time 2.8657
48it [00:02, 21.37it/s]
Validating...
10it [00:00, 24.89it/s]
Epoch 00006 | Average Loss 0.4937 | Accuracy 0.8850 | Time 2.7418
48it [00:02, 21.41it/s]
Validating...
10it [00:00, 25.01it/s]
Epoch 00007 | Average Loss 0.4696 | Accuracy 0.8879 | Time 2.7378
48it [00:02, 21.36it/s]
Validating...
10it [00:00, 25.03it/s]
Epoch 00008 | Average Loss 0.4537 | Accuracy 0.8909 | Time 2.7409
48it [00:02, 21.40it/s]
Validating...
10it [00:00, 24.88it/s]
Epoch 00009 | Average Loss 0.4388 | Accuracy 0.8932 | Time 2.7407
Testing...
541it [00:19, 27.96it/s]
Test Accuracy 0.7393

ogbn-arxiv

Old:

$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3 --dataset ogbn-arxiv
Training with 4 gpus.
The dataset is already preprocessed.
Training...
22it [00:01, 21.57it/s]
Validating...
8it [00:00, 52.40it/s]
Epoch 00000 | Average Loss 3.2543 | Accuracy 0.3002 | Time 1.2109
22it [00:00, 54.33it/s]
Validating...
8it [00:00, 70.41it/s]
Epoch 00001 | Average Loss 2.5287 | Accuracy 0.4404 | Time 0.5230
22it [00:00, 59.90it/s]
Validating...
8it [00:00, 71.66it/s]
Epoch 00002 | Average Loss 2.1985 | Accuracy 0.5054 | Time 0.4818
22it [00:00, 54.64it/s]
Validating...
8it [00:00, 86.39it/s]
Epoch 00003 | Average Loss 1.9795 | Accuracy 0.5349 | Time 0.4978
22it [00:00, 57.34it/s]
Validating...
8it [00:00, 78.11it/s]
Epoch 00004 | Average Loss 1.8419 | Accuracy 0.5529 | Time 0.4944
22it [00:00, 42.99it/s]
Validating...
8it [00:00, 73.39it/s]
Epoch 00005 | Average Loss 1.7533 | Accuracy 0.5649 | Time 0.6252
22it [00:00, 56.13it/s]
Validating...
8it [00:00, 76.69it/s]
Epoch 00006 | Average Loss 1.6852 | Accuracy 0.5713 | Time 0.5014
22it [00:00, 52.51it/s]
Validating...
8it [00:00, 79.52it/s]
Epoch 00007 | Average Loss 1.6405 | Accuracy 0.5766 | Time 0.5221
22it [00:00, 59.19it/s]
Validating...
8it [00:00, 67.85it/s]
Epoch 00008 | Average Loss 1.6055 | Accuracy 0.5814 | Time 0.4923
22it [00:00, 60.42it/s]
Validating...
8it [00:00, 71.80it/s]
Epoch 00009 | Average Loss 1.5681 | Accuracy 0.5878 | Time 0.4783
Testing...
12it [00:00, 82.86it/s]
Test Accuracy 0.5271

New:

$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3 --dataset ogbn-arxiv
Training with 4 gpus.
The dataset is already preprocessed.
Training...
22it [00:01, 18.31it/s]
Validating...
8it [00:00, 54.37it/s]
Epoch 00000 | Average Loss 3.1735 | Accuracy 0.2941 | Time 1.3790
22it [00:00, 58.89it/s]
Validating...
8it [00:00, 78.07it/s]
Epoch 00001 | Average Loss 2.4895 | Accuracy 0.4520 | Time 0.4908
22it [00:00, 56.94it/s]
Validating...
8it [00:00, 73.67it/s]
Epoch 00002 | Average Loss 2.1515 | Accuracy 0.5135 | Time 0.5007
22it [00:00, 54.02it/s]
Validating...
8it [00:00, 69.11it/s]
Epoch 00003 | Average Loss 1.9372 | Accuracy 0.5381 | Time 0.5256
22it [00:00, 56.69it/s]
Validating...
8it [00:00, 70.72it/s]
Epoch 00004 | Average Loss 1.8119 | Accuracy 0.5560 | Time 0.5067
22it [00:00, 39.94it/s]
Validating...
8it [00:00, 74.97it/s]
Epoch 00005 | Average Loss 1.7279 | Accuracy 0.5639 | Time 0.6646
22it [00:00, 56.77it/s]
Validating...
8it [00:00, 79.99it/s]
Epoch 00006 | Average Loss 1.6723 | Accuracy 0.5734 | Time 0.4928
22it [00:00, 60.43it/s]
Validating...
8it [00:00, 71.34it/s]
Epoch 00007 | Average Loss 1.6253 | Accuracy 0.5817 | Time 0.4789
22it [00:00, 58.53it/s]
Validating...
8it [00:00, 91.09it/s]
Epoch 00008 | Average Loss 1.5881 | Accuracy 0.5844 | Time 0.4690
22it [00:00, 56.57it/s]
Validating...
8it [00:00, 77.58it/s]
Epoch 00009 | Average Loss 1.5577 | Accuracy 0.5878 | Time 0.4972
Testing...
12it [00:00, 88.09it/s]
Test Accuracy 0.5279

ogbn-papers100M

Old:

$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3 --dataset ogbn-papers100M
Training with 4 gpus.
The dataset is already preprocessed.
Training...
294it [00:22, 13.15it/s]
Validating...
31it [00:02, 14.12it/s]
Epoch 00000 | Average Loss 1.9491 | Accuracy 0.5924 | Time 24.7810
294it [00:21, 13.65it/s]
Validating...
31it [00:02, 14.54it/s]
Epoch 00001 | Average Loss 1.3033 | Accuracy 0.6245 | Time 23.8770
294it [00:21, 13.64it/s]
Validating...
31it [00:02, 14.58it/s]
Epoch 00002 | Average Loss 1.2215 | Accuracy 0.6469 | Time 23.8830
294it [00:21, 13.65it/s]
Validating...
31it [00:02, 14.56it/s]
Epoch 00003 | Average Loss 1.1796 | Accuracy 0.6448 | Time 23.8804
294it [00:21, 13.65it/s]
Validating...
31it [00:02, 14.58it/s]
Epoch 00004 | Average Loss 1.1523 | Accuracy 0.6533 | Time 23.8787
294it [00:21, 13.58it/s]
Validating...
31it [00:02, 14.54it/s]
Epoch 00005 | Average Loss 1.1338 | Accuracy 0.6464 | Time 23.9888
294it [00:21, 13.64it/s]
Validating...
31it [00:02, 14.55it/s]
Epoch 00006 | Average Loss 1.1200 | Accuracy 0.6503 | Time 23.8843
294it [00:21, 13.64it/s]
Validating...
31it [00:02, 14.52it/s]
Epoch 00007 | Average Loss 1.1080 | Accuracy 0.6569 | Time 23.8870
294it [00:21, 13.64it/s]
Validating...
31it [00:02, 14.53it/s]
Epoch 00008 | Average Loss 1.0979 | Accuracy 0.6615 | Time 23.8950
294it [00:21, 13.65it/s]
Validating...
31it [00:02, 14.53it/s]
Epoch 00009 | Average Loss 1.0894 | Accuracy 0.6603 | Time 23.8899
Testing...
53it [00:03, 14.50it/s]
Test Accuracy 0.6318

New:

$ python /home/ubuntu/dgl/examples/multigpu/graphbolt/node_classification.py --gpu 0,1,2,3 --dataset ogbn-papers100M
Training with 4 gpus.
The dataset is already preprocessed.
Training...
294it [00:21, 13.69it/s]
Validating...
31it [00:02, 14.19it/s]
Epoch 00000 | Average Loss 1.9418 | Accuracy 0.5957 | Time 23.8790
294it [00:20, 14.18it/s]
Validating...
31it [00:02, 14.65it/s]
Epoch 00001 | Average Loss 1.3039 | Accuracy 0.6233 | Time 23.0518
294it [00:20, 14.19it/s]
Validating...
31it [00:02, 14.57it/s]
Epoch 00002 | Average Loss 1.2206 | Accuracy 0.6458 | Time 23.0501
294it [00:20, 14.18it/s]
Validating...
31it [00:02, 14.62it/s]
Epoch 00003 | Average Loss 1.1800 | Accuracy 0.6493 | Time 23.0555
294it [00:20, 14.17it/s]
Validating...
31it [00:02, 14.54it/s]
Epoch 00004 | Average Loss 1.1533 | Accuracy 0.6571 | Time 23.0787
294it [00:20, 14.11it/s]
Validating...
31it [00:02, 14.58it/s]
Epoch 00005 | Average Loss 1.1354 | Accuracy 0.6563 | Time 23.1551
294it [00:20, 14.19it/s]
Validating...
31it [00:02, 14.56it/s]
Epoch 00006 | Average Loss 1.1197 | Accuracy 0.6585 | Time 23.0504
294it [00:20, 14.18it/s]
Validating...
31it [00:02, 14.57it/s]
Epoch 00007 | Average Loss 1.1088 | Accuracy 0.6571 | Time 23.0587
294it [00:20, 14.21it/s]
Validating...
31it [00:02, 14.53it/s]
Epoch 00008 | Average Loss 1.0991 | Accuracy 0.6616 | Time 23.0182
294it [00:20, 14.20it/s]
Validating...
31it [00:02, 14.57it/s]
Epoch 00009 | Average Loss 1.0909 | Accuracy 0.6632 | Time 23.0365
Testing...
53it [00:03, 14.53it/s]
Test Accuracy 0.6337
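
The per-epoch times in logs like the ones above can be compared with a small script. This is only a sketch: `epoch_times` is a hypothetical helper, the regular expression assumes the `Epoch ... | Time ...` line format printed above, and the two samples hold epochs 1 to 3 of the ogbn-papers100M runs, copied from the logs.

```python
import re
from statistics import mean

# Matches lines such as:
# Epoch 00001 | Average Loss 1.3033 | Accuracy 0.6245 | Time 23.8770
EPOCH_RE = re.compile(
    r"Epoch (\d+) \| Average Loss ([\d.]+) \| Accuracy ([\d.]+) \| Time ([\d.]+)"
)

# Epochs 1-3 of the ogbn-papers100M runs, copied from the logs above.
OLD_LOG = """\
Epoch 00001 | Average Loss 1.3033 | Accuracy 0.6245 | Time 23.8770
Epoch 00002 | Average Loss 1.2215 | Accuracy 0.6469 | Time 23.8830
Epoch 00003 | Average Loss 1.1796 | Accuracy 0.6448 | Time 23.8804
"""
NEW_LOG = """\
Epoch 00001 | Average Loss 1.3039 | Accuracy 0.6233 | Time 23.0518
Epoch 00002 | Average Loss 1.2206 | Accuracy 0.6458 | Time 23.0501
Epoch 00003 | Average Loss 1.1800 | Accuracy 0.6493 | Time 23.0555
"""


def epoch_times(log_text):
    """Extract the value of the Time field from every epoch line."""
    return [float(match.group(4)) for match in EPOCH_RE.finditer(log_text)]


old_mean = mean(epoch_times(OLD_LOG))
new_mean = mean(epoch_times(NEW_LOG))
print(f"old mean epoch time: {old_mean:.4f}")
print(f"new mean epoch time: {new_mean:.4f}")
print(f"relative change: {(new_mean - old_mean) / old_mean:+.2%}")
```

Feeding in the complete logs for each dataset would show whether the difference is consistent across epochs or within run-to-run noise.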

Skeleton003 commented 5 months ago

Tested on g4dn.metal.

dgl-bot commented 5 months ago

Commit ID: 0091ccae666bf1915f2022dfd420afd049186a5e

Build ID: 7

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 5 months ago

Commit ID: 0b00f16c08a5242b4a592cf9565b21eb69e80eb0

Build ID: 8

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

Skeleton003 commented 5 months ago

@Rhett-Ying The random seed issue has been resolved. What a relief that torch.distributed has convenient communication APIs.

dgl-bot commented 5 months ago

Commit ID: 3ca84f39e44065cc93c484672d8639ddd152bf09

Build ID: 9

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented 5 months ago

Commit ID: 08ac1ebbba8514a5eeea4ffcbeac85204335468d

Build ID: 10

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

Skeleton003 commented 5 months ago

This POC proves to work well both on correctness and performance. Now it's time to finalize the code change.

Is it possible to update existing ItemSampler instead of creating a new class? Seems the major part is fixing the seed?

Is it possible to split the change on ItemSampler and ItemSet/Dict to make the change as small as possible for quick review?

I'm afraid the change on ItemSet/Dict cannot be separated, because the new ItemSampler takes it as input, so we have to modify them simultaneously. For the sake of code review, I think we can divide this PR into two. The first adds ItemSet/Dict4 but leaves the old ItemSetDict unchanged; the second updates the existing ItemSampler and replaces the old ItemSetDict with the new one. If this is what you envision, I can get started on it right away.

Rhett-Ying commented 5 months ago

Sounds good to me.
