[GraphBolt] remove buffer in `ItemSampler.__iter__`

Skeleton003 commented 1 month ago

Description

Now that global sharding is applied, `ItemSampler.__iter__` no longer needs a buffer when iterating over the item set.

This PR removes the buffer for simplicity.
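
For readers unfamiliar with the idea, here is a minimal sketch of why a buffer can become unnecessary under global sharding. It assumes that global sharding means every shard (for example, one replica/worker pair) is handed a disjoint, contiguous index range up front, so minibatches can be produced by direct slicing. The helpers `shard_bounds` and `iter_batches` are hypothetical names for illustration only; this is not the GraphBolt implementation.

```python
def shard_bounds(total, num_shards, shard_id):
    """Return the [start, end) index range owned by one shard.

    Hypothetical helper: items are split as evenly as possible, and the
    first `total % num_shards` shards each receive one extra item.
    """
    base, extra = divmod(total, num_shards)
    start = shard_id * base + min(shard_id, extra)
    end = start + base + (1 if shard_id < extra else 0)
    return start, end


def iter_batches(items, batch_size, num_shards, shard_id):
    """Yield this shard's minibatches by slicing, with no intermediate buffer."""
    start, end = shard_bounds(len(items), num_shards, shard_id)
    for lo in range(start, end, batch_size):
        # The last batch of a shard may be shorter than batch_size.
        yield items[lo:min(lo + batch_size, end)]


if __name__ == "__main__":
    items = list(range(10))
    for shard_id in range(3):
        print(shard_id, list(iter_batches(items, 2, 3, shard_id)))
```

Because each shard knows its own range before iteration starts, it never has to accumulate items that belong to other shards.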

Benchmark:

`examples/sampling/graphbolt/node_classification.py`

Before:

```
$ python /home/ubuntu/dgl/examples/sampling/graphbolt/node_classification.py
Training in pinned-cuda mode.
Loading data...
The dataset is already preprocessed.
Training...
Training: 193it [00:08, 21.83it/s]
Evaluating: 39it [00:01, 24.40it/s]
Epoch 00000 | Loss 1.2657 | Accuracy 0.8605 | Time 8.8440
Training: 193it [00:08, 22.78it/s]
Evaluating: 39it [00:01, 24.42it/s]
Epoch 00001 | Loss 0.5886 | Accuracy 0.8753 | Time 8.4753
Training: 193it [00:08, 22.78it/s]
Evaluating: 39it [00:01, 24.41it/s]
Epoch 00002 | Loss 0.4930 | Accuracy 0.8853 | Time 8.4744
Training: 193it [00:08, 22.71it/s]
Evaluating: 39it [00:01, 24.42it/s]
Epoch 00003 | Loss 0.4465 | Accuracy 0.8902 | Time 8.5024
Training: 193it [00:08, 22.80it/s]
Evaluating: 39it [00:01, 24.42it/s]
Epoch 00004 | Loss 0.4220 | Accuracy 0.8917 | Time 8.4664
Training: 193it [00:08, 22.73it/s]
Evaluating: 39it [00:01, 24.39it/s]
Epoch 00005 | Loss 0.4106 | Accuracy 0.8948 | Time 8.4937
Training: 193it [00:08, 22.73it/s]
Evaluating: 39it [00:01, 24.42it/s]
Epoch 00006 | Loss 0.3896 | Accuracy 0.8987 | Time 8.4935
Training: 193it [00:08, 22.79it/s]
Evaluating: 39it [00:01, 24.43it/s]
Epoch 00007 | Loss 0.3754 | Accuracy 0.9005 | Time 8.4725
Training: 193it [00:08, 22.79it/s]
Evaluating: 39it [00:01, 24.40it/s]
Epoch 00008 | Loss 0.3663 | Accuracy 0.9033 | Time 8.4711
Training: 193it [00:08, 22.79it/s]
Evaluating: 39it [00:01, 24.41it/s]
Epoch 00009 | Loss 0.3616 | Accuracy 0.9030 | Time 8.4700
Testing...
598it [00:07, 79.14it/s]
598it [00:17, 34.50it/s]
598it [00:17, 34.14it/s]
Test accuracy 0.7673
```

After:

```
$ python /home/ubuntu/dgl/examples/sampling/graphbolt/node_classification.py
Training in pinned-cuda mode.
Loading data...
The dataset is already preprocessed.
Training...
Training: 193it [00:09, 21.24it/s]
Evaluating: 39it [00:01, 24.46it/s]
Epoch 00000 | Loss nan | Accuracy 0.8554 | Time 9.0887
Training: 193it [00:08, 22.84it/s]
Evaluating: 39it [00:01, 24.47it/s]
Epoch 00001 | Loss nan | Accuracy 0.8743 | Time 8.4546
Training: 193it [00:08, 22.49it/s]
Evaluating: 39it [00:01, 24.47it/s]
Epoch 00002 | Loss nan | Accuracy 0.8832 | Time 8.5839
Training: 193it [00:08, 22.83it/s]
Evaluating: 39it [00:01, 24.48it/s]
Epoch 00003 | Loss nan | Accuracy 0.8845 | Time 8.4581
Training: 193it [00:08, 22.58it/s]
Evaluating: 39it [00:01, 24.44it/s]
Epoch 00004 | Loss nan | Accuracy 0.8941 | Time 8.5520
Training: 193it [00:08, 22.43it/s]
Evaluating: 39it [00:01, 24.48it/s]
Epoch 00005 | Loss nan | Accuracy 0.8952 | Time 8.6080
Training: 193it [00:08, 22.49it/s]
Evaluating: 39it [00:01, 24.49it/s]
Epoch 00006 | Loss nan | Accuracy 0.8911 | Time 8.5830
Training: 193it [00:08, 22.78it/s]
Evaluating: 39it [00:01, 24.48it/s]
Epoch 00007 | Loss nan | Accuracy 0.8951 | Time 8.4745
Training: 193it [00:08, 22.81it/s]
Evaluating: 39it [00:01, 24.48it/s]
Epoch 00008 | Loss nan | Accuracy 0.8991 | Time 8.4632
Training: 193it [00:08, 22.84it/s]
Evaluating: 39it [00:01, 24.48it/s]
Epoch 00009 | Loss nan | Accuracy 0.8917 | Time 8.4522
Testing...
598it [00:07, 76.75it/s]
598it [00:17, 34.47it/s]
598it [00:17, 34.14it/s]
Test accuracy 0.7523
```

`examples/sampling/graphbolt/rgcn/hetero_rgcn.py`

Before:

```
$ python /home/ubuntu/dgl/examples/sampling/graphbolt/rgcn/hetero_rgcn.py
The dataset is already preprocessed.
Loaded dataset: node_classification
node_num for rel_graph_embed: {'author': tensor(1134649, dtype=torch.int32), 'field_of_study': tensor(59965, dtype=torch.int32), 'institution': tensor(8740, dtype=torch.int32)}
Number of embedding parameters: 154029312
Number of model parameters: 337460
Start to train...
Training~Epoch 01: 615it [00:59, 10.30it/s]
Evaluating the model on the validation set.
Inference: 16it [00:00, 21.50it/s]
Finish evaluating on validation set.
Epoch: 01, Loss: 2.3330, Valid accuracy: 47.38%, Time 59.7114
Training~Epoch 02: 615it [00:59, 10.40it/s]
Evaluating the model on the validation set.
Inference: 16it [00:00, 21.61it/s]
Finish evaluating on validation set.
Epoch: 02, Loss: 1.5593, Valid accuracy: 47.69%, Time 59.1281
Training~Epoch 03: 615it [00:57, 10.73it/s]
Evaluating the model on the validation set.
Inference: 16it [00:00, 21.64it/s]
Finish evaluating on validation set.
Epoch: 03, Loss: 1.1594, Valid accuracy: 47.37%, Time 57.2960
Testing...
Inference: 11it [00:00, 21.96it/s]
Test accuracy 46.0311
```

After:

```
$ python /home/ubuntu/dgl/examples/sampling/graphbolt/rgcn/hetero_rgcn.py
The dataset is already preprocessed.
Loaded dataset: node_classification
node_num for rel_graph_embed: {'author': tensor(1134649, dtype=torch.int32), 'field_of_study': tensor(59965, dtype=torch.int32), 'institution': tensor(8740, dtype=torch.int32)}
Number of embedding parameters: 154029312
Number of model parameters: 337460
Start to train...
Training~Epoch 01: 615it [00:59, 10.26it/s]
Evaluating the model on the validation set.
Inference: 16it [00:00, 21.48it/s]
Finish evaluating on validation set.
Epoch: 01, Loss: 2.3207, Valid accuracy: 47.72%, Time 59.9646
Training~Epoch 02: 615it [00:59, 10.40it/s]
Evaluating the model on the validation set.
Inference: 16it [00:00, 21.58it/s]
Finish evaluating on validation set.
Epoch: 02, Loss: 1.5420, Valid accuracy: 47.73%, Time 59.1323
Training~Epoch 03: 615it [00:56, 10.91it/s]
Evaluating the model on the validation set.
Inference: 16it [00:00, 21.54it/s]
Finish evaluating on validation set.
Epoch: 03, Loss: 1.1368, Valid accuracy: 46.45%, Time 56.3553
Testing...
Inference: 11it [00:00, 21.85it/s]
Test accuracy 45.5781
```
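
To compare the per-epoch times in the logs above without reading them by eye, something like the following sketch could be used. It assumes each epoch summary line ends with a field of the form `Time <seconds>`, as in both example scripts' output; the helper names `epoch_times` and `mean_epoch_time` are hypothetical and not part of DGL.

```python
import re
from statistics import mean

# Matches the trailing "Time 8.8440" / "Time 59.7114" field of an epoch line.
TIME_FIELD = re.compile(r"\bTime\s+([0-9]+\.[0-9]+)")


def epoch_times(log_text):
    """Return every per-epoch time (in seconds) found in a training log."""
    return [float(m.group(1)) for m in TIME_FIELD.finditer(log_text)]


def mean_epoch_time(log_text):
    """Return the average epoch time, or raise if the log has no Time fields."""
    times = epoch_times(log_text)
    if not times:
        raise ValueError("no 'Time <seconds>' fields found in log")
    return mean(times)


if __name__ == "__main__":
    # Two epoch lines copied from the node_classification.py "Before" run.
    sample = (
        "Epoch 00000 | Loss 1.2657 | Accuracy 0.8605 | Time 8.8440\n"
        "Epoch 00001 | Loss 0.5886 | Accuracy 0.8753 | Time 8.4753\n"
    )
    print(epoch_times(sample), mean_epoch_time(sample))
```

Saving each run's output to a file and passing its contents to `mean_epoch_time` gives one number per run to compare.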

Checklist

Please feel free to remove inapplicable items for your PR.

[ ] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
[ ] I've leveraged the tools to beautify the Python and C++ code.
[ ] The PR is complete and small. Read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests, and documentation may be exempted).
[ ] All changes have test coverage
[ ] Code is well-documented
[ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
[ ] The related issue is referenced in this PR
[ ] If the PR is for a new model/paper, I've updated the example index here.

Changes

dgl-bot commented 1 month ago

To trigger regression tests:

@dgl-bot run [instance-type] [which tests] [compare-with-branch]; For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

dgl-bot commented 1 month ago

Commit ID: 4cdc943bedf3b19558d178b84e6405b2a477e619

Build ID: 1

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dmlc / dgl

[GraphBolt] remove buffer in `ItemSampler.__iter__` #7430
