dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0
13.18k stars 2.99k forks

[GraphBolt][CUDA] Incremental GPU graph cache into `gb.Dataloader`. #7475

Status: Closed (mfbalin closed 1 week ago)

mfbalin commented 1 week ago

Description

Depends on #7470; the C++ changes were moved to #7483.

dgl-bot commented 1 week ago

To trigger regression tests:

dgl-bot commented 1 week ago

Commit ID: 40d6e841c8a3cf9db266bff53c2382eb840c4be2

Build ID: 1

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 61294c44603fa891c6600b8199f45b6b295b10e1

Build ID: 2

Status: ❌ CI test failed in Stage [Torch CPU (Win64) Unit test].

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: c21735d89b6a6bb64fb6c220fc3ca84a0d620177

Build ID: 3

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: ee842748919c8907b26e48ef149ad2e7e49921c5

Build ID: 4

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 54a594aab565db1bef28fe5590f978666afec720

Build ID: 5

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 71f4a6c65777ae9e7c7e81cd4f0633ef81bdddc8

Build ID: 6

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 48a2bf267fba910dfd9680c63e2fd3dac0f1bf61

Build ID: 7

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 7bdd685cc8e4c7cb83e1aa8ad4b014d452b7702b

Build ID: 8

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 44c0aa923f18d074eda68f46c4686bb9a337c1ec

Build ID: 9

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@dgl-bot @Rhett-Ying CI still failing

mfbalin commented 1 week ago

@dgl-bot

dgl-bot commented 1 week ago

Commit ID: 44c0aa923f18d074eda68f46c4686bb9a337c1ec

Build ID: 10

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: ef038125c93ce1c49d977a1bbbd705c75a54262b

Build ID: 11

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: d5435800f6b6ed5ae1b58e564768c1b8c1afaaa2

Build ID: 12

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: b89e81a8b672a6696861f700e2c22f9390973f74

Build ID: 13

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@dgl-bot

dgl-bot commented 1 week ago

Commit ID: ddb369b96b5e32a28a160f6745f3c89d80d02e57

Build ID: 14

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@dgl-bot

dgl-bot commented 1 week ago

Commit ID: c93ca18dc54a32aa4f0079707337a624235556ca

Build ID: 15

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@dgl-bot

mfbalin commented 1 week ago

tests/distributed/test_partition.py::test_partition[None-True-1-4-metis] Converting to homogeneous graph takes 0.001s, peak mem: 2.438 GB
Convert a graph into a bidirected graph: 0.000 seconds, peak memory: 2.438 GB
Construct multi-constraint weights: 0.000 seconds, peak memory: 2.438 GB
Fatal Python error: Segmentation fault

Thread 0x00007f30deedf700 (most recent call first):
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 324 in wait
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 607 in wait
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007f310bbe5800 (most recent call first):
  File "/home/ubuntu/jenkins/workspace/dgl_PR-7475@2/python/dgl/partition.py", line 385 in metis_partition_assignment
  File "/home/ubuntu/jenkins/workspace/dgl_PR-7475@2/python/dgl/distributed/partition.py", line 1001 in partition_graph
  File "/home/ubuntu/jenkins/workspace/dgl_PR-7475@2/tests/distributed/test_partition.py", line 354 in check_partition
  File "/home/ubuntu/jenkins/workspace/dgl_PR-7475@2/tests/distributed/test_partition.py", line 532 in test_partition
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/python.py", line 162 in pytest_pyfunc_call
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/python.py", line 1627 in runtest
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 173 in pytest_runtest_call
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 241 in <lambda>
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 341 in from_call
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 240 in call_and_report
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 135 in runtestprotocol
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 116 in pytest_runtest_protocol
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 339 in _main
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 285 in wrap_session
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/config/__init__.py", line 178 in main
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/config/__init__.py", line 206 in console_main
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pytest/__main__.py", line 7 in <module>
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/runpy.py", line 86 in _run_code
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/runpy.py", line 196 in _run_module_as_main
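
The dump above ("Fatal Python error: Segmentation fault" followed by per-thread "most recent call first" stacks) has the format that Python's standard `faulthandler` module emits when the interpreter receives a fatal signal. As a minimal sketch of how such a report is produced, the snippet below crashes a throwaway child process on purpose and captures the resulting dump. It does not involve DGL or `metis_partition_assignment`; the deliberate null-pointer read via `ctypes.string_at(0)` and the helper name `run_crashing_child` are assumptions made purely for illustration, and the exact error wording may differ on non-Linux platforms.

```python
import subprocess
import sys

# Child program that deliberately reads from address 0 to provoke a crash.
# This stands in for a native-code fault; it is NOT the DGL failure itself.
CRASH_SNIPPET = "import ctypes; ctypes.string_at(0)"


def run_crashing_child():
    """Run the crashing snippet in a subprocess with faulthandler enabled.

    `-X faulthandler` installs handlers for fatal signals so that the
    Python-level stack of each thread is written to stderr before exit.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", CRASH_SNIPPET],
        capture_output=True,
        text=True,
    )


result = run_crashing_child()
# stderr holds the "Fatal Python error" header and the thread stacks.
print(result.stderr)
```

The innermost frame listed under "Current thread" is the last Python call made before control entered native code, which in the log above is `metis_partition_assignment` at `python/dgl/partition.py`, line 385. The failing case can be re-run on its own by passing the node ID printed in the log, `tests/distributed/test_partition.py::test_partition[None-True-1-4-metis]`, to pytest.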

dgl-bot commented 1 week ago

Commit ID: 0d63781ba5ff33b3af3eb15d5a06a8bc004a4629

Build ID: 16

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link