dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

[GraphBolt][CUDA] Compute capability check refactor. #7485

Status: Closed (mfbalin closed 1 week ago)

mfbalin commented 1 week ago

Description

Use an existing CUB function, which caches its result, to get the compute capability: https://nvidia.github.io/cccl/cub/api/function_namespacecub_1a75d6eade310f187ff42915ae82e963d0.html
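
For readers unfamiliar with the pattern, here is a minimal host-only sketch of a cached compute-capability lookup. It is not the code from this PR and it does not call CUB. `QueryComputeCapabilityUncached`, `QueryCount`, and `ComputeCapability` are hypothetical names, and the query is a stub returning made-up values so that the example compiles without a CUDA toolkit. In a real build the stub would be replaced by an actual device query.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>

// Counts how often the (stubbed) device query runs, so the effect of the
// cache can be observed. Hypothetical helper, for illustration only.
inline int& QueryCount() {
  static int count = 0;
  return count;
}

// Hypothetical stand-in for the real device query. In a CUDA build a runtime
// call (for example cudaGetDeviceProperties) would go here instead.
inline int QueryComputeCapabilityUncached(int device) {
  ++QueryCount();
  // Made-up values: even device ids report 8.0 (as 80), odd ids report 9.0.
  return device % 2 == 0 ? 80 : 90;
}

// Returns major * 10 + minor for `device`. The underlying query runs at most
// once per device; later calls are served from the cache.
inline int ComputeCapability(int device) {
  assert(device >= 0);
  static std::mutex mutex;
  static std::unordered_map<int, int> cache;
  std::lock_guard<std::mutex> lock(mutex);
  auto it = cache.find(device);
  if (it == cache.end()) {
    it = cache.emplace(device, QueryComputeCapabilityUncached(device)).first;
  }
  return it->second;
}
```

Calling `ComputeCapability(0)` twice runs the stub once. Delegating to a library function that already caches per device, as this PR does, removes the need to maintain such a cache by hand.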

dgl-bot commented 1 week ago

Commit ID: 17fabf754cab91cef9b8cfc8f83f99738340369d

Build ID: 1

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: d4804486c8ddb78b69ee61ff746c6288037c3e9b

Build ID: 2

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@dgl-bot

dgl-bot commented 1 week ago

Commit ID: d4804486c8ddb78b69ee61ff746c6288037c3e9b

Build ID: 3

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@dgl-bot

dgl-bot commented 1 week ago

Commit ID: d4804486c8ddb78b69ee61ff746c6288037c3e9b

Build ID: 4

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@dgl-bot

dgl-bot commented 1 week ago

Commit ID: d4804486c8ddb78b69ee61ff746c6288037c3e9b

Build ID: 5

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@dgl-bot

mfbalin commented 1 week ago

@Rhett-Ying

tests/distributed/test_mp_dataloader.py::test_edge_dataloader_heterograph[False-None-True-0] Fatal Python error: Segmentation fault

Thread 0x00007f90b2e5e700 (most recent call first):
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 324 in wait
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 607 in wait
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007f90e0361800 (most recent call first):
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/codecs.py", line 322 in decode
  File "/home/ubuntu/jenkins/workspace/dgl_PR-7485/python/dgl/partition.py", line 271 in get_peak_mem
  File "/home/ubuntu/jenkins/workspace/dgl_PR-7485/python/dgl/distributed/partition.py", line 921 in partition_graph
  File "/home/ubuntu/jenkins/workspace/dgl_PR-7485/tests/distributed/test_mp_dataloader.py", line 770 in check_dataloader
  File "/home/ubuntu/jenkins/workspace/dgl_PR-7485/tests/distributed/test_mp_dataloader.py", line 989 in test_edge_dataloader_heterograph
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/python.py", line 162 in pytest_pyfunc_call
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/python.py", line 1627 in runtest
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 173 in pytest_runtest_call
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 241 in <lambda>
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 341 in from_call
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 240 in call_and_report
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 135 in runtestprotocol
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 116 in pytest_runtest_protocol
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 339 in _main
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 285 in wrap_session
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/config/__init__.py", line 178 in main
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/config/__init__.py", line 206 in console_main
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pytest/__main__.py", line 7 in <module>
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/runpy.py", line 86 in _run_code
  File "/opt/conda/envs/pytorch-ci/lib/python3.10/runpy.py", line 196 in _run_module_as_main

dgl-bot commented 1 week ago

Commit ID: d4804486c8ddb78b69ee61ff746c6288037c3e9b

Build ID: 6

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@dgl-bot

dgl-bot commented 1 week ago

Commit ID: d4804486c8ddb78b69ee61ff746c6288037c3e9b

Build ID: 7

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@dgl-bot

dgl-bot commented 1 week ago

Commit ID: d4804486c8ddb78b69ee61ff746c6288037c3e9b

Build ID: 8

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link