Closed by Rhett-Ying 2 months ago
To trigger regression tests:
@dgl-bot run [instance-type] [which tests] [compare-with-branch]
For example: @dgl-bot run g4dn.4xlarge all dmlc/master
or @dgl-bot run c5.9xlarge kernel,api dmlc/master
@dgl-bot
We can add -DCUDA_ARCH_NAME=Auto to reduce compilation time and memory use in this file:
https://github.com/dmlc/dgl/blob/9fde953d4bdb2a2d5ba4e878f31b032d46162920/tests/scripts/build_dgl.sh#L21
Hopefully, it will make the build compile only for the GPU architecture present in the CI. If Auto somehow does not work, we can consider using Turing instead, as the CI has an NVIDIA T4 GPU.
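A minimal sketch of what the suggested change to the cmake invocation could look like; the flag names other than CUDA_ARCH_NAME and the source/build directory layout are assumptions, not the actual contents of build_dgl.sh:

```shell
# Sketch only: build for the autodetected local GPU architecture.
cmake -DUSE_CUDA=ON -DCUDA_ARCH_NAME=Auto ..

# Fallback if autodetection fails in CI: pin to Turing,
# since the CI machine has an NVIDIA T4 (Turing, sm_75).
cmake -DUSE_CUDA=ON -DCUDA_ARCH_NAME=Turing ..
```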
I guess Auto is already the default flag. However, autodetection seems to be failing in the CI:
-- Running GPU architecture autodetection
nvcc warning : Cannot find valid GPU for '-arch=native', default arch is used
CMake Warning at cmake/modules/CUDA.cmake:84 (message):
  Running GPU detection script with nvcc failed:
Call Stack (most recent call first):
  cmake/modules/CUDA.cmake:161 (dgl_detect_installed_gpus)
  cmake/modules/CUDA.cmake:235 (dgl_select_nvcc_arch_flags)
  CMakeLists.txt:276 (dgl_config_cuda)

CMake Warning at cmake/modules/CUDA.cmake:89 (message):
  Automatic GPU detection failed. Building for all known architectures
  (50;60;70;75;80;86;89;90).
Call Stack (most recent call first):
  cmake/modules/CUDA.cmake:161 (dgl_detect_installed_gpus)
  cmake/modules/CUDA.cmake:235 (dgl_select_nvcc_arch_flags)
  CMakeLists.txt:276 (dgl_config_cuda)
Can we pass TORCH_CUDA_ARCH_LIST to dgl_sparse and tensoradapter the same way we do for graphbolt? The most recent errors may be coming from dgl_sparse. We could refactor the logic that sets TORCH_CUDA_ARCH_LIST out of graphbolt so that it can be reused by dgl_sparse and tensoradapter.
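A rough sketch of the idea of setting the variable once and reusing it for all three subprojects; the directory names and the exact way graphbolt currently consumes TORCH_CUDA_ARCH_LIST are assumptions here, not taken from the actual build scripts:

```shell
# Sketch only: set the arch list once (7.5 = Turing / T4 in CI)
# and pass it to each CUDA subproject instead of duplicating the logic.
export TORCH_CUDA_ARCH_LIST="7.5"

cmake -S graphbolt    -B build/graphbolt    -DTORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"
cmake -S dgl_sparse   -B build/dgl_sparse   -DTORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"
cmake -S tensoradapter -B build/tensoradapter -DTORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"
```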