Sbozzolo closed 6 months ago
The buildkite pipeline had several problems. I fixed them and now most jobs are twice as fast.
The GPU unit test seems to be the only one adversely affected. @sriharshakandala, do you want to have a look at this?
https://buildkite.com/clima/rrtmgp-ci/builds/582#018d9acf-9053-433b-8a76-a0593b20f8d9
The changes look good to me overall, except for a couple of items in the Project.toml.
I consolidated the environments so that only perf remains
(because that's the only one that is being run on buildkite).
@charleskawczynski, do you have any idea what could be causing this increase in run time (https://buildkite.com/clima/rrtmgp-ci/builds/592#018d9ed1-b8ac-4121-8118-2d3930baa764) compared to main?
It happens only on buildkite: @sriharshakandala ran the code on the cluster and found the same speed as main.
@Sbozzolo : Please plan on including https://github.com/CliMA/RRTMGP.jl/pull/448 in this release.
I spent 3 more hours on this and narrowed the problem down to the CUDA update. I can reproduce the slowdown on the cluster on the P100 when I use CUDA 5.2, but it is still fast when using CUDA 5.1.
Fast:
julia> CUDA.versioninfo()
CUDA runtime 12.2, local installation
CUDA driver 12.3
NVIDIA driver 535.54.3, originally for CUDA 12.2
CUDA libraries:
- CUBLAS: 12.2.1
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.0
- CUSPARSE: 12.1.1
- CUPTI: 20.0.0
- NVML: 12.0.0+535.54.3
Julia packages:
- CUDA: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0
- CUDA_Runtime_Discovery: 0.2.3
Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7
Preferences:
- CUDA_Runtime_jll.version: 12.2
- CUDA_Runtime_jll.local: true
1 device:
0: Tesla P100-PCIE-16GB (sm_60, 15.893 GiB / 16.000 GiB available)
Slow:
CUDA runtime 12.2, local installation
CUDA driver 12.3
NVIDIA driver 535.54.3, originally for CUDA 12.2
CUDA libraries:
- CUBLAS: 12.2.1
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.0
- CUSPARSE: 12.1.1
- CUPTI: 20.0.0
- NVML: 12.0.0+535.54.3
Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0
- CUDA_Runtime_Discovery: 0.2.3
Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7
Preferences:
- CUDA_Runtime_jll.version: 12.2
- CUDA_Runtime_jll.local: true
1 device:
0: Tesla P100-PCIE-16GB (sm_60, 15.893 GiB / 16.000 GiB available)
Only changes:
[79e6a3ab] ↑ Adapt v3.7.2 ⇒ v4.0.1
[052768ef] ↑ CUDA v5.1.2 ⇒ v5.2.0
[0c68f7d7] ↑ GPUArrays v9.1.0 ⇒ v10.0.2
[46192b85] ↑ GPUArraysCore v0.1.5 ⇒ v0.1.6
[76a88914] ↑ CUDA_Runtime_jll v0.10.1+0 ⇒ v0.11.1+0
I also checked that the system runtime and the artifact runtime produce the same results.
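For anyone picking this up, here is a minimal sketch of how the fast and slow environments could be recreated side by side. It is not the exact procedure used for the comparison above. It assumes the script is run from the root of an RRTMGP.jl checkout, that a temporary environment is acceptable, and that the resolver can find a compatible set of dependencies for the requested CUDA version. The timing step itself is left out because the thread does not say how the GPU test was launched.

```julia
# Hypothetical reproduction helper; run from the root of an RRTMGP.jl checkout.
using Pkg

# Assumption: use a throwaway environment so the repository's own Manifest is untouched.
Pkg.activate(; temp = true)

# Assumption: the current directory is the RRTMGP.jl checkout under test.
Pkg.develop(path = ".")

# Request the CUDA.jl version to test: "5.1.2" was fast, "5.2.0" was slow.
Pkg.add(name = "CUDA", version = "5.1.2")
Pkg.pin("CUDA")  # keep later resolver runs from upgrading CUDA.jl

# Confirm what actually got loaded before timing anything.
using CUDA
CUDA.versioninfo()

# List the packages that differed between the two environments.
Pkg.status(
    ["CUDA", "Adapt", "GPUArrays", "GPUArraysCore", "CUDA_Runtime_jll"];
    mode = Pkg.PKGMODE_MANIFEST,
)
```

Running the same script with `version = "5.2.0"` should give the second environment. Comparing the two `Pkg.status` outputs against the "Only changes" list above is a quick check that nothing else moved.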
@sriharshakandala do you want to take this on and investigate further?
I'm going to rebase this PR, cc @Sbozzolo
GPU tests on the CI seem to be taking much longer (https://buildkite.com/clima/rrtmgp-ci/builds/573#018d9080-12e0-43b2-bdfb-2e03fe406ff7) compared to the latest main (https://buildkite.com/clima/rrtmgp-ci/builds/570#018d8f6a-d60b-4422-81e4-f1c938044cef).