Closed by plexoos 3 months ago
For CUDA and NVIDIA driver compatibility see https://docs.nvidia.com/deploy/cuda-compatibility/
Downgraded the NVIDIA driver on npps0 to version 530.30.02 (CUDA 12.1):
$ nvidia-smi
Tue Jun 18 12:10:48 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090         Off| 00000000:16:00.0 Off |                  Off |
|  0%   35C    P8               12W / 450W|    108MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090         Off| 00000000:34:00.0 Off |                  Off |
|  0%   27C    P8               10W / 450W|      1MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
How to reproduce:
./esi-shell -v 1.0.0-beta.13 "opticks-full-prepare && opticks-t"
...
=== opticks-setup-geant4- : sourcing /opt/spack/opt/spack/linux-ubuntu22.04-sapphirerapids/gcc-11.4.0/geant4-11.1.2-z4zrvgbct5pf2uhyrxf7xlo5mjalfiwf/./bin/geant4.sh
=== om-test-one : okconf /esi/opticks/okconf /usr/local/opticks/build/okconf
Wed Jun 19 00:33:44 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:44 UTC 2024
=== om-test-one : sysrap /esi/opticks/sysrap /usr/local/opticks/build/sysrap
Wed Jun 19 00:33:44 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:44 UTC 2024
=== om-test-one : ana /esi/opticks/ana /usr/local/opticks/build/ana
Wed Jun 19 00:33:44 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:44 UTC 2024
=== om-test-one : analytic /esi/opticks/analytic /usr/local/opticks/build/analytic
Wed Jun 19 00:33:45 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:45 UTC 2024
=== om-test-one : bin /esi/opticks/bin /usr/local/opticks/build/bin
Wed Jun 19 00:33:45 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:45 UTC 2024
=== om-test-one : CSG /esi/opticks/CSG /usr/local/opticks/build/CSG
Wed Jun 19 00:33:45 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:45 UTC 2024
=== om-test-one : qudarap /esi/opticks/qudarap /usr/local/opticks/build/qudarap
Wed Jun 19 00:33:45 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:45 UTC 2024
=== om-test-one : gdxml /esi/opticks/gdxml /usr/local/opticks/build/gdxml
Wed Jun 19 00:33:45 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:45 UTC 2024
=== om-test-one : u4 /esi/opticks/u4 /usr/local/opticks/build/u4
Wed Jun 19 00:33:45 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:45 UTC 2024
=== om-test-one : CSGOptiX /esi/opticks/CSGOptiX /usr/local/opticks/build/CSGOptiX
Wed Jun 19 00:33:45 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:45 UTC 2024
=== om-test-one : g4cx /esi/opticks/g4cx /usr/local/opticks/build/g4cx
Wed Jun 19 00:33:45 UTC 2024
ctest --interactive-debug-mode 0 --output-on-failure
Wed Jun 19 00:33:45 UTC 2024
...
The tests appear to be skipped when running on lambda or onyx: each ctest invocation in the log above returns immediately without reporting any results.
Another, more targeted test with just the cmake command:
dsmirnov@lambda1:~/test$ ./esi-shell -v 1.0.0-beta.13 "cmake --help"
==> Using esi-shell image: ghcr.io/bnlnpps/esi-shell:1.0.0-beta.13
==========
== CUDA ==
==========
CUDA Version 11.8.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
dsmirnov@lambda1:~/test$
The command produces no output. Running cmake interactively inside the container gives:
dsmirnov@lambda1:~/test$ ./esi-shell -v 1.0.0-beta.13
==> Using esi-shell image: ghcr.io/bnlnpps/esi-shell:1.0.0-beta.13
==========
== CUDA ==
==========
CUDA Version 11.8.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
root@e1c3e924fafd:~# cmake
Illegal instruction (core dumped)
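An "Illegal instruction" crash usually means the binary uses CPU instructions that the host processor does not implement. Below is a minimal sketch for checking this on a Linux host. It assumes, without confirmation at this point in the thread, that the container binaries rely on newer vector extensions. The helper name `missing_cpu_flags` and the flags `avx2` and `avx512f` are illustrative choices, not a verified requirement list.

```shell
#!/bin/sh
# Hypothetical helper: print each wanted flag that is absent from a
# space-separated list of CPU flags.
# usage: missing_cpu_flags "<cpu flags>" flag1 flag2 ...
missing_cpu_flags() {
    have=" $1 "
    shift
    for flag in "$@"; do
        case "$have" in
            *" $flag "*) ;;                 # flag is present
            *) printf '%s\n' "$flag" ;;     # flag is missing
        esac
    done
}

# On x86 Linux, the flags of the first CPU are listed in /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    host_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    # avx2 and avx512f are example flags, not a verified requirement list.
    missing_cpu_flags "$host_flags" avx2 avx512f
fi
```

Running this on npps0, lambda, and onyx and comparing the output would show whether the failing hosts lack instruction-set extensions that the build host has.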
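The Spack install prefix in the log above (linux-ubuntu22.04-sapphirerapids) suggests the packages were built for the CPU of the build host, which could explain the illegal instruction on other hosts. We need to check the effect of setting the Spack target to a generic microarchitecture. See https://spack.readthedocs.io/en/latest/build_settings.html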
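One way to see which target a Spack-built package was compiled for is to read it from the install prefix, whose architecture directory typically has the form platform-os-target. The sketch below does that with a hypothetical helper, `spack_target_from_prefix`, and assumes the default Spack directory layout. The closing comment shows how a generic target might be requested. `x86_64_v3` is only an example there, and the right choice depends on the oldest CPU among the hosts.

```shell
#!/bin/sh
# Hypothetical helper: extract the target from a Spack install prefix,
# assuming the default layout .../<platform>-<os>-<target>/<compiler>/<pkg>.
spack_target_from_prefix() {
    for part in $(printf '%s' "$1" | tr '/' ' '); do
        case "$part" in
            linux-*-*) printf '%s\n' "${part##*-}"; return 0 ;;
        esac
    done
    return 1
}

# Prefix taken from the geant4 line in the log above.
spack_target_from_prefix \
    /opt/spack/opt/spack/linux-ubuntu22.04-sapphirerapids/gcc-11.4.0/geant4-11.1.2-z4zrvgbct5pf2uhyrxf7xlo5mjalfiwf

# If Spack is available, compare with what it detects on the current host.
if command -v spack >/dev/null 2>&1; then
    spack arch --target
fi

# A generic target can be requested per spec, for example:
#   spack install geant4 target=x86_64_v3
```

If the target printed for the image differs from what `spack arch --target` reports on lambda or onyx, the binaries may use instructions those hosts do not support.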
Fixed by #90
Our test nodes have GPUs from the Maxwell (sm_52), Volta (sm_70), and Ada (sm_89) generations.
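For reference, compute capabilities like these map onto nvcc `-gencode` options when device code is compiled for several GPU generations at once. The sketch below builds such an option string. The helper name `gencode_flags` is hypothetical, and whether the Opticks build accepts more than one architecture is not established in this thread.

```shell
#!/bin/sh
# Hypothetical helper: turn compute capabilities such as "52 70 89" into
# nvcc -gencode options, one native (SASS) target per architecture.
gencode_flags() {
    flags=""
    for cc in "$@"; do
        flags="$flags -gencode arch=compute_${cc},code=sm_${cc}"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Capabilities of the test nodes named above.
gencode_flags 52 70 89
```

The resulting string could be passed to nvcc directly or through whatever variable the build system uses for CUDA flags.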
Currently, the official images are built on a system with NVIDIA GPUs:
npps0
However, running the images fails on the following test systems:
lambda
onyx