NVlabs / sionna

Sionna: An Open-Source Library for Next-Generation Physical Layer Research
https://nvlabs.github.io/sionna

No CUDA device found; using CPU as fallback. #609

Status: Open. Fedomer opened this issue 3 days ago.

Fedomer commented 3 days ago

On first use, running the very first cell of the `Sionna_Ray_Tracing_Introduction` tutorial ([1]: GPU Configuration and Imports) prints: No CUDA device found; using CPU as fallback.

However, `!nvidia-smi` prints:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:01:00.0 Off |                    0 |
| N/A   42C    P0             37W /  250W |     425MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:81:00.0 Off |                    0 |
| N/A   51C    P0             47W /  250W |       1MiB /  40960MiB |      5%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

I am running a container built from the Docker image. Other RAPIDS Docker images work fine. Could this be a driver problem?

merlinND commented 1 day ago

Hello @Fedomer,

Sionna uses Mitsuba for its ray tracing capabilities, and Mitsuba in turn uses OptiX under the hood. For OptiX to load, the Docker container needs to expose the corresponding driver support. I am not a Docker expert, but I think that enabling the graphics driver capability should help: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#driver-capabilities
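As a quick check of the suggestion above, the sketch below reads `NVIDIA_DRIVER_CAPABILITIES` inside the container and reports which of `compute`, `utility` and `graphics` are missing. It assumes, following the comment above, that OptiX needs the `graphics` capability, and it treats an unset or empty variable as the NVIDIA Container Toolkit default `compute,utility`. The helper name `missing_capabilities` is hypothetical. The variable only reflects what was requested, not what was actually mounted, so an empty "missing" list is a necessary condition rather than proof that OptiX will load.

```python
import os

# Capabilities assumed necessary for Sionna RT: per the comment above,
# OptiX is assumed to need "graphics" on top of "compute" and "utility".
REQUIRED = {"compute", "utility", "graphics"}
# NVIDIA Container Toolkit default when the variable is unset or empty.
DEFAULT = {"compute", "utility"}


def missing_capabilities(caps_value, required=REQUIRED):
    """Return the sorted list of required capabilities not granted by caps_value.

    caps_value is the raw NVIDIA_DRIVER_CAPABILITIES string, e.g. "compute,utility".
    """
    if not caps_value or not caps_value.strip():
        granted = set(DEFAULT)
    else:
        # Split the comma-separated list, ignoring spaces and case.
        granted = {c.strip().lower() for c in caps_value.split(",") if c.strip()}
    if "all" in granted:
        return []  # "all" enables every capability
    return sorted(required - granted)


if __name__ == "__main__":
    value = os.environ.get("NVIDIA_DRIVER_CAPABILITIES")
    print("NVIDIA_DRIVER_CAPABILITIES =", repr(value))
    missing = missing_capabilities(value)
    print("Missing capabilities:", ", ".join(missing) if missing else "none")
```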

Fedomer commented 1 day ago

Hello @merlinND, thank you, I did that. I created my container following the tutorial: `podman container create --name Sionna --device nvidia.com/gpu=all -it -p 8888:8888 --privileged=true --env NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility localhost/sionna:latest`

NB: podman accepts the same flags as Docker, and this setup works fine for RAPIDS images.
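Because the environment variable only shows what was requested, a more direct test is whether the driver libraries themselves are visible inside the container. This minimal sketch, built around a hypothetical helper `probe_libraries` and assuming a Linux container, tries to open `libcuda.so.1` (the CUDA driver API) and `libnvoptix.so.1` (the OptiX runtime shipped with the Linux driver) through `ctypes`. If `libcuda.so.1` loads but `libnvoptix.so.1` does not, the container runtime is probably exposing compute support but not the OptiX library.

```python
import ctypes

# Driver-provided shared libraries to probe (Linux names).
DRIVER_LIBS = ("libcuda.so.1", "libnvoptix.so.1")


def probe_libraries(names=DRIVER_LIBS):
    """Map each shared-library name to True if the dynamic loader can open it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be found/loaded
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for lib, ok in probe_libraries().items():
        print(f"{lib}: {'found' if ok else 'NOT found'}")
```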

merlinND commented 22 hours ago

Glad it worked!

Fedomer commented 21 hours ago

Hello @merlinND, I did that, but it didn't work! I'm still investigating. I will try on a different machine with a different OS (Ubuntu 20.04; I am currently using Red Hat Enterprise Linux 9.4 with podman).

The message "No CUDA device found; using CPU as fallback." appears when I run `import sionna`.
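Because the message appears as soon as `import sionna` runs, and Sionna's ray tracer relies on Mitsuba (see the comment above), one way to narrow it down is to ask Mitsuba directly whether a CUDA variant can be activated. This sketch assumes that Sionna prefers a CUDA Mitsuba variant such as `cuda_ad_rgb` and falls back to an LLVM (CPU) variant such as `llvm_ad_rgb`, and that `mitsuba.set_variant` raises when the requested backend cannot be initialized; the helper `first_working_variant` is hypothetical. Dr.Jit may defer some failures until the first kernel launch, so a successful `set_variant` call is not conclusive on its own.

```python
def first_working_variant(candidates, set_variant):
    """Try each variant name in order and return (chosen, errors).

    set_variant is any callable that raises on failure, e.g. mitsuba.set_variant.
    chosen is None if every candidate failed; errors maps name -> error text.
    """
    errors = {}
    for name in candidates:
        try:
            set_variant(name)
            return name, errors
        except Exception as exc:  # the exact exception type may vary
            errors[name] = f"{type(exc).__name__}: {exc}"
    return None, errors


if __name__ == "__main__":
    try:
        import mitsuba as mi
    except ImportError:
        print("mitsuba is not installed in this environment")
    else:
        print("Compiled variants:", mi.variants())
        # Assumed preference order: GPU (CUDA/OptiX) first, then CPU (LLVM).
        chosen, errors = first_working_variant(("cuda_ad_rgb", "llvm_ad_rgb"), mi.set_variant)
        print("Selected variant:", chosen)
        for name, err in errors.items():
            print(f"  {name} failed -> {err}")
```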

gmarcusm commented 19 hours ago

Could you please run this inside the Docker container and post the result?

`python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"`

Fedomer commented 18 hours ago

Hi @gmarcusm, thanks. Here is the output:

`# python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"`
2024-10-07 16:48:56.624726: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-07 16:48:56.624791: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-07 16:48:56.626089: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-07 16:48:56.632935: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]

Also with `import sionna`:

`# python3`
Python 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
`import sionna`
2024-10-07 16:59:29.563043: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-07 16:59:29.563161: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-07 16:59:29.564489: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-07 16:59:29.571596: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
No CUDA device found; using CPU as fallback.

It seems that TensorFlow is not GPU-enabled! But it is the official build, made with the provided Dockerfile.
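Note that the TensorFlow output above does list both GPUs (`GPU:0` and `GPU:1`), so TensorFlow itself appears to see the devices. Combining that check with a Mitsuba check gives a simple way to tell which component is failing. The hypothetical `diagnose` helper below encodes that reasoning; it assumes that a Mitsuba variant name starting with `cuda` means the GPU backend is active, and the hints it returns are suggestions, not confirmed root causes.

```python
def diagnose(tf_gpu_count, mitsuba_variant):
    """Classify where GPU detection fails, given two independent checks.

    tf_gpu_count: number of GPUs TensorFlow reports.
    mitsuba_variant: name of the active Mitsuba variant, or None if none is set.
    """
    mitsuba_on_gpu = mitsuba_variant is not None and mitsuba_variant.startswith("cuda")
    if tf_gpu_count > 0 and mitsuba_on_gpu:
        return "both TensorFlow and Mitsuba use the GPU"
    if tf_gpu_count > 0:
        return "TensorFlow sees the GPU; Mitsuba/OptiX does not (check graphics capability and libnvoptix)"
    if mitsuba_on_gpu:
        return "Mitsuba uses the GPU; TensorFlow does not (check the TensorFlow CUDA libraries)"
    return "neither sees the GPU (check device passthrough and driver)"
```

Inside the container it could be called as `diagnose(len(tf.config.list_physical_devices('GPU')), mi.variant())` after `import sionna`, assuming Sionna has selected a Mitsuba variant by that point.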

Fedomer commented 4 hours ago

Update:

The Docker container seems to load the sionna package fine (with CUDA) on a computer running Ubuntu 20.04 LTS with an NVIDIA A5000 card and this driver:
| NVIDIA-SMI 470.256.02 Driver Version: 470.256.02 CUDA Version: 12.3 |
but the strange issue appears on a GPU rack server with dual A100 GPUs running Red Hat Enterprise Linux 9.4 with podman as the container engine. The driver on RHEL 9.4 is:
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |

Other containers with more recent TensorFlow versions, such as RAPIDS, work fine.

Still investigating...
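To compare the two hosts side by side, the header row of `nvidia-smi` can be parsed. In that row, `CUDA Version` is the highest CUDA version the installed driver supports, not the CUDA toolkit inside the container. The sketch below, with hypothetical helpers `parse_smi_header` and `version_tuple`, extracts the driver and CUDA versions from the two header lines quoted above so they can be compared numerically.

```python
import re

# Matches the nvidia-smi header row, e.g.
# "| NVIDIA-SMI 550.90.07   Driver Version: 550.90.07   CUDA Version: 12.4 |"
_HEADER = re.compile(
    r"NVIDIA-SMI\s+[\d.]+\s+Driver Version:\s+(?P<driver>[\d.]+)\s+CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return {'driver': ..., 'cuda': ...} from nvidia-smi output, or None if absent."""
    m = _HEADER.search(text)
    if m is None:
        return None
    return {"driver": m.group("driver"), "cuda": m.group("cuda")}


def version_tuple(v):
    """Turn a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in v.split("."))


# Header lines as quoted in this thread.
HOSTS = {
    "Ubuntu 20.04 / A5000": "| NVIDIA-SMI 470.256.02 Driver Version: 470.256.02 CUDA Version: 12.3 |",
    "RHEL 9.4 / 2x A100": "| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |",
}

if __name__ == "__main__":
    for host, line in HOSTS.items():
        print(host, parse_smi_header(line))
```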