Open Fedomer opened 3 days ago
Hello @Fedomer,
Sionna uses Mitsuba for its ray tracing capabilities, which itself uses OptiX under the hood.
For OptiX to load, support for it has to be enabled in the Docker container. I am not a Docker expert, but I think that enabling the graphics
driver capability should help: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#driver-capabilities
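A minimal sketch for checking this from inside the container: it simply asks the dynamic loader whether the driver libraries can be opened. The library names `libcuda.so.1` and `libnvoptix.so.1` are assumptions about what a Linux NVIDIA driver normally exposes, and `probe_libraries` is a hypothetical helper, not part of Sionna or Mitsuba.

```python
import ctypes

# Assumed names: the CUDA driver library and the OptiX loader library
# that usually ship with the NVIDIA driver on Linux.
CANDIDATE_LIBS = ("libcuda.so.1", "libnvoptix.so.1")


def probe_libraries(names=CANDIDATE_LIBS):
    """Return {library_name: bool} telling whether each library can be loaded."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the loader cannot open it
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in probe_libraries().items():
        print(f"{name}: {'loadable' if ok else 'NOT loadable'}")
```

If the CUDA library loads but the OptiX one does not, that would point towards the graphics capability not reaching the container, although this check alone does not prove it.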
Hello @merlinND,
thank you, I did it. I created my container following the tutorial:
podman container create --name Sionna --device nvidia.com/gpu=all -it -p 8888:8888 --privileged=true --env NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility localhost/sionna:latest
NB: podman uses the same flags as Docker and works fine for RAPIDS images.
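To double-check that the `--env NVIDIA_DRIVER_CAPABILITIES=...` flag from the command above is visible inside the running container, something like the following could be run there. It is only a sketch: `missing_capabilities` is a hypothetical helper, and it only confirms what was requested through the environment, not what the container runtime actually mounted.

```python
import os


def missing_capabilities(required=("graphics", "compute", "utility"), env=None):
    """Return the required capabilities absent from NVIDIA_DRIVER_CAPABILITIES."""
    env = os.environ if env is None else env
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    enabled = {item.strip() for item in raw.split(",") if item.strip()}
    if "all" in enabled:  # "all" requests every capability
        return []
    return [cap for cap in required if cap not in enabled]


if __name__ == "__main__":
    missing = missing_capabilities()
    print("missing capabilities:", missing if missing else "none")
```

An empty result means the variable lists everything needed; a result containing `graphics` would suggest the flag did not make it into the container.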
Glad it worked!
Hello @merlinND, I did it, but it didn't work! I'm still investigating. I will try on a different machine with a different OS (Ubuntu 20.04; I currently use Red Hat Enterprise Linux 9.4 with podman).
The "No CUDA device found" message appears when I run: import sionna
Could you please run this inside the Docker container and share the result?
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Hi @gmarcusm thanks,
# python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-10-07 16:48:56.624726: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-07 16:48:56.624791: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-07 16:48:56.626089: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-07 16:48:56.632935: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
also with import sionna:
`# python3
Python 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sionna
2024-10-07 16:59:29.563043: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-07 16:59:29.563161: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-07 16:59:29.564489: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-07 16:59:29.571596: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
No CUDA device found; using CPU as fallback.`
It seems that TensorFlow is not GPU-enabled, but this is the official build from the provided Dockerfile.
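Since the TensorFlow output above lists two GPUs while the fallback message still shows up on import, it may help to test the two stacks separately. The sketch below is only an illustration: `gpu_report` is a hypothetical helper, and it assumes that a failure to select Mitsuba's `cuda_ad_rgb` variant surfaces as an exception. It skips any library that is not installed.

```python
import importlib


def gpu_report():
    """Collect GPU visibility for TensorFlow and for Mitsuba's CUDA variant."""
    report = {}

    # TensorFlow side: names of the GPUs it can see, or None if not installed.
    try:
        tf = importlib.import_module("tensorflow")
        gpus = tf.config.list_physical_devices("GPU")
        report["tensorflow_gpus"] = [d.name for d in gpus]
    except ImportError:
        report["tensorflow_gpus"] = None

    # Mitsuba side: None if not installed, else whether the CUDA variant loads.
    try:
        mi = importlib.import_module("mitsuba")
    except ImportError:
        report["mitsuba_cuda"] = None
        return report

    try:
        mi.set_variant("cuda_ad_rgb")
        report["mitsuba_cuda"] = True
    except Exception as exc:  # assumption: failure shows up as an exception
        report["mitsuba_cuda"] = False
        report["mitsuba_error"] = repr(exc)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If TensorFlow reports GPUs but the Mitsuba entry is `False`, the problem would more likely sit on the ray-tracing side (CUDA/OptiX access from Mitsuba) than in the TensorFlow build.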
Update:
The Docker container seems to load the sionna package fine (with CUDA) on a computer running Ubuntu 20.04 LTS with an NVIDIA A5000 card and this driver:
| NVIDIA-SMI 470.256.02 Driver Version: 470.256.02 CUDA Version: 12.3 |
but it shows that strange issue on a GPU rack server with dual A100 GPUs running Red Hat Enterprise Linux 9.4, with podman as the container engine.
The driver on RHEL 9.4 is:
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
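To compare the two hosts without eyeballing the banners, the header lines quoted above could be parsed along these lines. This is just a sketch: `parse_header` is a hypothetical helper and the regular expression is fitted to the header format shown here.

```python
import re

# Pattern fitted to the nvidia-smi banner line quoted in this thread.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_header(line):
    """Return a dict with 'smi', 'driver' and 'cuda' versions, or None."""
    match = HEADER_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    ubuntu = parse_header("| NVIDIA-SMI 470.256.02 Driver Version: 470.256.02 CUDA Version: 12.3 |")
    rhel = parse_header("| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |")
    print("Ubuntu host:", ubuntu)
    print("RHEL host:  ", rhel)
```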
Other containers (more recent TensorFlow, RAPIDS) work fine.
Still investigating...
This is my first use, and right at the first cell, [1]: GPU Configuration and Imports, of the tutorial Sionna_Ray_Tracing_Introduction, the GPU was not found: No CUDA device found; using CPU as fallback.
but !nvidia-smi prints:
I use a container built from the Docker image. Other RAPIDS Docker images work fine. Could this be a driver problem?