I am trying to get my new ThinkPad with an "NVIDIA RTX 4000 Ada 12 GB" graphics card working.
No matter which cuda-driver (12.4) + cuDNN + jax + jaxlib combination I try, the best result is either a) "No GPU/TPU found, falling back to CPU." or b) "failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error".
When I run the Data Sampler section from https://github.com/PredictiveIntelligenceLab/ImprovedDeepONets/blob/main/Stokes/PI_DeepONet_Stokes.ipynb
I get errors like the following:
a)
Installation: pip install jaxlib==0.4.7+cuda12.cudnn88 jax==0.4.7 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
Run:
runfile('/home/saumya/NeuralN/Op Net/ImprovedDeepONets/Stokes/PI_DeepONet_Stokes-Copy1', wdir='/home/saumya/NeuralN/Op Net/ImprovedDeepONets/Stokes')
2024-03-19 11:48:27.682846: I external/xla/xla/service/service.cc:168] XLA service 0x8dd95c0 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2024-03-19 11:48:27.682867: I external/xla/xla/service/service.cc:176] StreamExecutor device (0): Interpreter,
2024-03-19 11:48:27.689135: I external/xla/xla/pjrt/tfrt_cpu_pjrt_client.cc:218] TfrtCpuClient created.
2024-03-19 11:48:29.450971: E external/xla/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2024-03-19 11:48:29.450988: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: saumya-TP-GPU
2024-03-19 11:48:29.450991: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: saumya-TP-GPU
2024-03-19 11:48:29.451052: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 550.54.14
2024-03-19 11:48:29.451064: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: NOT_FOUND: could not find kernel module information in driver version file contents: "NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 550.54.14 Release Build (dvs-builder@U16-A24-2-2) Thu Feb 22 01:44:50 UTC 2024
GCC version: gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)
"
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
b)
Installation: pip install jaxlib==0.4.9+cuda12.cudnn88 jax==0.4.9 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
Run:
2024-03-19 12:10:31.130411: I external/xla/xla/service/service.cc:168] XLA service 0x6a1d490 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2024-03-19 12:10:31.130427: I external/xla/xla/service/service.cc:176] StreamExecutor device (0): Interpreter,
2024-03-19 12:10:31.134477: I external/xla/xla/pjrt/tfrt_cpu_pjrt_client.cc:433] TfrtCpuClient created.
2024-03-19 12:10:50.428065: E external/xla/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2024-03-19 12:10:50.428083: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: saumya-TP-GPU
2024-03-19 12:10:50.428086: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: saumya-TP-GPU
2024-03-19 12:10:50.428143: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 550.54.14
2024-03-19 12:10:50.428156: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: NOT_FOUND: could not find kernel module information in driver version file contents: "NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 550.54.14 Release Build (dvs-builder@U16-A24-2-2) Thu Feb 22 01:44:50 UTC 2024
GCC version: gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)
"
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
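The warning's own suggestion can be followed to get more detail, and JAX can be forced onto the CUDA backend so it fails with an explicit error instead of silently falling back to CPU (the script name below is just my local copy of the notebook):

```shell
# More verbose XLA/TF logging, as the warning suggests:
TF_CPP_MIN_LOG_LEVEL=0 python PI_DeepONet_Stokes-Copy1.py
# Refuse the CPU fallback so the CUDA init failure surfaces as an error:
JAX_PLATFORMS=cuda python PI_DeepONet_Stokes-Copy1.py
```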
My system:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
$ nvidia-smi
Tue Mar 19 12:21:40 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  ERR!                           Off |   00000000:01:00.0 N/A |                  N/A |
| ERR! ERR!  ERR!              N/A /  N/A |      14MiB /  12282MiB |      N/A     Default |
|                                         |                        |                 ERR! |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Python version:
$ whereis python | tr ' ' '\n' | grep ^/ | sort
/home/saumya/anaconda3/envs/OpNet/bin/python
$ python --version && python3 --version
Python 3.9.18
Python 3.9.18
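As a sanity check on the combinations I have been trying: the jaxlib local-version tag encodes the CUDA and cuDNN versions the wheel was built against, and the driver's reported CUDA version (12.4 in the nvidia-smi output above) must be at least the wheel's CUDA major. A small throwaway checker (my own helper, not part of jax):

```python
import re

def parse_jaxlib_tag(version):
    """Split a jaxlib pip version like '0.4.9+cuda12.cudnn88' into
    (jaxlib_version, cuda_major, cudnn_version)."""
    m = re.fullmatch(r"([\d.]+)\+cuda(\d+)\.cudnn(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized jaxlib version tag: {version!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))

def driver_covers_wheel(driver_cuda, wheel_cuda_major):
    """A CUDA wheel needs a driver whose reported CUDA version is at least
    the wheel's CUDA major (e.g. a 12.4 driver covers cuda12 wheels)."""
    return int(driver_cuda.split(".")[0]) >= wheel_cuda_major

# The two combinations from above, against the driver nvidia-smi reports:
for tag in ("0.4.7+cuda12.cudnn88", "0.4.9+cuda12.cudnn88"):
    _, cuda_major, _ = parse_jaxlib_tag(tag)
    ok = driver_covers_wheel("12.4", cuda_major)
    print(tag, "->", "driver ok" if ok else "driver too old")
```

Both combinations pass this check, which suggests the problem is not a simple wheel/driver version mismatch but something in the driver or kernel-module setup itself (consistent with the ERR! fields in nvidia-smi).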