--- START ---
2024-11-13 09:27:00.593435: I xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
AAA
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1731518820.724893 22888 cuda_executor.cc:1040] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1731518820.726270 22857 service.cc:146] XLA service 0x7236dc427830 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1731518820.726300 22857 service.cc:154] StreamExecutor device (0): NVIDIA GeForce RTX 3060, Compute Capability 8.6
I0000 00:00:1731518820.726656 22857 se_gpu_pjrt_client.cc:889] Using BFC allocator.
I0000 00:00:1731518820.726711 22857 gpu_helpers.cc:114] XLA backend allocating 11264222822 bytes on device 0 for BFCAllocator.
I0000 00:00:1731518820.726730 22857 gpu_helpers.cc:154] XLA backend will use up to 1251580313 bytes on device 0 for CollectiveBFCAllocator.
I0000 00:00:1731518820.726834 22857 cuda_executor.cc:1040] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
BBB
CCC
09:27:00.854 [error] There was an error before creating cudnn handle (302): Error loading CUDA libraries. GPU will not be used. : Error loading CUDA libraries. GPU will not be used.
09:27:00.856 [error] There was an error before creating cudnn handle (302): Error loading CUDA libraries. GPU will not be used. : Error loading CUDA libraries. GPU will not be used.
** (RuntimeError) DNN library initialization failed. Look at the errors above for more details.
(exla 0.9.1) lib/exla/mlir/module.ex:147: EXLA.MLIR.Module.unwrap!/1
(exla 0.9.1) lib/exla/mlir/module.ex:124: EXLA.MLIR.Module.compile/5
(stdlib 6.1.1) timer.erl:590: :timer.tc/2
(exla 0.9.1) lib/exla/defn.ex:432: anonymous fn/14 in EXLA.Defn.compile/8
(exla 0.9.1) lib/exla/mlir/context_pool.ex:10: anonymous fn/3 in EXLA.MLIR.ContextPool.checkout/1
(nimble_pool 1.1.0) lib/nimble_pool.ex:462: NimblePool.checkout!/4
(exla 0.9.1) lib/exla/defn/locked_cache.ex:36: EXLA.Defn.LockedCache.run/2
(stdlib 6.1.1) timer.erl:590: :timer.tc/2
Diagnostic output:
> nvidia-smi
Wed Nov 13 09:29:38 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        On  |   00000000:04:00.0 Off |                  N/A |
|  0%   21C    P8              6W /  170W |       4MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
For the past year I've been running Nx successfully on Ubuntu 22.04 with an RTX 3060 GPU.
A couple of weeks ago I had to re-install the OS. The new OS is also Ubuntu 22.04.
Since the re-install, all my GPU apps (ollama, fabric, aider, nvidia-smi, nvtop) work, but Nx is broken.
Here's my Nx test script:
Here's the script output:
Diagnostic output:
Here's how I install the Nvidia dependencies:
I'm using elixir 1.17.3-otp-27. The OS kernel is 6.8.0-48-generic. I also get this same error when running in an elixir project with a mix.exs file. There are shared library files in /usr/lib/x86_64-linux-gnu/libcuda.so and ~/.cache/mix/.../libexla.so. Hmm...
I'm out of ideas on how to fix! Any help appreciated!!!
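Since the error says "Error loading CUDA libraries", one more check that might narrow things down is whether the dynamic loader can see the CUDA libraries at all. This is only a rough sketch: the function name check_cuda_libs is made up for illustration, the list of library names is a guess rather than an official EXLA requirement, and it assumes `ldconfig -p` reflects what libexla.so would be able to load.

```shell
#!/bin/sh
# Report which CUDA-related shared libraries appear in a loader cache listing.
# NOTE: the library names are an assumption, not taken from the EXLA docs.
check_cuda_libs() {
  cache="$1"  # text of `ldconfig -p`, passed in so the check is easy to test
  for lib in libcuda libcudart libcublas libcudnn; do
    # -F: match "<name>.so" literally, so libcuda does not match libcudart
    if printf '%s\n' "$cache" | grep -qF "${lib}.so"; then
      echo "found:   $lib"
    else
      echo "missing: $lib"
    fi
  done
}

# Run the check against the real loader cache (empty if ldconfig is unavailable).
check_cuda_libs "$(ldconfig -p 2>/dev/null)"
```

A "missing" line only means the library is not in the loader cache; it could still be reachable through LD_LIBRARY_PATH, so the result is a hint rather than a verdict.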