elixir-nx / xla

Pre-compiled XLA extension
Apache License 2.0

CUDNN_STATUS_INTERNAL_ERROR with cuda 12 + cudnn 8.9.7 #86

Closed: zacksiri closed this 3 weeks ago

zacksiri commented 3 weeks ago

I've been running into this issue with XLA on Ubuntu 24.04. All my other ML frameworks (PyTorch, tinygrad) work fine with my current setup; the only one I can't seem to get working is EXLA.

I've tried both setting XLA_TARGET to cuda120 to use the precompiled binary, and compiling from source with XLA_TARGET=cuda and XLA_BUILD=true.
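To make the two attempts concrete, here is a sketch of the environment setup involved. The variable names are the ones documented by the xla package; the mix commands are an assumption based on a standard Mix project and are not from the original report:

```shell
# Attempt 1: use the precompiled CUDA 12 binary
export XLA_TARGET=cuda120

# Attempt 2: build XLA from source against the local CUDA toolkit
# (commented out here, since it replaces the setting above):
#   export XLA_BUILD=true
#   export XLA_TARGET=cuda

# After changing either variable, the dependency has to be rebuilt, e.g.:
#   mix deps.clean xla exla && mix deps.get && mix compile
echo "$XLA_TARGET"
```

Note that switching between the precompiled and source-built artifacts without cleaning the xla/exla deps can leave a stale build in place, which is worth ruling out when comparing the two attempts.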

Here is the log output:

04:32:33.850 [info] XLA service 0x7f73bc03b640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
04:32:33.850 [info]   StreamExecutor device (0): Quadro M2000, Compute Capability 5.2
04:32:33.850 [info] Using BFC allocator.
04:32:33.850 [info] XLA backend allocating 3799410278 bytes on device 0 for BFCAllocator.
04:32:33.945 [error] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
04:32:33.945 [error] Memory usage: 382402560 bytes free, 4221566976 bytes total.
** (RuntimeError) DNN library initialization failed. Look at the errors above for more details.
    (exla 0.7.2) lib/exla/mlir/module.ex:127: EXLA.MLIR.Module.unwrap!/1
    (exla 0.7.2) lib/exla/mlir/module.ex:113: EXLA.MLIR.Module.compile/5
    (stdlib 5.2.3) timer.erl:270: :timer.tc/2
    (exla 0.7.2) lib/exla/defn.ex:599: anonymous fn/12 in EXLA.Defn.compile/8
    (exla 0.7.2) lib/exla/mlir/context_pool.ex:10: anonymous fn/3 in EXLA.MLIR.ContextPool.checkout/1
    (nimble_pool 1.1.0) lib/nimble_pool.ex:462: NimblePool.checkout!/4
    (exla 0.7.2) lib/exla/defn/locked_cache.ex:36: EXLA.Defn.LockedCache.run/2
    #cell:5v4cfutqdmuxbmtz:4: (file)

Code example I ran:

t1 = Nx.tensor([1.0, 2.0])
t2 = Nx.tensor([2.0, 3.0])

Nx.dot(t1, t2)

Here is the nvcc --version output:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

And here is nvidia-smi:

nvidia-smi
Thu Jun  6 04:47:05 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro M2000                   Off |   00000000:03:00.0 Off |                  N/A |
| 56%   33C    P8             12W /   75W |       2MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
zacksiri commented 3 weeks ago

Apologies, I retried using cuda120 and everything worked out of the box. Closing this.

The main issue was that I was originally using cuDNN 9.1, with which cuda120 wasn't working. I then downgraded to cuDNN 8.9.7 but went straight to compiling from source, without trying cuda120 again first.

After compiling from source didn't work either, I decided to try cuda120 again, and it worked.
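For anyone hitting the same version mismatch, a quick way to confirm which cuDNN is actually installed is to read its version header. The snippet below parses a mock copy of the header so it is self-contained; on a real system the file usually lives at /usr/include/cudnn_version.h or under the CUDA toolkit's include directory (those paths are assumptions, not from the original report):

```shell
# Create a mock cudnn_version.h to illustrate the parsing; on a real system,
# point the awk command at the installed header instead.
cat > /tmp/cudnn_version.h <<'EOF'
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 7
EOF

# Extract MAJOR.MINOR.PATCHLEVEL from the header:
awk '/#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)/ { v[++n] = $3 }
     END { printf "%s.%s.%s\n", v[1], v[2], v[3] }' /tmp/cudnn_version.h
# prints 8.9.7
```

Checking this before choosing an XLA_TARGET makes it easy to spot a cuDNN 9.x install that the cuda120 build doesn't expect.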