elixir-nx / xla

Pre-compiled XLA extension
Apache License 2.0

CUDNN_STATUS_INTERNAL_ERROR with cuda 12 + cudnn 8.9.7 #86

Closed: zacksiri closed this 3 weeks ago

zacksiri commented 3 weeks ago

I've been running into this issue with XLA on Ubuntu 24.04. All my other ML frameworks (PyTorch, tinygrad) work fine with my current setup; the only one I can't seem to get working is EXLA.

I've tried both setting XLA_TARGET to cuda120 to use the precompiled binary, and compiling from source with XLA_TARGET=cuda and XLA_BUILD=true.
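To make the two attempts concrete, here is a sketch of the environment setup involved. The variable names are the ones documented by the xla package; the mix commands are an assumption based on a standard Mix project and are not from the original report:

```shell
# Attempt 1: use the precompiled CUDA 12 binary
export XLA_TARGET=cuda120

# Attempt 2: build XLA from source against the local CUDA toolkit
# (commented out here, since it replaces the setting above):
#   export XLA_BUILD=true
#   export XLA_TARGET=cuda

# After changing either variable, the dependency has to be rebuilt, e.g.:
#   mix deps.clean xla exla && mix deps.get && mix compile
echo "$XLA_TARGET"
```

Note that switching between the precompiled and source-built artifacts without cleaning the xla/exla deps can leave a stale build in place, which is worth ruling out when comparing the two attempts.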

Here is the log output:

04:32:33.850 [info] XLA service 0x7f73bc03b640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
04:32:33.850 [info]   StreamExecutor device (0): Quadro M2000, Compute Capability 5.2
04:32:33.850 [info] Using BFC allocator.
04:32:33.850 [info] XLA backend allocating 3799410278 bytes on device 0 for BFCAllocator.
04:32:33.945 [error] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
04:32:33.945 [error] Memory usage: 382402560 bytes free, 4221566976 bytes total.
** (RuntimeError) DNN library initialization failed. Look at the errors above for more details.
    (exla 0.7.2) lib/exla/mlir/module.ex:127: EXLA.MLIR.Module.unwrap!/1
    (exla 0.7.2) lib/exla/mlir/module.ex:113: EXLA.MLIR.Module.compile/5
    (stdlib 5.2.3) timer.erl:270: :timer.tc/2
    (exla 0.7.2) lib/exla/defn.ex:599: anonymous fn/12 in EXLA.Defn.compile/8
    (exla 0.7.2) lib/exla/mlir/context_pool.ex:10: anonymous fn/3 in EXLA.MLIR.ContextPool.checkout/1
    (nimble_pool 1.1.0) lib/nimble_pool.ex:462: NimblePool.checkout!/4
    (exla 0.7.2) lib/exla/defn/locked_cache.ex:36: EXLA.Defn.LockedCache.run/2
    #cell:5v4cfutqdmuxbmtz:4: (file)

Code example I ran:

t1 = Nx.tensor([1.0, 2.0])
t2 = Nx.tensor([2.0, 3.0])

Nx.dot(t1, t2)

Here is the nvcc --version output:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

And here is nvidia-smi:

nvidia-smi
Thu Jun  6 04:47:05 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro M2000                   Off |   00000000:03:00.0 Off |                  N/A |
| 56%   33C    P8             12W /   75W |       2MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
zacksiri commented 3 weeks ago

Apologies, I retried using cuda120 and everything worked out of the box. Closing this.

The main issue was that I was originally using cuDNN 9.1, with which cuda120 wasn't working. I then downgraded to cuDNN 8.9.7 but went straight to compiling from source, without trying cuda120 again first.

After compiling from source didn't work either, I decided to try cuda120 again, and it worked.
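For anyone hitting the same version mismatch, a quick way to confirm which cuDNN is actually installed is to read its version header. The snippet below parses a mock copy of the header so it is self-contained; on a real system the file usually lives at /usr/include/cudnn_version.h or under the CUDA toolkit's include directory (those paths are assumptions, not from the original report):

```shell
# Create a mock cudnn_version.h to illustrate the parsing; on a real system,
# point the awk command at the installed header instead.
cat > /tmp/cudnn_version.h <<'EOF'
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 7
EOF

# Extract MAJOR.MINOR.PATCHLEVEL from the header:
awk '/#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)/ { v[++n] = $3 }
     END { printf "%s.%s.%s\n", v[1], v[2], v[3] }' /tmp/cudnn_version.h
# prints 8.9.7
```

Checking this before choosing an XLA_TARGET makes it easy to spot a cuDNN 9.x install that the cuda120 build doesn't expect.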