JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

[cuTENSOR] Issue when contracting views of CuArrays with cuTENSOR #2407

Open kmp5VT opened 3 weeks ago

kmp5VT commented 3 weeks ago

Describe the bug

Contracting sub-matrices taken from views of CuArrays sometimes fails when using the cuTENSOR backend. The same contractions work fine with cuBLAS, but with cuTENSOR they fail with:

ERROR: CUTENSORError: an invalid value was used as an argument (code 7, CUTENSOR_STATUS_INVALID_VALUE)
Stacktrace:
 [1] throw_api_error(res::cuTENSOR.cutensorStatus_t)
   @ cuTENSOR ~/.julia/packages/cuTENSOR/uwns2/src/libcutensor.jl:14
 [2] check
   @ ~/.julia/packages/cuTENSOR/uwns2/src/libcutensor.jl:27 [inlined]
 [3] cutensorContract
   @ ~/.julia/packages/CUDA/75aiI/lib/utils/call.jl:34 [inlined]
 [4] 
   @ cuTENSOR ~/.julia/packages/cuTENSOR/uwns2/src/operations.jl:294
 [5] #contract!#83
   @ ~/.julia/packages/cuTENSOR/uwns2/src/operations.jl:278 [inlined]
 [6] contract!
   @ ~/.julia/packages/cuTENSOR/uwns2/src/operations.jl:259 [inlined]
 [7] mul!
   @ ~/.julia/packages/cuTENSOR/uwns2/src/interfaces.jl:57 [inlined]
 [8] mul!(C::CuTensor{Float32, 2}, A::CuTensor{Float32, 2}, B::CuTensor{Float32, 2})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:237
 [9] top-level scope
   @ ~/.julia/dev/testing.jl:321
Some type information was truncated. Use `show(err)` to see complete types.

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA, cuTENSOR, LinearAlgebra
A = cu(randn(5))
B = cu(randn(1))
C = cu(randn(5))
vA = @view A[2:5]
vB = @view B[1:1]
vC = @view C[2:5]

tA = CuTensor(reshape(vA, (4,1)), [1,2])
tB = CuTensor(reshape(vB, (1,1)), [2,3])
tC = CuTensor(reshape(vC, (4,1)), [1,3])
mul!(reshape(vC, (4,1)), reshape(vA, (4,1)), reshape(vB, (1,1))) ## works fine
mul!(tC, tA, tB) ## Fails

Manifest.toml

```
(jl_HrbN51) pkg> st
Status `/tmp/jl_HrbN51/Project.toml`
  [052768ef] CUDA v5.4.2
```

Version info

Details on Julia:

# please post the output of:
versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
Environment:
  LD_LIBRARY_PATH = /mnt/sw/nix/store/pmwk60bp5k4qr8vsg411p7vzhr502d83-openblas-0.3.23/lib:/cm/shared/apps/slurm/current/lib64

Details on CUDA:

# please post the output of:
CUDA.versioninfo()
CUDA runtime 12.5, artifact installation
CUDA driver 12.5
NVIDIA driver 550.76.0, originally for CUDA 12.4

CUDA libraries: 
- CUBLAS: 12.5.2
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.2
- CUSPARSE: 12.4.1
- CUPTI: 23.0.0
- NVML: 12.0.0+550.76

Julia packages: 
- CUDA: 5.4.2
- CUDA_Driver_jll: 0.9.0+0
- CUDA_Runtime_jll: 0.14.0+1

Toolchain:
- Julia: 1.10.3
- LLVM: 15.0.7

1 device:
  0: NVIDIA RTX A6000 (sm_86, 45.830 GiB / 47.988 GiB available)

kmp5VT commented 3 weeks ago

As a follow-up, the same code does run successfully with an element type of ComplexF64:

using CUDA, cuTENSOR, LinearAlgebra
elt = ComplexF64
A = CuArray(randn(elt, 5))
B = CuArray(randn(elt, 1))
C = CuArray(randn(elt, 5))
vA = @view A[2:5]
vB = @view B[1:1]
vC = @view C[2:5]

tA = CuTensor(reshape(vA, (4,1)), [1,2])
tB = CuTensor(reshape(vB, (1,1)), [2,3])
tC = CuTensor(reshape(vC, (4,1)), [1,3])
mul!(reshape(vC, (4,1)), reshape(vA, (4,1)), reshape(vB, (1,1)))
mul!(tC, tA, tB) 
vC ≈ tC.data # true