Jutho / TensorOperations.jl

Julia package for tensor contractions and related operations
https://jutho.github.io/TensorOperations.jl/stable/

Bug in CUDA backend #151

Closed: tjjarvinen closed this issue 9 months ago

tjjarvinen commented 9 months ago
using TensorOperations
using cuTENSOR
using CUDA

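n = 6
m = 3

# random Float64 tensors, copied to the GPU
p = CuArray(rand(n, n, n, m, m, m))
v = similar(p)                  # output buffer with the same shape as p
t = CuArray(rand(n, n, m, m))

# NCON-style labels: positive integers are contracted, negative integers are open (output) indices
@tensor v[-1,-2,-3,-4,-5,-6] = t[-1,1,-4,2] * t[-2,3,-5,4] * t[-3,5,-6,6] * p[1,3,5,2,4,6];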

This produces the following error (TensorOperations v4.0.6, CUDA v5.0.0, cuTENSOR v1.2.0):

ERROR: ArgumentError: cannot convert to either a CPU or GPU pointer
Stacktrace:
  [1] unsafe_convert(#unused#::Type{PtrOrCuPtr{Nothing}}, val::StridedViews.StridedView{Float64, 6, CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, typeof(identity)})
    @ CUDA ~/.julia/packages/CUDA/nbRJk/src/pointer.jl:126
  [2] macro expansion
    @ ~/.julia/packages/cuTENSOR/saDTo/src/libcutensor.jl:390 [inlined]
  [3] (::cuTENSOR.var"#43#44"{Ptr{cuTENSOR.cutensorHandle_t}, StridedViews.StridedView{Float64, 6, CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, typeof(identity)}, cuTENSOR.CuTensorDescriptor, Base.RefValue{UInt32}})()
    @ cuTENSOR ~/.julia/packages/CUDA/nbRJk/lib/utils/call.jl:27
  [4] (::cuTENSOR.var"#1#2"{cuTENSOR.var"#43#44"{Ptr{cuTENSOR.cutensorHandle_t}, StridedViews.StridedView{Float64, 6, CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, typeof(identity)}, cuTENSOR.CuTensorDescriptor, Base.RefValue{UInt32}}})()
    @ cuTENSOR ~/.julia/packages/cuTENSOR/saDTo/src/libcutensor.jl:17
  [5] retry_reclaim(f::cuTENSOR.var"#1#2"{cuTENSOR.var"#43#44"{Ptr{cuTENSOR.cutensorHandle_t}, StridedViews.StridedView{Float64, 6, CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, typeof(identity)}, cuTENSOR.CuTensorDescriptor, Base.RefValue{UInt32}}}, isfailed::Base.Fix2{typeof(in), Tuple{cuTENSOR.cutensorStatus_t}})
    @ CUDA ~/.julia/packages/CUDA/nbRJk/src/pool.jl:359
  [6] check(::cuTENSOR.var"#43#44"{Ptr{cuTENSOR.cutensorHandle_t}, StridedViews.StridedView{Float64, 6, CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, typeof(identity)}, cuTENSOR.CuTensorDescriptor, Base.RefValue{UInt32}})
    @ cuTENSOR ~/.julia/packages/cuTENSOR/saDTo/src/libcutensor.jl:16
  [7] cutensorGetAlignmentRequirement(handle::Ptr{cuTENSOR.cutensorHandle_t}, ptr::StridedViews.StridedView{Float64, 6, CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, typeof(identity)}, desc::cuTENSOR.CuTensorDescriptor, alignmentRequirement::Base.RefValue{UInt32})
    @ cuTENSOR ~/.julia/packages/CUDA/nbRJk/lib/utils/call.jl:26
  [8] _contraction_descriptor(C::StridedViews.StridedView{Float64, 6, CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, typeof(identity)}, pC::Tuple{NTuple{4, Int64}, Tuple{Int64, Int64}}, A::StridedViews.StridedView{Float64, 4, CuArray{Float64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}, pA::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, B::StridedViews.StridedView{Float64, 6, CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, typeof(identity)}, pB::Tuple{Tuple{Int64, Int64}, NTuple{4, Int64}})
    @ TensorOperationscuTENSORExt ~/.julia/packages/TensorOperations/17v4v/ext/TensorOperationscuTENSORExt.jl:167
  [9] tensorcontract!(C::StridedViews.StridedView{Float64, 6, CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, typeof(identity)}, pC::Tuple{NTuple{4, Int64}, Tuple{Int64, Int64}}, A::StridedViews.StridedView{Float64, 4, CuArray{Float64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}, pA::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, conjA::Symbol, B::StridedViews.StridedView{Float64, 6, CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, typeof(identity)}, pB::Tuple{Tuple{Int64, Int64}, NTuple{4, Int64}}, conjB::Symbol, α::VectorInterface.One, β::VectorInterface.Zero, ::TensorOperations.Backend{:StridedCUDA})
    @ TensorOperationscuTENSORExt ~/.julia/packages/TensorOperations/17v4v/ext/TensorOperationscuTENSORExt.jl:130
 [10] tensorcontract!(C::CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, pC::Tuple{NTuple{4, Int64}, Tuple{Int64, Int64}}, A::CuArray{Float64, 4, CUDA.Mem.DeviceBuffer}, pA::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, conjA::Symbol, B::CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, pB::Tuple{Tuple{Int64, Int64}, NTuple{4, Int64}}, conjB::Symbol, α::VectorInterface.One, β::VectorInterface.Zero, backend::TensorOperations.Backend{:StridedCUDA})
    @ TensorOperationscuTENSORExt ~/.julia/packages/TensorOperations/17v4v/ext/TensorOperationscuTENSORExt.jl:114
 [11] tensorcontract!(C::CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, pC::Tuple{NTuple{4, Int64}, Tuple{Int64, Int64}}, A::CuArray{Float64, 4, CUDA.Mem.DeviceBuffer}, pA::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}, conjA::Symbol, B::CuArray{Float64, 6, CUDA.Mem.DeviceBuffer}, pB::Tuple{Tuple{Int64, Int64}, NTuple{4, Int64}}, conjB::Symbol, α::VectorInterface.One, β::VectorInterface.Zero)
    @ TensorOperationscuTENSORExt ~/.julia/packages/TensorOperations/17v4v/ext/TensorOperationscuTENSORExt.jl:108
 [12] top-level scope
    @ REPL[18]:1
 [13] top-level scope
    @ ~/.julia/packages/CUDA/nbRJk/src/initialization.jl:205

With TensorOperations v3.2.5 it works fine.

Jutho commented 9 months ago

This happens because you have CUDA.jl v5.0.0, whereas our tests and compat bounds specify CUDA = "4". Apparently, these compat bounds are not respected in combination with package extensions (https://github.com/cjdoris/PackageExtensionCompat.jl/issues/4).

We are looking into fixing this asap.

lkdvos commented 9 months ago

Actually, this might be a bug on our side. I think that, because of how I implemented the package extension, the CUDA compat bound was not being read. I have now changed this so that TensorOperations restricts CUDA to v4. Supporting v5 should only require a small adaptation of StridedViews.jl, so we should be able to fix this soon.
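To check whether an existing environment matches the setup reported here, one can compare the loaded package versions. The helper below is a hypothetical sketch, not part of TensorOperations.jl or Pkg. It encodes the failing combination from this issue (TensorOperations v4.0.6 with CUDA.jl v5) and additionally assumes, without confirmation from the thread, that every 4.0.x release before 4.0.7 is affected. The commented usage relies on pkgversion, which needs Julia 1.9 or later.

```julia
"""
    likely_affected(tensorops::VersionNumber, cuda::VersionNumber) -> Bool

Hypothetical helper: return `true` if the given TensorOperations.jl and CUDA.jl
versions match the combination reported in issue #151. Assumes (unconfirmed)
that every 4.0.x release before 4.0.7 is affected, not only 4.0.6.
"""
function likely_affected(tensorops::VersionNumber, cuda::VersionNumber)
    old_tensorops = v"4.0.0" <= tensorops < v"4.0.7"   # 4.0.x before the fix
    cuda_v5_or_later = cuda >= v"5.0.0"                # outside the intended CUDA = "4" bound
    return old_tensorops && cuda_v5_or_later
end

# Example usage (needs the packages installed and Julia >= 1.9 for `pkgversion`):
#
#   using Pkg, TensorOperations, CUDA
#   if likely_affected(pkgversion(TensorOperations), pkgversion(CUDA))
#       Pkg.update("TensorOperations")  # v4.0.7 is reported to work in this thread
#   end
```

The check deliberately returns false for the 3.x series, since v3.2.5 is reported to work.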

Jutho commented 9 months ago

@lkdvos, with the following configuration the code above definitely works:

⌅ [052768ef] CUDA v4.4.1
  [6aa20fa7] TensorOperations v4.0.6 `~/.julia/dev/TensorOperations`
⌃ [011b41b2] cuTENSOR v1.1.0

The following line, https://github.com/JuliaGPU/CUDA.jl/blob/9888ac92810373c3d6b58b5ca972d8df2afb4829/lib/cutensor/src/libcutensor.jl#L390C16-L390C16, which appears in the stack trace, was already present in CUDA 4.4.1 and was already forcing the conversion to ptr::PtrOrCuPtr{Cvoid}. So I guess something changed in how CUDA itself handles conversions to PtrOrCuPtr{Cvoid}, which would explain why the call no longer ends up in our definition for converting a CuStridedView{T} to PtrOrCuPtr{T}.

Jutho commented 9 months ago

OK, I think I understand. StridedViews was not enforcing CUDA v4 because Strided 2.0.4 allows both StridedViews 0.1 and 0.2. The package manager therefore resolves the environment to TensorOperations 4.0.6, Strided 2.0.4, CUDA 5, and StridedViews 0.1.2. StridedViews 0.1.2 has no package extension for CUDA, so it imposes no restrictions on CUDA, and as a result our pointer conversion routine is never called.
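The mechanism described above can be illustrated with a small toy model in plain Julia. All names here (DeviceBuffer, ToyStridedView, to_device_pointer, load_extension!) are hypothetical stand-ins and not the real CUDA.jl or StridedViews.jl API. A generic fallback throws the same ArgumentError seen in the stack trace, and the more specific conversion method only exists once the "extension" has been loaded. If the resolver picks a version that ships no extension, dispatch silently falls through to the fallback.

```julia
# Toy model of the failure mode; every name here is hypothetical and only
# mimics the roles of the real types and methods in CUDA.jl / StridedViews.jl.

# Stand-in for a GPU memory buffer.
struct DeviceBuffer
    addr::UInt
end

# Stand-in for a strided view that wraps a device buffer.
struct ToyStridedView
    parent::DeviceBuffer
    offset::Int  # byte offset into the parent buffer
end

# Generic fallback, playing the role of the method that raises
# "cannot convert to either a CPU or GPU pointer".
to_device_pointer(x) =
    throw(ArgumentError("cannot convert to either a CPU or GPU pointer"))

# Plays the role of a package extension: the specific method only exists
# after this function has been called (i.e. after the "extension" is loaded).
function load_extension!()
    @eval to_device_pointer(v::ToyStridedView) = v.parent.addr + UInt(v.offset)
    return nothing
end
```

Calling to_device_pointer(ToyStridedView(DeviceBuffer(0x1000), 8)) before load_extension!() throws the ArgumentError; after it, a call from a new top-level statement (or via Base.invokelatest) returns the offset address. StridedViews 0.1.2 corresponds to never calling load_extension!().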

tjjarvinen commented 9 months ago

I can confirm that it now works with v4.0.7. Thanks!

Closing this one.

Jutho commented 9 months ago

Thanks for reporting