TheFibonacciEffect opened this issue 1 month ago
Hi @TheFibonacciEffect! Thanks for the feedback - can you provide the full stacktrace that you get?
Sure! Here is the program again:
using Dagger
# All GPU users - run this!
using DaggerGPU
# Annoying, but we need to restart the scheduler for the below changes to take effect...
# Will be fixed in future versions of Dagger!
Dagger.cancel!(;halt_sch=true)
# And we'll setup some defaults, just in case you don't have a GPU, but want to run the examples
GPUArray = Array
scope = Dagger.scope(;worker=1, threads=:)
# NVIDIA GPU users - run this!
using CUDA
# Make sure that we have at least one GPU
@assert length(CUDA.devices()) > 0 "You don't have any NVIDIA GPUs!"
# Pick the first available GPU
GPUArray = CuArray
scope = Dagger.scope(;cuda_gpu=1)
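The multiplication that actually triggers the failure (REPL[16]:9 in the trace) isn't quoted above; judging from the `*(A::DMatrix, B::Adjoint{Float32, DMatrix})` frame, it was presumably something along these lines (a sketch, not the exact workshop code - the block sizes and variable names are guesses):

```julia
using Dagger, CUDA, LinearAlgebra

# Distributed 64x64 matrix, split into blocks (sizes are a guess)
DA = rand(Blocks(32, 32), Float32, 64, 64)

# Run under the GPU scope configured above; DA * DA' goes through
# syrk_dagger!, whose copydiagtile! step performs the triangular `+`
# that fails with scalar indexing on CuArray tiles
Dagger.with_options(; scope) do
    fetch(DA * DA')
end
```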
And here is the full stacktrace:
ERROR: DTaskFailedException:
Root Exception Type: ErrorException
Root Exception:
Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.
If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] errorscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
[3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
[4] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
[5] getindex
@ ~/.julia/packages/GPUArrays/8Y80U/src/host/indexing.jl:48 [inlined]
[6] scalar_getindex
@ ~/.julia/packages/GPUArrays/8Y80U/src/host/indexing.jl:34 [inlined]
[7] _getindex
@ ~/.julia/packages/GPUArrays/8Y80U/src/host/indexing.jl:17 [inlined]
[8] getindex
@ ~/.julia/packages/GPUArrays/8Y80U/src/host/indexing.jl:15 [inlined]
[9] getindex
@ ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/adjtrans.jl:334 [inlined]
[10] getindex
@ ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/triangular.jl:265 [inlined]
[11] _getindex
@ ./abstractarray.jl:1361 [inlined]
[12] getindex
@ ./abstractarray.jl:1315 [inlined]
[13] iterate
@ ./abstractarray.jl:1212 [inlined]
[14] iterate
@ ./abstractarray.jl:1210 [inlined]
[15] copyto_unaliased!(deststyle::IndexLinear, dest::CuArray{…}, srcstyle::IndexCartesian, src::LinearAlgebra.LowerTriangular{…})
@ Base ./abstractarray.jl:1086
[16] copyto!
@ ./abstractarray.jl:1061 [inlined]
[17] +(A::LinearAlgebra.LowerTriangular{Float32, LinearAlgebra.Adjoint{…}}, B::LinearAlgebra.UpperTriangular{Float32, CuArray{…}})
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/triangular.jl:747
[18] copydiagtile!(A::CuArray{Float32, 2, CUDA.DeviceMemory}, uplo::Char)
@ Dagger ~/.julia/packages/Dagger/aVKft/src/array/mul.jl:403
[19] #invokelatest#2
@ ./essentials.jl:1043 [inlined]
[20] invokelatest
@ ./essentials.jl:1040 [inlined]
[21] (::CUDAExt.var"#26#27"{@Kwargs{}, CUDAExt.CuArrayDeviceProc, typeof(Dagger.copydiagtile!), Tuple{…}, @NamedTuple{…}})()
@ CUDAExt ~/.julia/packages/DaggerGPU/Kt3Ax/ext/CUDAExt.jl:275
Stacktrace:
[1] wait(t::Task)
@ Base ./task.jl:370
[2] fetch
@ ./task.jl:390 [inlined]
[3] execute!(::CUDAExt.CuArrayDeviceProc, ::Any, ::Any, ::Vararg{Any}; kwargs...)
@ CUDAExt ~/.julia/packages/DaggerGPU/Kt3Ax/ext/CUDAExt.jl:281
[4] execute!(::CUDAExt.CuArrayDeviceProc, ::Any, ::Any, ::Vararg{Any})
@ CUDAExt ~/.julia/packages/DaggerGPU/Kt3Ax/ext/CUDAExt.jl:269
[5] #169
@ ~/.julia/packages/Dagger/aVKft/src/sch/Sch.jl:1659 [inlined]
[6] #21
@ ~/.julia/packages/Dagger/aVKft/src/options.jl:18 [inlined]
[7] with(::Dagger.var"#21#22"{Dagger.Sch.var"#169#177"{…}}, ::Pair{Base.ScopedValues.ScopedValue{…}, @NamedTuple{…}})
@ Base.ScopedValues ./scopedvalues.jl:267
[8] with_options(f::Dagger.Sch.var"#169#177"{CUDAExt.CuArrayDeviceProc, Vector{Pair{Symbol, Any}}, Vector{Any}}, options::@NamedTuple{scope::UnionScope})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/options.jl:17
[9] do_task(to_proc::CUDAExt.CuArrayDeviceProc, task_desc::Vector{Any})
@ Dagger.Sch ~/.julia/packages/Dagger/aVKft/src/sch/Sch.jl:1657
[10] (::Dagger.Sch.var"#145#153"{UInt64, UInt32, Dagger.Sch.ProcessorInternalState, Distributed.RemoteChannel{Channel{Any}}, CUDAExt.CuArrayDeviceProc})()
@ Dagger.Sch ~/.julia/packages/Dagger/aVKft/src/sch/Sch.jl:1333
This Task: DTask(id=8, Dagger.Chunk{typeof(Dagger.copydiagtile!), MemPool.DRef, OSProc, UnionScope}(typeof(Dagger.copydiagtile!), UnitDomain(), MemPool.DRef(1, 33, 0x0000000000000000), OSProc(1), UnionScope:
ExactScope: processor == CuArrayDeviceProc(worker 1, device 0, uuid 77b44642-e0a6-ba49-8489-f70e83dde7f7), false)(Dagger.WeakChunk(1, 17, WeakRef(Dagger.Chunk{CuArray{Float32, 2, CUDA.DeviceMemory}, MemPool.DRef, CUDAExt.CuArrayDeviceProc, AnyScope}(CuArray{Float32, 2, CUDA.DeviceMemory}, ArrayDomain{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}((1:64, 1:64)), MemPool.DRef(1, 17, 0x0000000000004000), CuArrayDeviceProc(worker 1, device 0, uuid 77b44642-e0a6-ba49-8489-f70e83dde7f7), AnyScope(), false))), U))
Stacktrace:
[1] fetch(t::Dagger.ThunkFuture; proc::OSProc, raw::Bool)
@ Dagger ~/.julia/packages/Dagger/aVKft/src/dtask.jl:17
[2] fetch
@ ~/.julia/packages/Dagger/aVKft/src/dtask.jl:12 [inlined]
[3] #fetch#76
@ ~/.julia/packages/Dagger/aVKft/src/dtask.jl:72 [inlined]
[4] fetch
@ ~/.julia/packages/Dagger/aVKft/src/dtask.jl:68 [inlined]
[5] wait_all(f::Function; check_errors::Bool)
@ Dagger ~/.julia/packages/Dagger/aVKft/src/queue.jl:100
[6] wait_all
@ ~/.julia/packages/Dagger/aVKft/src/queue.jl:95 [inlined]
[7] #spawn_datadeps#254
@ ~/.julia/packages/Dagger/aVKft/src/datadeps.jl:942 [inlined]
[8] spawn_datadeps
@ ~/.julia/packages/Dagger/aVKft/src/datadeps.jl:934 [inlined]
[9] copytri!
@ ~/.julia/packages/Dagger/aVKft/src/array/mul.jl:363 [inlined]
[10] syrk_dagger!(C::DMatrix{…}, trans::Char, A::DMatrix{…}, _add::LinearAlgebra.MulAddMul{…})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/array/mul.jl:351
[11] (::Dagger.var"#661#665"{Char, LinearAlgebra.MulAddMul{…}})(C::DMatrix{Float32, Blocks{…}, typeof(cat)}, A::DMatrix{Float32, Blocks{…}, typeof(cat)})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/array/mul.jl:18
[12] maybe_copy_buffered(::Function, ::Pair{DMatrix{Float32, Blocks{…}, typeof(cat)}, Blocks{2}}, ::Vararg{Pair{DMatrix{…}, Blocks{…}}})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/array/copy.jl:8
[13] generic_matmatmul!(C::DMatrix{…}, transA::Char, transB::Char, A::DMatrix{…}, B::DMatrix{…}, _add::LinearAlgebra.MulAddMul{…})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/array/mul.jl:17
[14] _mul!
@ ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:287 [inlined]
[15] mul!
@ ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:285 [inlined]
[16] mul!(C::DMatrix{Float32, Blocks{…}, typeof(cat)}, A::DMatrix{Float32, Blocks{…}, typeof(cat)}, B::LinearAlgebra.Adjoint{Float32, DMatrix{…}})
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:253
[17] *(A::DMatrix{Float32, Blocks{2}, typeof(cat)}, B::LinearAlgebra.Adjoint{Float32, DMatrix{Float32, Blocks{2}, typeof(cat)}})
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:114
[18] (::var"#3#4")()
@ Main ./REPL[16]:9
[19] #21
@ ~/.julia/packages/Dagger/aVKft/src/options.jl:18 [inlined]
[20] with(::Dagger.var"#21#22"{var"#3#4"}, ::Pair{Base.ScopedValues.ScopedValue{NamedTuple}, @NamedTuple{scope::UnionScope}})
@ Base.ScopedValues ./scopedvalues.jl:267
[21] with_options(f::var"#3#4", options::@NamedTuple{scope::UnionScope})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/options.jl:17
[22] with_options(f::Function; options::@Kwargs{scope::UnionScope})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/options.jl:21
[23] top-level scope
@ REPL[16]:1
Some type information was truncated. Use `show(err)` to see complete types.
julia> show(err)
1-element ExceptionStack:
DTaskFailedException:
Root Exception Type: ErrorException
Root Exception:
Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.
If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] errorscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
[3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
[4] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
[5] getindex
@ ~/.julia/packages/GPUArrays/8Y80U/src/host/indexing.jl:48 [inlined]
[6] scalar_getindex
@ ~/.julia/packages/GPUArrays/8Y80U/src/host/indexing.jl:34 [inlined]
[7] _getindex
@ ~/.julia/packages/GPUArrays/8Y80U/src/host/indexing.jl:17 [inlined]
[8] getindex
@ ~/.julia/packages/GPUArrays/8Y80U/src/host/indexing.jl:15 [inlined]
[9] getindex
@ ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/adjtrans.jl:334 [inlined]
[10] getindex
@ ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/triangular.jl:265 [inlined]
[11] _getindex
@ ./abstractarray.jl:1361 [inlined]
[12] getindex
@ ./abstractarray.jl:1315 [inlined]
[13] iterate
@ ./abstractarray.jl:1212 [inlined]
[14] iterate
@ ./abstractarray.jl:1210 [inlined]
[15] copyto_unaliased!(deststyle::IndexLinear, dest::CuArray{Float32, 2, CUDA.DeviceMemory}, srcstyle::IndexCartesian, src::LinearAlgebra.LowerTriangular{Float32, LinearAlgebra.Adjoint{Float32, CuArray{Float32, 2, CUDA.DeviceMemory}}})
@ Base ./abstractarray.jl:1086
[16] copyto!
@ ./abstractarray.jl:1061 [inlined]
[17] +(A::LinearAlgebra.LowerTriangular{Float32, LinearAlgebra.Adjoint{Float32, CuArray{Float32, 2, CUDA.DeviceMemory}}}, B::LinearAlgebra.UpperTriangular{Float32, CuArray{Float32, 2, CUDA.DeviceMemory}})
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/triangular.jl:747
[18] copydiagtile!(A::CuArray{Float32, 2, CUDA.DeviceMemory}, uplo::Char)
@ Dagger ~/.julia/packages/Dagger/aVKft/src/array/mul.jl:403
[19] #invokelatest#2
@ ./essentials.jl:1043 [inlined]
[20] invokelatest
@ ./essentials.jl:1040 [inlined]
[21] (::CUDAExt.var"#26#27"{@Kwargs{}, CUDAExt.CuArrayDeviceProc, typeof(Dagger.copydiagtile!), Tuple{CuArray{Float32, 2, CUDA.DeviceMemory}, Char}, @NamedTuple{sch_uid::UInt64, sch_handle::Dagger.Sch.SchedulerHandle, processor::CUDAExt.CuArrayDeviceProc, task_spec::Vector{Any}}})()
@ CUDAExt ~/.julia/packages/DaggerGPU/Kt3Ax/ext/CUDAExt.jl:275
Stacktrace:
[1] wait(t::Task)
@ Base ./task.jl:370
[2] fetch
@ ./task.jl:390 [inlined]
[3] execute!(::CUDAExt.CuArrayDeviceProc, ::Any, ::Any, ::Vararg{Any}; kwargs...)
@ CUDAExt ~/.julia/packages/DaggerGPU/Kt3Ax/ext/CUDAExt.jl:281
[4] execute!(::CUDAExt.CuArrayDeviceProc, ::Any, ::Any, ::Vararg{Any})
@ CUDAExt ~/.julia/packages/DaggerGPU/Kt3Ax/ext/CUDAExt.jl:269
[5] #169
@ ~/.julia/packages/Dagger/aVKft/src/sch/Sch.jl:1659 [inlined]
[6] #21
@ ~/.julia/packages/Dagger/aVKft/src/options.jl:18 [inlined]
[7] with(::Dagger.var"#21#22"{Dagger.Sch.var"#169#177"{CUDAExt.CuArrayDeviceProc, Vector{Pair{Symbol, Any}}, Vector{Any}}}, ::Pair{Base.ScopedValues.ScopedValue{NamedTuple}, @NamedTuple{scope::UnionScope}})
@ Base.ScopedValues ./scopedvalues.jl:267
[8] with_options(f::Dagger.Sch.var"#169#177"{CUDAExt.CuArrayDeviceProc, Vector{Pair{Symbol, Any}}, Vector{Any}}, options::@NamedTuple{scope::UnionScope})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/options.jl:17
[9] do_task(to_proc::CUDAExt.CuArrayDeviceProc, task_desc::Vector{Any})
@ Dagger.Sch ~/.julia/packages/Dagger/aVKft/src/sch/Sch.jl:1657
[10] (::Dagger.Sch.var"#145#153"{UInt64, UInt32, Dagger.Sch.ProcessorInternalState, Distributed.RemoteChannel{Channel{Any}}, CUDAExt.CuArrayDeviceProc})()
@ Dagger.Sch ~/.julia/packages/Dagger/aVKft/src/sch/Sch.jl:1333
This Task: DTask(id=8, Dagger.Chunk{typeof(Dagger.copydiagtile!), MemPool.DRef, OSProc, UnionScope}(typeof(Dagger.copydiagtile!), UnitDomain(), MemPool.DRef(1, 33, 0x0000000000000000), OSProc(1), UnionScope:
ExactScope: processor == CuArrayDeviceProc(worker 1, device 0, uuid 77b44642-e0a6-ba49-8489-f70e83dde7f7), false)(Dagger.WeakChunk(1, 17, WeakRef(Dagger.Chunk{CuArray{Float32, 2, CUDA.DeviceMemory}, MemPool.DRef, CUDAExt.CuArrayDeviceProc, AnyScope}(CuArray{Float32, 2, CUDA.DeviceMemory}, ArrayDomain{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}((1:64, 1:64)), MemPool.DRef(1, 17, 0x0000000000004000), CuArrayDeviceProc(worker 1, device 0, uuid 77b44642-e0a6-ba49-8489-f70e83dde7f7), AnyScope(), false))), U))
Stacktrace:
[1] fetch(t::Dagger.ThunkFuture; proc::OSProc, raw::Bool)
@ Dagger ~/.julia/packages/Dagger/aVKft/src/dtask.jl:17
[2] fetch
@ ~/.julia/packages/Dagger/aVKft/src/dtask.jl:12 [inlined]
[3] #fetch#76
@ ~/.julia/packages/Dagger/aVKft/src/dtask.jl:72 [inlined]
[4] fetch
@ ~/.julia/packages/Dagger/aVKft/src/dtask.jl:68 [inlined]
[5] wait_all(f::Function; check_errors::Bool)
@ Dagger ~/.julia/packages/Dagger/aVKft/src/queue.jl:100
[6] wait_all
@ ~/.julia/packages/Dagger/aVKft/src/queue.jl:95 [inlined]
[7] #spawn_datadeps#254
@ ~/.julia/packages/Dagger/aVKft/src/datadeps.jl:942 [inlined]
[8] spawn_datadeps
@ ~/.julia/packages/Dagger/aVKft/src/datadeps.jl:934 [inlined]
[9] copytri!
@ ~/.julia/packages/Dagger/aVKft/src/array/mul.jl:363 [inlined]
[10] syrk_dagger!(C::DMatrix{Float32, Blocks{2}, typeof(cat)}, trans::Char, A::DMatrix{Float32, Blocks{2}, typeof(cat)}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/array/mul.jl:351
[11] (::Dagger.var"#661#665"{Char, LinearAlgebra.MulAddMul{true, true, Bool, Bool}})(C::DMatrix{Float32, Blocks{2}, typeof(cat)}, A::DMatrix{Float32, Blocks{2}, typeof(cat)})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/array/mul.jl:18
[12] maybe_copy_buffered(::Function, ::Pair{DMatrix{Float32, Blocks{2}, typeof(cat)}, Blocks{2}}, ::Vararg{Pair{DMatrix{Float32, Blocks{2}, typeof(cat)}, Blocks{2}}})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/array/copy.jl:8
[13] generic_matmatmul!(C::DMatrix{Float32, Blocks{2}, typeof(cat)}, transA::Char, transB::Char, A::DMatrix{Float32, Blocks{2}, typeof(cat)}, B::DMatrix{Float32, Blocks{2}, typeof(cat)}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/array/mul.jl:17
[14] _mul!
@ ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:287 [inlined]
[15] mul!
@ ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:285 [inlined]
[16] mul!(C::DMatrix{Float32, Blocks{2}, typeof(cat)}, A::DMatrix{Float32, Blocks{2}, typeof(cat)}, B::LinearAlgebra.Adjoint{Float32, DMatrix{Float32, Blocks{2}, typeof(cat)}})
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:253
[17] *(A::DMatrix{Float32, Blocks{2}, typeof(cat)}, B::LinearAlgebra.Adjoint{Float32, DMatrix{Float32, Blocks{2}, typeof(cat)}})
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:114
[18] (::var"#3#4")()
@ Main ./REPL[16]:9
[19] #21
@ ~/.julia/packages/Dagger/aVKft/src/options.jl:18 [inlined]
[20] with(::Dagger.var"#21#22"{var"#3#4"}, ::Pair{Base.ScopedValues.ScopedValue{NamedTuple}, @NamedTuple{scope::UnionScope}})
@ Base.ScopedValues ./scopedvalues.jl:267
[21] with_options(f::var"#3#4", options::@NamedTuple{scope::UnionScope})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/options.jl:17
[22] with_options(f::Function; options::@Kwargs{scope::UnionScope})
@ Dagger ~/.julia/packages/Dagger/aVKft/src/options.jl:21
[23] top-level scope
@ REPL[16]:1
This is the current Manifest.toml file (GitHub only allows .txt attachments, so I added that suffix).
And here is some additional information about the GPU I am using:
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Quadro RTX 4000"
CUDA Driver Version / Runtime Version 11.4 / 11.2
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 7960 MBytes (8346533888 bytes)
MapSMtoCores for SM 7.5 is undefined. Default to use 64 Cores/SM
MapSMtoCores for SM 7.5 is undefined. Default to use 64 Cores/SM
(036) Multiprocessors, (064) CUDA Cores/MP: 2304 CUDA Cores
GPU Max Clock rate: 1545 MHz (1.54 GHz)
Memory Clock rate: 6501 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 65536 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.2, NumDevs = 1
Result = PASS
Thanks a lot again! If more information is needed, just ask : )
Thanks for the info! I'm still traveling home to the US, but I'll plan to take a look at this again this week.
Ok, this happens because we internally do some UpperTriangular(A)' + UpperTriangular(A), where we should really use .+ instead to ensure GPU support. I'm putting together a branch with this and a few other fixes, and will validate that it works locally with AMDGPU.jl (as that's what I've got on my laptop); then I'll post it so you can validate that it works on your system too.
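For reference, the difference can be illustrated on the CPU with plain LinearAlgebra wrappers (a minimal sketch; the GPU failure comes from the generic `+` method iterating elementwise):

```julia
using LinearAlgebra

A = rand(Float32, 4, 4)
L = LowerTriangular(A')      # what UpperTriangular(A)' lowers to
U = UpperTriangular(A)

# `L + U` falls back to a generic method that copies the wrapped data
# via per-element getindex/iterate - fine on the CPU, but scalar
# indexing (and hence an error) on a CuArray.
B1 = L + U

# Broadcasting materializes the sum as a single fused elementwise
# operation, which GPU arrays support natively.
B2 = L .+ U

B1 == B2                     # same result either way
```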
Perfect, thanks a lot! : )
(Original issue body:) I was following along with https://github.com/jpsamaroo/DaggerWorkshop2024 and noticed that matrix transposition does not seem to work on NVIDIA GPUs for me.
Sorry if this is a bit brief; ask questions if something is missing.
The error is the scalar-indexing failure shown in the stacktrace above.
PS: Thanks for the talk, and enjoy the conference @jpsamaroo