`minimum_zspacing` isn't working on the GPU

tomchor commented 1 year ago

I haven't had the time to investigate this yet, but min_Δz() (and presumably the x and y versions as well) complains about scalar indexing on the GPU when using irregular z spacing:

ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] assertscalar(op::String)
    @ GPUArraysCore /glade/work/tomasc/.julia/packages/GPUArraysCore/HaQcr/src/GPUArraysCore.jl:103
  [3] getindex
    @ /glade/work/tomasc/.julia/packages/GPUArrays/7TiO1/src/host/indexing.jl:9 [inlined]
  [4] getindex
    @ /glade/work/tomasc/.julia/packages/OffsetArrays/TcCEq/src/OffsetArrays.jl:436 [inlined]
  [5] getindex
    @ ./subarray.jl:315 [inlined]
  [6] mapreduce_impl(f::typeof(identity), op::typeof(min), A::SubArray{Float64, 1, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, Tuple{UnitRange{Int64}}, true}, first::Int64, last::Int64)
    @ Base ./reduce.jl:638
  [7] _mapreduce(f::typeof(identity), op::typeof(min), #unused#::IndexLinear, A::SubArray{Float64, 1, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, Tuple{UnitRange{Int64}}, true})
    @ Base ./reduce.jl:442
  [8] _mapreduce_dim
    @ ./reducedim.jl:365 [inlined]
  [9] #mapreduce#765
    @ ./reducedim.jl:357 [inlined]
 [10] mapreduce
    @ ./reducedim.jl:357 [inlined]
 [11] #_minimum#787
    @ ./reducedim.jl:999 [inlined]
 [12] _minimum
    @ ./reducedim.jl:999 [inlined]
 [13] #_minimum#786
    @ ./reducedim.jl:998 [inlined]
 [14] _minimum
    @ ./reducedim.jl:998 [inlined]
 [15] #minimum#784
    @ ./reducedim.jl:994 [inlined]
 [16] minimum
    @ ./reducedim.jl:994 [inlined]
 [17] min_Δz(grid::RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU})
    @ Oceananigans.Grids /glade/work/tomasc/.julia/packages/Oceananigans/KTw3g/src/Grids/rectilinear_grid.jl:465
 [18] top-level scope
    @ REPL[2]:1
 [19] top-level scope
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/src/initialization.jl:52
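
For context, here is a minimal sketch of the usual ways around a scalar-indexing error, assuming CUDA.jl and a working GPU. The array `Δz` is a hypothetical stand-in for the grid's spacing array, not something read from Oceananigans. In the trace above, the reduction falls back to Base's iterating `mapreduce_impl` because the argument is a `SubArray` of an `OffsetVector` wrapping a `CuArray`; a plain `CuArray` reduces on the device without this problem.

```julia
using CUDA

if CUDA.functional()
    CUDA.allowscalar(false)              # make scalar indexing an error, as in scripts

    Δz = CuArray([0.5, 1.0, 2.0, 3.0])   # hypothetical spacings, not from a real grid

    on_device = minimum(Δz)              # reduction runs on the GPU
    on_host   = minimum(Array(Δz))       # copy to the CPU first, then reduce
    first_Δz  = CUDA.@allowscalar Δz[1]  # explicitly permitted scalar read

    @show on_device on_host first_Δz
end
```

Copying to the host is fine for a small 1D spacing array, but it is only a workaround and not what Oceananigans does internally.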

simone-silvestri commented 1 year ago

min_Δz does not exist anymore; all remaining definitions of that function are vestigial code. The new function to call is minimum_zspacing.

tomchor commented 1 year ago

Okay, thanks for the heads-up. I wasn't aware of that. We should definitely remove the min_Δz() functions from the code then because they're still there.

tomchor commented 1 year ago

Actually I'll reopen this since I'm still getting an error, albeit a different one:

julia> grid
44×8×52 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
├── Periodic x ∈ [4.35026e-15, 894.427)  regularly spaced with Δx=20.3279
├── Periodic y ∈ [-8.74514e-15, 178.885) regularly spaced with Δy=22.3607
└── Bounded  z ∈ [-178.885, 178.885]     variably spaced with min(Δz)=4.76685, max(Δz)=21.2525

julia> minimum_zspacing(grid)
ERROR: CUDA error: too many resources requested for launch (code 701, ERROR_LAUNCH_OUT_OF_RESOURCES)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:89
  [2] macro expansion
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:97 [inlined]
  [3] cuLaunchKernel(f::CuFunction, gridDimX::UInt32, gridDimY::UInt32, gridDimZ::UInt32, blockDimX::UInt32, blockDimY::UInt32, blockDimZ::UInt32, sharedMemBytes::Int64, hStream::CuStream, kernelParams::Vector{Ptr{Nothing}}, extra::Ptr{Nothing})
    @ CUDA /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/utils/call.jl:26
  [4] #39
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/execution.jl:69 [inlined]
  [5] macro expansion
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/execution.jl:33 [inlined]
  [6] macro expansion
    @ ./none:0 [inlined]
  [7] pack_arguments(::CUDA.var"#39#40"{Bool, Int64, CuStream, CuFunction, CuDim3, CuDim3}, ::CUDA.KernelState, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::CuDeviceArray{Float64, 4, 1}, ::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuDeviceVector{Float64, 1}}, Nothing}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}})
    @ CUDA ./none:0
  [8] #launch#38
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/execution.jl:62 [inlined]
  [9] #44
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/execution.jl:136 [inlined]
 [10] macro expansion
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/execution.jl:95 [inlined]
 [11] macro expansion
    @ ./none:0 [inlined]
 [12] convert_arguments
    @ ./none:0 [inlined]
 [13] #cudacall#43
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/execution.jl:135 [inlined]
 [14] macro expansion
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/src/compiler/execution.jl:204 [inlined]
 [15] macro expansion
    @ ./none:0 [inlined]
 [16] call(::CUDA.HostKernel{typeof(CUDA.partial_mapreduce_grid), Tuple{typeof(identity), typeof(min), Nothing, CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, Val{true}, CuDeviceArray{Float64, 4, 1}, KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuDeviceVector{Float64, 1}}, Nothing}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}}}}, ::typeof(identity), ::typeof(min), ::Nothing, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::Val{true}, ::CuDeviceArray{Float64, 4, 1}, ::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuDeviceVector{Float64, 1}}, Nothing}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}}; call_kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:threads, :blocks, :shmem), Tuple{Int64, Int64, Int64}}})
    @ CUDA ./none:0
 [17] (::CUDA.HostKernel{typeof(CUDA.partial_mapreduce_grid), Tuple{typeof(identity), typeof(min), Nothing, CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, Val{true}, CuDeviceArray{Float64, 4, 1}, KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuDeviceVector{Float64, 1}}, Nothing}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}}}})(::Function, ::Vararg{Any}; threads::Int64, blocks::Int64, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:shmem,), Tuple{Int64}}})
    @ CUDA /glade/work/tomasc/.julia/packages/CUDA/BbliS/src/compiler/execution.jl:487
 [18] macro expansion
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/src/compiler/execution.jl:104 [inlined]
 [19] mapreducedim!(f::typeof(identity), op::typeof(min), R::SubArray{Float64, 3, CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, A::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}}; init::Nothing)
    @ CUDA /glade/work/tomasc/.julia/packages/CUDA/BbliS/src/mapreduce.jl:272
 [20] mapreducedim!(f::typeof(identity), op::typeof(min), R::SubArray{Float64, 3, CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, A::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}})
    @ CUDA /glade/work/tomasc/.julia/packages/CUDA/BbliS/src/mapreduce.jl:169
 [21] mapreducedim!(f::Function, op::Function, R::SubArray{Float64, 3, CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, A::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}})
    @ GPUArrays /glade/work/tomasc/.julia/packages/GPUArrays/7TiO1/src/host/mapreduce.jl:10
 [22] #minimum!#808
    @ ./reducedim.jl:1018 [inlined]
 [23] minimum!(f::Function, r::Field{Nothing, Nothing, Nothing, Nothing, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, a::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}}; condition::Nothing, mask::Float64, kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:init,), Tuple{Bool}}})
    @ Oceananigans.Fields /glade/work/tomasc/.julia/packages/Oceananigans/KTw3g/src/Fields/field.jl:619
 [24] minimum(f::Function, c::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}}; condition::Nothing, mask::Float64, dims::Function)
    @ Oceananigans.Fields /glade/work/tomasc/.julia/packages/Oceananigans/KTw3g/src/Fields/field.jl:649
 [25] minimum
    @ /glade/work/tomasc/.julia/packages/Oceananigans/KTw3g/src/Fields/field.jl:638 [inlined]
 [26] #minimum#45
    @ /glade/work/tomasc/.julia/packages/Oceananigans/KTw3g/src/Fields/field.jl:658 [inlined]
 [27] minimum(c::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}})
    @ Oceananigans.Fields /glade/work/tomasc/.julia/packages/Oceananigans/KTw3g/src/Fields/field.jl:658
 [28] minimum_spacing(dir::Symbol, grid::RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, ℓx::Center, ℓy::Center, ℓz::Center)
    @ Oceananigans.Grids /glade/work/tomasc/.julia/packages/Oceananigans/KTw3g/src/Grids/grid_utils.jl:412
 [29] minimum_zspacing(grid::RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU})
    @ Oceananigans.Grids /glade/work/tomasc/.julia/packages/Oceananigans/KTw3g/src/Grids/grid_utils.jl:472
 [30] top-level scope
    @ REPL[4]:1
 [31] top-level scope
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/src/initialization.jl:52

simone-silvestri commented 1 year ago

oh wow, what's the grid you're using?

tomchor commented 1 year ago

> oh wow, what's the grid you're using?

I edited the snippet above to include the grid. Apparently this also breaks diffusive_timescale() (unsurprisingly).

tomchor commented 1 year ago

It appears to be related to the size of the grid:

julia> grid = RectilinearGrid(GPU(), size=(8, 8, 8), x=(0,1), y=(0,1), z=0:8)
8×8×8 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
├── Periodic x ∈ [0.0, 1.0) regularly spaced with Δx=0.125
├── Periodic y ∈ [0.0, 1.0) regularly spaced with Δy=0.125
└── Bounded  z ∈ [0.0, 8.0] variably spaced with min(Δz)=1.0, max(Δz)=1.0

julia> minimum_zspacing(grid)
1.0

julia> grid = RectilinearGrid(GPU(), size=(16, 16, 16), x=(0,1), y=(0,1), z=0:16)
16×16×16 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
├── Periodic x ∈ [0.0, 1.0)  regularly spaced with Δx=0.0625
├── Periodic y ∈ [0.0, 1.0)  regularly spaced with Δy=0.0625
└── Bounded  z ∈ [0.0, 16.0] variably spaced with min(Δz)=1.0, max(Δz)=1.0

julia> minimum_zspacing(grid)
ERROR: CUDA error: too many resources requested for launch (code 701, ERROR_LAUNCH_OUT_OF_RESOURCES)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:89
  [2] macro expansion
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:97 [inlined]
  [3] cuLaunchKernel(f::CUDA.CuFunction, gridDimX::UInt32, gridDimY::UInt32, gridDimZ::UInt32, blockDimX::UInt32, blockDimY::UInt32, blockDimZ::UInt32, sharedMemBytes::Int64, hStream::CUDA.CuStream, kernelParams::Vector{Ptr{Nothing}}, extra::Ptr{Nothing})
    @ CUDA /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/utils/call.jl:26
  [4] #39
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/execution.jl:69 [inlined]
  [5] macro expansion
    @ /glade/work/tomasc/.julia/packages/CUDA/BbliS/lib/cudadrv/execution.jl:33 [inlined]
  [6] macro expansion
    @ ./none:0 [inlined]
  [7] pack_arguments(::CUDA.var"#39#40"{Bool, Int64, CUDA.CuStream, CUDA.CuFunction, CUDA.CuDim3, CUDA.CuDim3}, ::CUDA.KernelState, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::CUDA.CuDeviceArray{Float64, 4, 1}, ::KernelFunctionOperation{Center, Center, Center, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, Float64, typeof(Oceananigans.Grids.zspacing), Tuple{Center, Center, Center}})

tomchor commented 1 year ago

Also this works:

julia> import Base: minimum

julia> using OffsetArrays

julia> minimum(a::SubArray{<:Any, <:Any, <:OffsetArrays.OffsetVector}) = minimum(parent(parent(a)))
minimum (generic function with 21 methods)

julia> minimum(zspacings(grid, Center()))
1.0

julia> grid
16×16×16 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
├── Periodic x ∈ [0.0, 1.0)  regularly spaced with Δx=0.0625
├── Periodic y ∈ [0.0, 1.0)  regularly spaced with Δy=0.0625
└── Bounded  z ∈ [0.0, 16.0] variably spaced with min(Δz)=1.0, max(Δz)=1.0

so it might have something to do with the KernelFunctionOperation that's used here?: https://github.com/CliMA/Oceananigans.jl/blob/bcc34f07b3f949ea6fb34c7814f4b856d24924c2/src/Grids/grid_utils.jl#L407-L413
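
One caveat with this shortcut, sketched below on the CPU with a hypothetical spacing array: `parent(parent(a))` is the full underlying array, so if that array carries halo cells the minimum is taken over them as well and can differ from the minimum over the view. Defining a new `Base.minimum` method for types owned by other packages is also type piracy, so this seems better suited to debugging than to source code.

```julia
using OffsetArrays

# Hypothetical spacings with one halo cell on each side (indices 0 and 5).
Δz = OffsetArray([0.5, 1.0, 2.0, 3.0, 4.0, 0.25], 0:5)

# View of the interior cells only, analogous to what the reduction receives.
interior = view(Δz, 1:4)

@show minimum(interior)                  # interior cells only
@show minimum(parent(parent(interior)))  # whole underlying vector, halos included
```

Here the two minima differ (1.0 versus 0.25) only because the made-up halo values are smaller than the interior ones.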

navidcy commented 1 year ago

Can't reproduce the problem. On tartarus with Julia v1.8:

julia> grid = RectilinearGrid(GPU(), size=(16, 16, 16), x=(0,1), y=(0,1), z=0:16)
16×16×16 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
├── Periodic x ∈ [0.0, 1.0)  regularly spaced with Δx=0.0625
├── Periodic y ∈ [0.0, 1.0)  regularly spaced with Δy=0.0625
└── Bounded  z ∈ [0.0, 16.0] variably spaced with min(Δz)=1.0, max(Δz)=1.0

julia> minimum_zspacing(grid)
1.0

navidcy commented 1 year ago

Was it some random one-off stochastic error?

tomchor commented 1 year ago

> Was it some random one-off stochastic error?

Turns out this happens on a Quadro GP100 GPU, but not on a Tesla V100. I'm using Julia 1.8.3 on NCAR's Casper server using the (current as of this message) main branch.

Not sure what to do about this. Does it have to do with the Compute Capability? @navidcy which GPUs does tartarus have again?

glwagner commented 1 year ago

> Also this works:
>
> julia> import Base: minimum
>
> julia> using OffsetArrays
>
> julia> minimum(a::SubArray{<:Any, <:Any, <:OffsetArrays.OffsetVector}) = minimum(parent(parent(a)))
> minimum (generic function with 21 methods)
>
> julia> minimum(zspacings(grid, Center()))
> 1.0
>
> julia> grid
> 16×16×16 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
> ├── Periodic x ∈ [0.0, 1.0)  regularly spaced with Δx=0.0625
> ├── Periodic y ∈ [0.0, 1.0)  regularly spaced with Δy=0.0625
> └── Bounded  z ∈ [0.0, 16.0] variably spaced with min(Δz)=1.0, max(Δz)=1.0
>
> so it might have something to do with the KernelFunctionOperation that's used here?:
>
> https://github.com/CliMA/Oceananigans.jl/blob/bcc34f07b3f949ea6fb34c7814f4b856d24924c2/src/Grids/grid_utils.jl#L407-L413

We use the KernelFunctionOperation because this generalizes to any grid, including immersed boundary grids that may modify the spacings of the underlying grid (for example with PartialCellBottom).

It's possible to implement shortcuts like calculating a minimum directly over a view, but I'd hesitate to put this in the source code. Maybe we can make some improvements to KernelFunctionOperation that allow it to be reduced more easily. That would benefit everyone.

tomchor commented 1 year ago

> We use the KernelFunctionOperation because this generalizes to any grid, including immersed boundary grids that may modify the spacings of the underlying grid (for example with PartialCellBottom).
>
> It's possible to implement shortcuts like calculating a minimum directly over a view, but I'd hesitate to put this in the source code. Maybe we can make some improvements to KernelFunctionOperation that allow it to be reduced more easily. That would benefit everyone.

Agreed. I was just posting that as it narrows down the problem a bit more. That said, I don't really know how to further investigate/tackle this issue since my GPU knowledge is limited. @glwagner do you have any suggestions that I can try?

glwagner commented 1 year ago

I don't have any immediate ideas. I think we'd have to take a deep dive, applying an understanding of the Julia compiler and how code gets onto the GPU rather than just GPU-specific knowledge. I think this error has been seen before in other contexts out on the internet, so searching there might be a good place to start gaining more knowledge.
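
As one possible starting point, CUDA typically reports error 701 when a launch asks for more per-block resources (registers, shared memory, threads) than the device can provide. The sketch below, assuming CUDA.jl and a working GPU, shows how a compiled kernel's resource usage can be inspected before launching it. The `scale!` kernel is a hypothetical toy, not the reduction kernel from the stack trace, so the numbers it prints say nothing about this issue directly.

```julia
using CUDA

# Hypothetical toy kernel standing in for a real one: y = a * x.
function scale!(y, x, a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] = a * x[i]
    end
    return nothing
end

if CUDA.functional()
    x = CUDA.rand(Float64, 1024)
    y = similar(x)

    # Compile without launching so the kernel can be inspected.
    kernel = @cuda launch=false scale!(y, x, 2.0)

    @show CUDA.registers(kernel)   # registers used per thread
    @show CUDA.maxthreads(kernel)  # largest block size this kernel can launch with

    # Ask the occupancy API for a block size the device can actually run.
    config  = launch_configuration(kernel.fun)
    threads = min(length(y), config.threads)
    blocks  = cld(length(y), threads)

    kernel(y, x, 2.0; threads, blocks)
end
```

Comparing `CUDA.maxthreads` for the same kernel on the GPU that fails and on one that works might show whether the launch is requesting more threads than the failing device allows.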

navidcy commented 1 year ago

So does this issue only come up with a particular graphics card? Perhaps we should move it to Discussions then, if there is no action to be taken on the Oceananigans side?

tomchor commented 1 year ago

Just leaving this for the record, the KA and CUDA updates didn't fix this unfortunately...

ali-ramadhan commented 2 months ago

@tomchor Are you still encountering this issue?

I was able to run your MWE with no errors on small and large grids in three different environments with Julia 1.10.4, CUDA.jl v5.4.3, and Oceananigans.jl v0.91.8:

Linux + A100 (compute capability 8.0)
Windows + RTX 3080 (compute capability 8.6)
Windows + GTX 1050 (compute capability 6.1)

I was gonna test on some compute capability 3.0 Quadro GPUs but they're too old for CUDA.jl lol.

For comparison, Tartarus has Titan V GPUs (compute capability 7.0) and your Quadro GP100 is compute capability 6.0. So if it's solely tied to compute capability then it could just be an issue with <=6.0?

I found this list of CUDA compute capabilities useful: https://developer.nvidia.com/cuda-gpus
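
For anyone who wants to check their own hardware, here is a short sketch that prints the compute capability of each visible GPU, assuming CUDA.jl is installed. The `possibly_affected` helper is hypothetical and only encodes the unconfirmed guess above that capability 6.0 and below might be affected.

```julia
using CUDA

# Hypothetical helper: the "<= 6.0" threshold is a guess from this thread, not a confirmed cause.
possibly_affected(cap::VersionNumber) = cap <= v"6.0"

if CUDA.functional()
    for dev in CUDA.devices()
        cap  = CUDA.capability(dev)  # compute capability as a VersionNumber
        note = possibly_affected(cap) ? " (possibly affected)" : ""
        println(CUDA.name(dev), ": compute capability ", cap, note)
    end
end
```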

MWE:

julia> using Oceananigans

julia> grid = RectilinearGrid(GPU(), size=(44, 8, 52), extent=(1, 1, 1))
44×8×52 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 3×3×3 halo
├── Periodic x ∈ [0.0, 1.0)  regularly spaced with Δx=0.0227273
├── Periodic y ∈ [0.0, 1.0)  regularly spaced with Δy=0.125
└── Bounded  z ∈ [-1.0, 0.0] regularly spaced with Δz=0.0192308

julia> minimum_zspacing(grid)
0.019230769230769232

julia> minimum_xspacing(grid)
0.022727272727272728

julia> minimum_yspacing(grid)
0.125

tomchor commented 2 months ago

I haven't had time to investigate further, but I haven't gotten a similar error in a long time, so I'm assuming some update fixed this. Since it sounds like you didn't find anything, I think I'll close this now.

Thanks!

CliMA / Oceananigans.jl

`minimum_zspacing` isn't working on the GPU #3040