CliMA / Oceananigans.jl

🌊 Julia software for fast, friendly, flexible, ocean-flavored fluid dynamics on CPUs and GPUs
https://clima.github.io/OceananigansDocumentation/stable
MIT License

Error for `maximum(::Field{Face, Center, Center})` on GPU #2744

Closed: navidcy closed this issue 1 year ago

navidcy commented 2 years ago

I get the following error when I call `maximum` on a field located at `(Face, Center, Center)` on the GPU, when the grid is an immersed boundary grid whose underlying grid is vertically stretched.

For example:

using Oceananigans
using Oceananigans.ImmersedBoundaries: ImmersedBoundaryGrid, GridFittedBottom

Lx, Ly, Lz = 500, 600, 3

Nx, Ny, Nz = 128, 128, 64

σ = 1.04 # linear stretching factor
linearly_spaced_faces(k) = - Lz * (1 - σ^(1 - k + Nz)) / (1 - σ^Nz)

underlying_grid = RectilinearGrid(GPU(),
                                  topology = (Periodic, Bounded, Bounded), 
                                  size = (Nx, Ny, Nz),
                                  x = (-Lx/2, Lx/2),
                                  y = (-Ly/2, Ly/2),
                                  z = linearly_spaced_faces,
                                  halo = (4, 4, 4))

const H_deep = H = underlying_grid.Lz
const H_shelf = h = 0.5
const width_shelf = 100

shelf(x, y) = -(H + h)/2 - (H - h)/2 * tanh(y / width_shelf)
bathymetry(x, y) = shelf(x, y)

grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bathymetry))

Then just creating a `u` field gives:

julia> u = Field((Face, Center, Center), grid)
Error showing value of type Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}:
ERROR: CUDA error: too many resources requested for launch (code 701, ERROR_LAUNCH_OUT_OF_RESOURCES)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/lib/cudadrv/error.jl:89
  [2] macro expansion
    @ /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/lib/cudadrv/error.jl:97 [inlined]
  [3] cuLaunchKernel(f::CUDA.CuFunction, gridDimX::UInt32, gridDimY::UInt32, gridDimZ::UInt32, blockDimX::UInt32, blockDimY::UInt32, blockDimZ::UInt32, sharedMemBytes::Int64, hStream::CUDA.CuStream, kernelParams::Vector{Ptr{Nothing}}, extra::Ptr{Nothing})
    @ CUDA /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/lib/utils/call.jl:26
  [4] #39
    @ /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:69 [inlined]
  [5] macro expansion
    @ /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:33 [inlined]
  [6] macro expansion
    @ ./none:0 [inlined]
  [7] pack_arguments(::CUDA.var"#39#40"{Bool, Int64, CUDA.CuStream, CUDA.CuFunction, CUDA.CuDim3, CUDA.CuDim3}, ::CUDA.KernelState, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::CUDA.CuDeviceArray{Float64, 4, 1}, ::Oceananigans.AbstractOperations.ConditionalOperation{Face, Center, Center, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuDeviceMatrix{Float64, 1}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, Nothing}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64})
    @ CUDA ./none:0
  [8] #launch#38
    @ /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:62 [inlined]
  [9] #44
    @ /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:136 [inlined]
 [10] macro expansion
    @ /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:95 [inlined]
 [11] macro expansion
    @ ./none:0 [inlined]
 [12] convert_arguments
    @ ./none:0 [inlined]
 [13] #cudacall#43
    @ /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:135 [inlined]
 [14] macro expansion
    @ /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:204 [inlined]
 [15] macro expansion
    @ ./none:0 [inlined]
 [16] #call#205
    @ ./none:0 [inlined]
 [17] #_#226
    @ /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:484 [inlined]
 [18] macro expansion
    @ /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:104 [inlined]
 [19] mapreducedim!(f::typeof(identity), op::typeof(max), R::SubArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, A::Oceananigans.AbstractOperations.ConditionalOperation{Face, Center, Center, Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, 
StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64}; init::Nothing)
    @ CUDA /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/src/mapreduce.jl:272
 [20] mapreducedim!(f::typeof(identity), op::typeof(max), R::SubArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, A::Oceananigans.AbstractOperations.ConditionalOperation{Face, Center, Center, Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, 
StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64})
    @ CUDA /g/data/v45/nc3020/.julia/packages/CUDA/DfvRa/src/mapreduce.jl:172
 [21] mapreducedim!(f::Function, op::Function, R::SubArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, A::Oceananigans.AbstractOperations.ConditionalOperation{Face, Center, Center, Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, 
StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64})
    @ GPUArrays /g/data/v45/nc3020/.julia/packages/GPUArrays/fqD8z/src/host/mapreduce.jl:10
 [22] #maximum!#713
    @ ./reducedim.jl:895 [inlined]
 [23] maximum!(f::Function, r::Field{Nothing, Nothing, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, a::Oceananigans.AbstractOperations.ConditionalOperation{Face, Center, Center, Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, 
OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64}; condition::Nothing, mask::Float64, kwargs::Base.Iterators.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:init,), Tuple{Bool}}})
    @ Oceananigans.Fields /g/data/v45/nc3020/Oceananigans.jl/src/Fields/field.jl:581
 [24] maximum(f::Function, c::Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}; condition::Nothing, mask::Float64, dims::Function)
    @ Oceananigans.Fields /g/data/v45/nc3020/Oceananigans.jl/src/Fields/field.jl:611
 [25] maximum
    @ /g/data/v45/nc3020/Oceananigans.jl/src/Fields/field.jl:606 [inlined]
 [26] #maximum#36
    @ /g/data/v45/nc3020/Oceananigans.jl/src/Fields/field.jl:620 [inlined]
 [27] maximum
    @ /g/data/v45/nc3020/Oceananigans.jl/src/Fields/field.jl:620 [inlined]
 [28] data_summary(field::Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}})
    @ Oceananigans.Fields /g/data/v45/nc3020/Oceananigans.jl/src/Fields/show_fields.jl:27
 [29] show(io::IOContext{Base.TTY}, field::Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}})
    @ Oceananigans.Fields /g/data/v45/nc3020/Oceananigans.jl/src/Fields/show_fields.jl:48
 [30] show(io::IOContext{Base.TTY}, #unused#::MIME{Symbol("text/plain")}, f::Field{Face, Center, Center, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}})
    @ Oceananigans.Fields /g/data/v45/nc3020/Oceananigans.jl/src/Fields/show_fields.jl:59
 [31] (::REPL.var"#38#39"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
    @ REPL /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:220
 [32] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:462
 [33] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
    @ REPL /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:213
 [34] display(d::REPL.REPLDisplay, x::Any)
    @ REPL /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:225
 [35] display(x::Any)
    @ Base.Multimedia ./multimedia.jl:328
 [36] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [37] invokelatest
    @ ./essentials.jl:706 [inlined]
 [38] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
    @ REPL /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:247
 [39] (::REPL.var"#40#41"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:231
 [40] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:462
 [41] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:229
 [42] (::REPL.var"#do_respond#61"{Bool, Bool, REPL.var"#72#82"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:798
 [43] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [44] invokelatest
    @ ./essentials.jl:706 [inlined]
 [45] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/LineEdit.jl:2441
 [46] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL /g/data/v45/nc3020/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:1126
 [47] (::REPL.var"#44#49"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL ./task.jl:417

The error comes from `show()`, and in particular from the `maximum` that's called in `data_summary`. The same happens for a field on `(Center, Face, Center)`! However, not for a field on `(Center, Center, Face)`:

julia> w = Field((Center, Center, Face), grid)
128×128×65 Field{Center, Center, Face} on ImmersedBoundaryGrid on GPU
├── grid: 128×128×64 ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: ZeroFlux, north: ZeroFlux, bottom: Nothing, top: Nothing, immersed: ZeroFlux
└── data: 136×136×73 OffsetArray(::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, -3:132, -3:132, -3:69) with eltype Float64 with indices -3:132×-3:132×-3:69
    └── max=0.0, min=0.0, mean=0.0

julia> maximum(abs, w)
0.0

Also, no problems when using a regularly spaced grid --- things look OK then!

using Oceananigans
using Oceananigans.ImmersedBoundaries: ImmersedBoundaryGrid, GridFittedBottom

Lx, Ly, Lz = 500, 600, 3

Nx, Ny, Nz = 128, 128, 64

underlying_grid = RectilinearGrid(GPU(),
                                  topology = (Periodic, Bounded, Bounded), 
                                  size = (Nx, Ny, Nz),
                                  x = (-Lx/2, Lx/2),
                                  y = (-Ly/2, Ly/2),
                                  z = (-Lz, 0),
                                  halo = (4, 4, 4))

const H_deep = H = underlying_grid.Lz
const H_shelf = h = 0.5
const width_shelf = 100

shelf(x, y) = -(H + h)/2 - (H - h)/2 * tanh(y / width_shelf)
bathymetry(x, y) = shelf(x, y)

grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bathymetry))

gives

julia> u = Field((Face, Center, Center), grid)
128×128×64 Field{Face, Center, Center} on ImmersedBoundaryGrid on GPU
├── grid: 128×128×64 ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: ZeroFlux, north: ZeroFlux, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 136×136×72 OffsetArray(::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, -3:132, -3:132, -3:68) with eltype Float64 with indices -3:132×-3:132×-3:68
    └── max=0.0, min=0.0, mean=0.0

julia> maximum(abs, u)
0.0
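
The only difference between the failing and the working setup above is the `z` argument. As a sanity check on the stretched coordinate, here is a minimal CPU-only sketch (plain Julia, no Oceananigans or GPU needed) that evaluates the same `linearly_spaced_faces` formula. The names `z_faces` and `Δz` are introduced here for illustration only.

```julia
# CPU-only check of the stretched vertical faces used in the failing example.
Lz = 3
Nz = 64
σ = 1.04 # stretching factor, as in the example above

# Same formula as in the example above.
linearly_spaced_faces(k) = - Lz * (1 - σ^(1 - k + Nz)) / (1 - σ^Nz)

# Nz cells have Nz + 1 faces.
z_faces = [linearly_spaced_faces(k) for k in 1:Nz+1]
Δz = diff(z_faces)

println("bottom face: ", z_faces[1])
println("top face:    ", z_faces[end])
println("min(Δz) = ", minimum(Δz), ", max(Δz) = ", maximum(Δz))
println("ratio of neighbouring spacings: ", Δz[1] / Δz[2])
```

The faces should run from `-Lz` at `k = 1` to `0` at `k = Nz + 1`, and each spacing is a factor of about `σ` smaller than the one below it, so the grid is finest near the surface. The minimum and maximum spacings should agree with the `min(Δz)=0.0106134, max(Δz)=0.12559` reported in the grid summary further down the thread.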

simone-silvestri commented 2 years ago

Hmmm, interesting. I found the same problem on a 1/12 degree global ocean when using several `maximum` calls in a progress function (I also have a stretched immersed grid).

Looking at the error message and the size of my simulation, I thought the field I was trying to reduce was simply too large and exceeded the available CUDA threads, so I didn't give it a second thought.

This is because, in general, that error means the following (from "handling CUDA error messages"): "Too Many Resources Requested for Launch: This error means that the number of registers available on the multiprocessor is being exceeded. Reduce the number of threads per block to solve the problem."

But if you get the same error for that small grid, it must mean that it is not a problem with the size of the field, but we are doing something funky with stretched reductions on immersed boundaries.
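
To make the quoted advice concrete: roughly speaking, a launch can only succeed if (threads per block) × (registers per thread) fits in the register budget of a block. The sketch below is plain arithmetic in Julia. The 65536-register budget and the 1024-thread cap are typical of many NVIDIA GPUs but are assumptions here, the registers-per-thread values are made-up illustrative numbers rather than measurements from this issue, and register allocation granularity is ignored. `max_threads` is a hypothetical helper, not a CUDA.jl function.

```julia
# Hypothetical numbers for illustration only; not measured from the kernel in this issue.
registers_per_block = 65536   # assumed per-block register budget

# Largest block size that fits the register budget,
# capped at an assumed hardware maximum of 1024 threads per block.
max_threads(registers_per_thread; hardware_limit = 1024) =
    min(hardware_limit, registers_per_block ÷ registers_per_thread)

for registers_per_thread in (32, 64, 128, 255)
    println(registers_per_thread, " registers/thread: at most ",
            max_threads(registers_per_thread), " threads/block")
end
```

Under these assumptions, a kernel that needs more registers per thread can fail to launch at a block size that works fine for a lighter kernel, regardless of how large the array being reduced is.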

navidcy commented 2 years ago

I also discovered this in a progress function. But yeah, 128x128x64 is small enough, right? Also, the fact that `ZFaceField`s are OK might be key?

simone-silvestri commented 2 years ago

The kernel launch is handled by `mapreducedim!`, so it definitely should be OK. I guess if reducing the interior is not a problem, then the problem is in our code.

We should try following the functions and printing out `gridDimX`, `gridDimY`, `gridDimZ`, `blockDimX`, `blockDimY`, `blockDimZ` from

[3] cuLaunchKernel(f::CUDA.CuFunction, gridDimX::UInt32, gridDimY::UInt32, gridDimZ::UInt32, blockDimX::UInt32, blockDimY::UInt32, blockDimZ::UInt32, sharedMemBytes::Int64, hStream::CUDA.CuStream, kernelParams::Vector{Ptr{Nothing}}, extra::Ptr{Nothing})

simone-silvestri commented 2 years ago

What about on a non-immersed rectilinear spaced grid? The reduction is a little different in that case

simone-silvestri commented 2 years ago

Hmmm, I cannot reproduce the bug:

julia> underlying_grid = RectilinearGrid(GPU(),
                                         topology = (Periodic, Bounded, Bounded), 
                                         size = (Nx, Ny, Nz),
                                         x = (-Lx/2, Lx/2),
                                         y = (-Ly/2, Ly/2),
                                         z = linearly_spaced_faces,
                                         halo = (4, 4, 4))
128×128×64 RectilinearGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo
├── Periodic x ∈ [-250.0, 250.0) regularly spaced with Δx=3.90625
├── Bounded  y ∈ [-300.0, 300.0] regularly spaced with Δy=4.6875
└── Bounded  z ∈ [-3.0, 0.0]     variably spaced with min(Δz)=0.0106134, max(Δz)=0.12559

julia> const H_deep = H = underlying_grid.Lz
2.9999999999999996

julia> const H_shelf = h = 0.5
0.5

julia> const width_shelf = 100
100

julia> shelf(x, y) = -(H + h)/2 - (H - h)/2 * tanh(y / width_shelf)
shelf (generic function with 1 method)

julia> bathymetry(x, y) = shelf(x, y)
bathymetry (generic function with 1 method)

julia> grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bathymetry))
128×128×64 ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo:
├── immersed_boundary: GridFittedBottom(min(h)=-2.99e+00, max(h)=0.00e+00)
├── underlying_grid: 128×128×64 RectilinearGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo
├── Periodic x ∈ [-250.0, 250.0) regularly spaced with Δx=3.90625
├── Bounded  y ∈ [-300.0, 300.0] regularly spaced with Δy=4.6875
└── Bounded  z ∈ [-3.0, 0.0]     variably spaced with min(Δz)=0.0106134, max(Δz)=0.12559

julia> u = XFaceField(grid)
128×128×64 Field{Face, Center, Center} on ImmersedBoundaryGrid on GPU
├── grid: 128×128×64 ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: ZeroFlux, north: ZeroFlux, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 136×136×72 OffsetArray(::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, -3:132, -3:132, -3:68) with eltype Float64 with indices -3:132×-3:132×-3:68
    └── max=0.0, min=0.0, mean=0.0

julia> u = Field((Face, Center, Center), grid)
128×128×64 Field{Face, Center, Center} on ImmersedBoundaryGrid on GPU
├── grid: 128×128×64 ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: ZeroFlux, north: ZeroFlux, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 136×136×72 OffsetArray(::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, -3:132, -3:132, -3:68) with eltype Float64 with indices -3:132×-3:132×-3:68
    └── max=0.0, min=0.0, mean=0.0

I am using

(Oceananigans) pkg> st
Project Oceananigans v0.77.4
Status `~/stable_oceananigans/Oceananigans.jl/Project.toml`
  [c963dde9] AMGX v0.1.2
  [79e6a3ab] Adapt v3.4.0
  [2169fc97] AlgebraicMultigrid v0.5.1
  [052768ef] CUDA v3.12.0
⌃ [72cfdca4] CUDAKernels v0.3.3
  [a8cc5b0e] Crayons v4.1.1
  [7445602f] CubedSphere v0.2.0
⌅ [ffbed154] DocStringExtensions v0.8.6
  [7a1cc6ca] FFTW v1.5.0
  [c27321d9] Glob v1.3.0
  [40713840] IncompleteLU v0.2.0
  [42fd0dbc] IterativeSolvers v0.9.2
⌃ [033835bb] JLD2 v0.4.22
⌅ [63c18a36] KernelAbstractions v0.7.2
  [da04e1cc] MPI v0.19.2
  [85f8d34a] NCDatasets v0.12.7
  [6fe1bfb0] OffsetArrays v1.12.7
  [bac558e1] OrderedCollections v1.4.1
⌃ [0e08944d] PencilArrays v0.17.6
  [4a48f351] PencilFFTs v0.14.1
⌃ [6038ab10] Rotations v1.3.1
  [d496a93d] SeawaterPolynomials v0.2.3
  [09ab397b] StructArrays v0.6.12
  [bc48ee85] Tullio v0.3.4
  [ade2ca70] Dates
  [b77e0a4c] InteractiveUtils
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [44cfe95a] Pkg
  [de0858da] Printf
  [9a3f8284] Random
  [2f01184e] SparseArrays
  [10745b16] Statistics
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ cannot be upgraded. To see why use `status --outdated`

simone-silvestri commented 2 years ago

But, weirdly, I do get it for a field with a Nothing location in the z direction:

julia> u = Field((Center, Center, Nothing), grid)
Error showing value of type Field{Center, Center, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}:
ERROR: CUDA error: too many resources requested for launch (code 701, ERROR_LAUNCH_OUT_OF_RESOURCES)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/error.jl:89
  [2] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/error.jl:97 [inlined]
  [3] cuLaunchKernel(f::CUDA.CuFunction, gridDimX::UInt32, gridDimY::UInt32, gridDimZ::UInt32, blockDimX::UInt32, blockDimY::UInt32, blockDimZ::UInt32, sharedMemBytes::Int64, hStream::CUDA.CuStream, kernelParams::Vector{Ptr{Nothing}}, extra::Ptr{Nothing})
    @ CUDA ~/.julia/packages/CUDA/DfvRa/lib/utils/call.jl:26
  [4] #39
    @ ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:69 [inlined]
  [5] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:33 [inlined]
  [6] macro expansion
    @ ./none:0 [inlined]
  [7] pack_arguments(::CUDA.var"#39#40"{Bool, Int64, CUDA.CuStream, CUDA.CuFunction, CUDA.CuDim3, CUDA.CuDim3}, ::CUDA.KernelState, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::CUDA.CuDeviceArray{Float64, 4, 1}, ::Oceananigans.AbstractOperations.ConditionalOperation{Center, Center, Nothing, Field{Center, Center, Nothing, Nothing, Nothing, Nothing, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, Nothing, Nothing, Nothing}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuDeviceMatrix{Float64, 1}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, Nothing}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64})
    @ CUDA ./none:0
  [8] #launch#38
    @ ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:62 [inlined]
  [9] #44
    @ ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:136 [inlined]
 [10] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:95 [inlined]
 [11] macro expansion
    @ ./none:0 [inlined]
 [12] convert_arguments
    @ ./none:0 [inlined]
 [13] #cudacall#43
    @ ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/execution.jl:135 [inlined]
 [14] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:204 [inlined]
 [15] macro expansion
    @ ./none:0 [inlined]
 [16] call(::CUDA.HostKernel{typeof(CUDA.partial_mapreduce_grid), Tuple{typeof(identity), typeof(max), Nothing, CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, Val{true}, CUDA.CuDeviceArray{Float64, 4, 1}, Oceananigans.AbstractOperations.ConditionalOperation{Center, Center, Nothing, Field{Center, Center, Nothing, Nothing, Nothing, Nothing, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, Nothing, Nothing, Nothing}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuDeviceMatrix{Float64, 1}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, Nothing}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64}}}, ::typeof(identity), ::typeof(max), ::Nothing, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, ::Val{true}, ::CUDA.CuDeviceArray{Float64, 4, 1}, ::Oceananigans.AbstractOperations.ConditionalOperation{Center, Center, Nothing, Field{Center, Center, Nothing, Nothing, Nothing, Nothing, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, Nothing, Nothing, Nothing}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, 
RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuDeviceMatrix{Float64, 1}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, Nothing}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64}; call_kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:threads, :blocks, :shmem), Tuple{Int64, Int64, Int64}}})
    @ CUDA ./none:0
 [17] (::CUDA.HostKernel{typeof(CUDA.partial_mapreduce_grid), Tuple{typeof(identity), typeof(max), Nothing, CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, Val{true}, CUDA.CuDeviceArray{Float64, 4, 1}, Oceananigans.AbstractOperations.ConditionalOperation{Center, Center, Nothing, Field{Center, Center, Nothing, Nothing, Nothing, Nothing, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, Nothing, Nothing, Nothing}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuDeviceMatrix{Float64, 1}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, Nothing}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64}}})(::Function, ::Vararg{Any}; threads::Int64, blocks::Int64, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:shmem,), Tuple{Int64}}})
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:484
 [18] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:104 [inlined]
 [19] mapreducedim!(f::typeof(identity), op::typeof(max), R::SubArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, A::Oceananigans.AbstractOperations.ConditionalOperation{Center, Center, Nothing, Field{Center, Center, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, 
OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64}; init::Nothing)
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/mapreduce.jl:272
 [20] mapreducedim!(f::typeof(identity), op::typeof(max), R::SubArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, A::Oceananigans.AbstractOperations.ConditionalOperation{Center, Center, Nothing, Field{Center, Center, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, 
OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64})
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/mapreduce.jl:169
 [21] mapreducedim!(f::Function, op::Function, R::SubArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, A::Oceananigans.AbstractOperations.ConditionalOperation{Center, Center, Nothing, Field{Center, Center, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, 
OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64})
    @ GPUArrays ~/.julia/packages/GPUArrays/fqD8z/src/host/mapreduce.jl:10
 [22] #maximum!#803
    @ ./reducedim.jl:1018 [inlined]
 [23] maximum!(f::Function, r::Field{Nothing, Nothing, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, a::Oceananigans.AbstractOperations.ConditionalOperation{Center, Center, Nothing, Field{Center, Center, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, 
Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}, typeof(identity), ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Oceananigans.ImmersedBoundaries.NotImmersed{typeof(Oceananigans.AbstractOperations.truefunc)}, Float64, Float64}; condition::Nothing, mask::Float64, kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:init,), Tuple{Bool}}})
    @ Oceananigans.Fields ~/stable_oceananigans/Oceananigans.jl/src/Fields/field.jl:581
 [24] maximum(f::Function, c::Field{Center, Center, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}}; condition::Nothing, mask::Float64, dims::Function)
    @ Oceananigans.Fields ~/stable_oceananigans/Oceananigans.jl/src/Fields/field.jl:611
 [25] maximum
    @ ~/stable_oceananigans/Oceananigans.jl/src/Fields/field.jl:600 [inlined]
 [26] #maximum#36
    @ ~/stable_oceananigans/Oceananigans.jl/src/Fields/field.jl:620 [inlined]
 [27] maximum
    @ ~/stable_oceananigans/Oceananigans.jl/src/Fields/field.jl:620 [inlined]
 [28] data_summary(field::Field{Center, Center, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}})
    @ Oceananigans.Fields ~/stable_oceananigans/Oceananigans.jl/src/Fields/show_fields.jl:27
 [29] show(io::IOContext{Base.TTY}, field::Field{Center, Center, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}})
    @ Oceananigans.Fields ~/stable_oceananigans/Oceananigans.jl/src/Fields/show_fields.jl:48
 [30] show(io::IOContext{Base.TTY}, #unused#::MIME{Symbol("text/plain")}, f::Field{Center, Center, Nothing, Nothing, ImmersedBoundaryGrid{Float64, Periodic, Bounded, Bounded, RectilinearGrid{Float64, Periodic, Bounded, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, GPU}, GridFittedBottom{OffsetArrays.OffsetMatrix{Float64, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}}, Oceananigans.ImmersedBoundaries.CenterImmersedCondition}, GPU}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, Nothing, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing, Oceananigans.Fields.FieldBoundaryBuffers{Nothing, Nothing, Nothing, Nothing}})
    @ Oceananigans.Fields ~/stable_oceananigans/Oceananigans.jl/src/Fields/show_fields.jl:59
 [31] (::REPL.var"#43#44"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
    @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:267
 [32] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:521
 [33] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
    @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:260
 [34] display(d::REPL.REPLDisplay, x::Any)
    @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:272
 [35] display(x::Any)
    @ Base.Multimedia ./multimedia.jl:328
 [36] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [37] invokelatest
    @ ./essentials.jl:726 [inlined]
 [38] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
    @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:296
 [39] (::REPL.var"#45#46"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:278
 [40] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:521
 [41] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:276
 [42] (::REPL.var"#do_respond#66"{Bool, Bool, REPL.var"#77#87"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:857
 [43] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [44] invokelatest
    @ ./essentials.jl:726 [inlined]
 [45] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/LineEdit.jl:2510
 [46] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:1248
 [47] (::REPL.var"#49#54"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL ./task.jl:484

navidcy commented 2 years ago

I was using Julia 1.6, I think.

navidcy commented 2 years ago

What about on a non-immersed rectilinear spaced grid? The reduction is a little different in that case

That seems OK:

julia> using Oceananigans
[ Info: Precompiling Oceananigans [9e8cae18-63c1-5223-a75c-80ca9d6e9a09]
WARNING: Method definition next_stream() in module CUDAKernels at /g/data/v45/nc3020/.julia/packages/CUDAKernels/kCOA4/src/CUDAKernels.jl:33 overwritten in module Architectures at /g/data/v45/nc3020/Oceananigans.jl/src/Architectures.jl:23.
  ** incremental compilation may be fatally broken for this module **

[NVBLAS] NVBLAS_CONFIG_FILE environment variable is NOT set : relying on default config filename 'nvblas.conf'
[NVBLAS] Cannot open default config file 'nvblas.conf'
[NVBLAS] Config parsed
[NVBLAS] CPU Blas library need to be provided
┌ Warning: You appear to be using MPI.jl with the default MPI binary on a cluster.
│ We recommend using the system-provided MPI, see the Configuration section of the MPI.jl docs.
└ @ MPI /g/data/v45/nc3020/.julia/packages/MPI/08SPr/deps/deps.jl:15
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is NOT set : relying on default config filename 'nvblas.conf'
[NVBLAS] Cannot open default config file 'nvblas.conf'
[NVBLAS] Config parsed
[NVBLAS] CPU Blas library need to be provided
┌ Warning: You appear to be using MPI.jl with the default MPI binary on a cluster.
│ We recommend using the system-provided MPI, see the Configuration section of the MPI.jl docs.
└ @ MPI /g/data/v45/nc3020/.julia/packages/MPI/08SPr/deps/deps.jl:15
[ Info: Oceananigans will use 48 threads

julia> using Oceananigans.ImmersedBoundaries: ImmersedBoundaryGrid, GridFittedBottom

julia> Lx, Ly, Lz = 500, 600, 3
(500, 600, 3)

julia> Nx, Ny, Nz = 128, 128, 64
(128, 128, 64)

julia> σ = 1.04 # linear stretching factor
1.04

julia> linearly_spaced_faces(k) = - Lz * (1 - σ^(1 - k + Nz)) / (1 - σ^Nz)
linearly_spaced_faces (generic function with 1 method)

julia> rectilinear_grid = RectilinearGrid(GPU(),
                                         topology = (Periodic, Bounded, Bounded),
                                         size = (Nx, Ny, Nz),
                                         x = (-Lx/2, Lx/2),
                                         y = (-Ly/2, Ly/2),
                                         z = linearly_spaced_faces,
                                         halo = (4, 4, 4))
128×128×64 RectilinearGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo
├── Periodic x ∈ [-250.0, 250.0) regularly spaced with Δx=3.90625
├── Bounded  y ∈ [-300.0, 300.0] regularly spaced with Δy=4.6875
└── Bounded  z ∈ [-3.0, 0.0]     variably spaced with min(Δz)=0.0106134, max(Δz)=0.12559

julia> u = Field{Face, Center, Center}(rectilinear_grid)
128×128×64 Field{Face, Center, Center} on RectilinearGrid on GPU
├── grid: 128×128×64 RectilinearGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: ZeroFlux, north: ZeroFlux, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 136×136×72 OffsetArray(::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, -3:132, -3:132, -3:68) with eltype Float64 with indices -3:132×-3:132×-3:68
    └── max=0.0, min=0.0, mean=0.0

julia> maximum(abs, u)
0.0

julia> v = Field{Center, Face, Center}(rectilinear_grid)
128×129×64 Field{Center, Face, Center} on RectilinearGrid on GPU
├── grid: 128×128×64 RectilinearGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 136×137×72 OffsetArray(::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, -3:132, -3:133, -3:68) with eltype Float64 with indices -3:132×-3:133×-3:68
    └── max=0.0, min=0.0, mean=0.0

julia> maximum(abs, v)
0.0

julia> w = Field{Center, Center, Face}(rectilinear_grid)
128×128×65 Field{Center, Center, Face} on RectilinearGrid on GPU
├── grid: 128×128×64 RectilinearGrid{Float64, Periodic, Bounded, Bounded} on GPU with 4×4×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: ZeroFlux, north: ZeroFlux, bottom: Nothing, top: Nothing, immersed: ZeroFlux
└── data: 136×136×73 OffsetArray(::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, -3:132, -3:132, -3:69) with eltype Float64 with indices -3:132×-3:132×-3:69
    └── max=0.0, min=0.0, mean=0.0

julia> maximum(abs, w)
0.0

tomchor commented 2 years ago

I can reproduce the error with the MWE here on CUDA 3.12. As with #2756, though, the same snippet appears to run fine on CUDA 3.9.

Should we open an issue on CUDA.jl?

simone-silvestri commented 2 years ago

I am not sure this is a CUDA.jl problem: if you try maximum(abs, interior(w)) or maximum(abs, w.data), the error disappears, so it must be something in our implementation. We should first isolate all the conditions that lead to the error before opening an issue there.

When we do a reduction over an immersed field, we wrap the field in a ConditionalOperation, in this case with condition NotImmersed((i, j, k, grid) -> true). A ConditionalOperation has a getindex method, which is defined as

@inline function Base.getindex(c::ConditionalOperation, i, j, k) 
    return ifelse(get_condition(c.condition, i, j, k, c.grid, c), 
                  c.func(getindex(c.operand, i, j, k)),
                  c.mask)
end

and in this particular case, the mask is -Inf and get_condition is defined as

@inline function get_condition(condition::NotImmersed, i, j, k, ibg, co::ConditionalOperation, args...)
    LX, LY, LZ = location(co)
    return get_condition(condition.func, i, j, k, ibg, args...) & !(immersed_peripheral_node(i, j, k, ibg, LX(), LY(), LZ()))
end 

(where condition.func always evaluates to true). In practice, wherever get_condition evaluates to false, getindex returns -Inf instead of the value. What worries me here is the evaluation of immersed_peripheral_node: it is the only difference I can see from a reduction over a non-immersed field. Because the error appears only at specific locations, I suspect that the evaluation of immersed_peripheral_node is the source of the problem.
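
To make the masking behavior described above concrete, here is a minimal CPU-only sketch in plain Julia. It does not use Oceananigans: ToyConditionalOperation, toy_immersed, always_true, and toy_not_immersed are hypothetical stand-ins invented for illustration, and the "bottom layer is immersed" rule is an assumption. It only shows how a -Inf mask keeps immersed values out of a maximum; it does not reproduce the GPU error.

```julia
# Toy stand-ins only; these are NOT the Oceananigans types or functions.
struct ToyConditionalOperation{O, F, C, M}
    operand   :: O  # array being reduced
    func      :: F  # function applied to each element, e.g. abs
    condition :: C  # (i, j, k) -> Bool, true where the value should be kept
    mask      :: M  # value returned where the condition is false
end

# Mirrors the getindex quoted above: return func(value) where the condition
# holds, otherwise the mask. Note that ifelse evaluates both branches.
@inline Base.getindex(c::ToyConditionalOperation, i, j, k) =
    ifelse(c.condition(i, j, k), c.func(c.operand[i, j, k]), c.mask)

# Hypothetical immersed test: pretend the bottom layer (k == 1) is immersed.
toy_immersed(i, j, k) = k == 1

# A user condition that is always true, combined with "not immersed" via &.
always_true(i, j, k) = true
toy_not_immersed(i, j, k) = always_true(i, j, k) & !toy_immersed(i, j, k)

operand = zeros(2, 2, 2)
operand[1, 1, 1] = -100.0  # large value inside the immersed layer
operand[2, 2, 2] = 3.0     # value in the active region

co = ToyConditionalOperation(operand, abs, toy_not_immersed, -Inf)

# Reduce over all indices; immersed cells contribute -Inf and never win.
masked_max   = maximum(co[i, j, k] for i in 1:2, j in 1:2, k in 1:2)
unmasked_max = maximum(abs, operand)

println("masked maximum:   ", masked_max)
println("unmasked maximum: ", unmasked_max)
```

With -Inf as the mask, the immersed value -100.0 is ignored and the masked maximum comes only from the active region, whereas the plain maximum(abs, operand) picks it up.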

glwagner commented 1 year ago

Is this still an issue?

tomchor commented 1 year ago

Is this still an issue?

Apparently not anymore. I just tested @navidcy's MWE and everything seems to work fine (on a Quadro GPU and CUDA 3.12 on main):

julia> grid
150×150×12 ImmersedBoundaryGrid{Float64, Periodic, Periodic, Bounded} on GPU with 4×4×4 halo:
├── immersed_boundary: GridFittedBottom(min(h)=5.00e+01, max(h)=5.00e+01)
├── underlying_grid: 150×150×12 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on GPU with 4×4×4 halo
├── Periodic x ∈ [-200.0, 200.0) regularly spaced with Δx=2.66667
├── Periodic y ∈ [-200.0, 200.0) regularly spaced with Δy=2.66667
└── Bounded  z ∈ [0.0, 100.0]    variably spaced with min(Δz)=8.33333, max(Δz)=8.33333

julia> maximum(abs, u)
0.0

julia> maximum(abs, w)
0.0