Open ali-ramadhan opened 2 hours ago
Running with `--check-bounds=yes` on the CPU provides a strong hint:

`at index [39914881, -59303136, 54]`

Yeah, that'll do it lol.
```
[ Info: Iteration 1...
[ Info: Iteration 2...
ERROR: LoadError: BoundsError: attempt to access 109×208×68 OffsetArray(::Array{Float64, 3}, -3:105, -3:204, -3:64) with eltype Float64 with indices -3:105×-3:204×-3:64 at index [39914881, -59303136, 54]
Stacktrace:
  [1] throw_boundserror(A::OffsetArrays.OffsetArray{Float64, 3, Array{Float64, 3}}, I::Tuple{Int64, Int64, Int64})
    @ Base ./abstractarray.jl:737
  [2] checkbounds
    @ ./abstractarray.jl:702 [inlined]
  [3] getindex
    @ ~/.julia/packages/OffsetArrays/hwmnB/src/OffsetArrays.jl:422 [inlined]
  [4] getindex
    @ ~/atdepth/Oceananigans.jl/src/Fields/field.jl:401 [inlined]
  [5] _interpolate
    @ ~/atdepth/Oceananigans.jl/src/Fields/interpolate.jl:295 [inlined]
  [6] interpolate
    @ ~/atdepth/Oceananigans.jl/src/Fields/interpolate.jl:245 [inlined]
  [7] advect_particle
    @ ~/atdepth/Oceananigans.jl/src/Models/LagrangianParticleTracking/lagrangian_particle_advection.jl:113 [inlined]
  [8] macro expansion
    @ ~/atdepth/Oceananigans.jl/src/Models/LagrangianParticleTracking/lagrangian_particle_advection.jl:177 [inlined]
  [9] cpu__advect_particles!
    @ ~/.julia/packages/KernelAbstractions/491pi/src/macros.jl:291 [inlined]
 [10] cpu__advect_particles!(__ctx__::KernelAbstractions.CompilerMetadata{…}, particles::StructArrays.StructVector{…}, restitution::Float64, grid::LatitudeLongitudeGrid{…}, Δt::Float64, velocities::@NamedTuple{…})
    @ Oceananigans.Models.LagrangianParticleTracking ./none:0
 [11] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{…}, ndrange::Nothing, iterspace::KernelAbstractions.NDIteration.NDRange{…}, args::Tuple{…}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
    @ KernelAbstractions ~/.julia/packages/KernelAbstractions/491pi/src/cpu.jl:144
 [12] __run(obj::KernelAbstractions.Kernel{…}, ndrange::Nothing, iterspace::KernelAbstractions.NDIteration.NDRange{…}, args::Tuple{…}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck, static_threads::Bool)
    @ KernelAbstractions ~/.julia/packages/KernelAbstractions/491pi/src/cpu.jl:111
 [13] (::KernelAbstractions.Kernel{…})(::StructArrays.StructVector{…}, ::Vararg{…}; ndrange::Nothing, workgroupsize::Nothing)
    @ KernelAbstractions ~/.julia/packages/KernelAbstractions/491pi/src/cpu.jl:46
 [14] (::KernelAbstractions.Kernel{…})(::StructArrays.StructVector{…}, ::Vararg{…})
    @ KernelAbstractions ~/.julia/packages/KernelAbstractions/491pi/src/cpu.jl:39
 [15] advect_lagrangian_particles!(particles::LagrangianParticles{…}, model::HydrostaticFreeSurfaceModel{…}, Δt::Float64)
    @ Oceananigans.Models.LagrangianParticleTracking ~/atdepth/Oceananigans.jl/src/Models/LagrangianParticleTracking/lagrangian_particle_advection.jl:193
 [16] step_lagrangian_particles!
    @ ~/atdepth/Oceananigans.jl/src/Models/LagrangianParticleTracking/LagrangianParticleTracking.jl:143 [inlined]
 [17] step_lagrangian_particles!
    @ ~/atdepth/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/HydrostaticFreeSurfaceModels.jl:107 [inlined]
 [18] time_step!(model::HydrostaticFreeSurfaceModel{…}, Δt::Float64; callbacks::Vector{…}, euler::Bool)
    @ Oceananigans.TimeSteppers ~/atdepth/Oceananigans.jl/src/TimeSteppers/quasi_adams_bashforth_2.jl:124
 [19] time_step!(model::HydrostaticFreeSurfaceModel{…}, Δt::Float64)
    @ Oceananigans.TimeSteppers ~/atdepth/Oceananigans.jl/src/TimeSteppers/quasi_adams_bashforth_2.jl:76
 [20] top-level scope
    @ ~/atdepth/Oceananigans.jl/particles_error.jl:37
 [21] include(fname::String)
    @ Base.MainInclude ./client.jl:489
 [22] top-level scope
    @ REPL[1]:1
in expression starting at /home/alir/atdepth/Oceananigans.jl/particles_error.jl:35
Some type information was truncated. Use `show(err)` to see complete types.
```
~I have had errors like this before and found they came from a particle leaving the domain: the boundary enforcement then moved it back in, but it had travelled so far out that when it was put back "in" it ended up outside the domain on the other side.~
I just realised there is no velocity in the MWE so this can't be what's going on.
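(For reference, even though it's ruled out here, the failure mode described above is easy to reproduce. This is an illustrative sketch, not Oceananigans code: a naive reflective boundary assumes the particle overshot by less than the domain width, so a large excursion gets "reflected" straight out the other side.)

```julia
# Illustrative sketch (not Oceananigans code): naive reflective boundary
# enforcement on the interval [left, right].
left, right = 0.0, 1.0

enforce(x) = x < left  ? left  + (left - x) :
             x > right ? right - (x - right) : x

@assert enforce(0.5) == 0.5    # inside: untouched
@assert enforce(1.2) ≈ 0.8     # small excursion: reflected back inside
@assert enforce(2.5) ≈ -0.5    # large excursion: ends up outside on the left!
```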
I think these lines should be using `ξnode`, `ηnode`, and `rnode`:
I'll open a PR with a fix tomorrow. Should probably also add a test for particle advection on a lat-lon grid.
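The huge indices in the trace look consistent with that kind of coordinate mix-up. A minimal sketch of the suspected mechanism (illustrative only: `fractional_index` and the ~111 km/degree factor are my assumptions, not the actual `interpolate` code): computing a fractional cell index for a position expressed in metres against node locations expressed in degrees inflates the index by orders of magnitude.

```julia
# Sketch (not the actual interpolate code): fractional-index computation
# against native λ nodes in degrees.
λnodes = range(0.0, 10.0; length = 101)   # native λ nodes, degrees
Δλ = step(λnodes)                         # 0.1° spacing

fractional_index(x, nodes, Δ) = (x - first(nodes)) / Δ + 1

λ_deg = 1.0                  # particle longitude in native coordinates
x_m   = 1.0 * 111_000.0      # same position mistakenly in metres

i_ok  = fractional_index(λ_deg, λnodes, Δλ)   # ≈ 11, in bounds
i_bad = fractional_index(x_m, λnodes, Δλ)     # ≈ 1.11e6, wildly out of bounds

@assert 1 ≤ i_ok ≤ length(λnodes)
@assert i_bad > length(λnodes)
```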
Some debug printing inside `advect_particle` with 1 particle:
```
[ Info: Iteration 1...
[ Info: X=(1.0, -1.5, -10.0), I=(47, 109, 53)
[ Info: (before) X⁺=(1.0, -1.5, -10.0)
(iᴿ, jᴿ, kᴿ) = (101, 201, 61)
(xᴸ, yᴸ, zᴸ) = (87813.63270401207, -217942.05622333512, -100.0)
(xᴿ, yᴿ, zᴿ) = (136722.49142523398, -124538.3178419058, 0.0)
(x⁺, y⁺, z⁺) = (175626.26540802413, -249075.1356838116, -10.0)
[ Info: (after) X⁺=(175626.26540802413, -249075.1356838116, -10.0)
[ Info: Iteration 2...
[ Info: X=(175626.26540802413, -249075.1356838116, -10.0), I=(39914880, -59303137, 53)
ERROR: LoadError: BoundsError: attempt to access 109×208×68 OffsetArray(::Array{Float64, 3}, -3:105, -3:204, -3:64) with eltype Float64 with indices -3:105×-3:204×-3:64 at index [39914881, -59303136, 54]
```
I'm trying to add some particles to a hydrostatic model on a lat-lon grid, but ran into some CUDA memory issues. After reducing it down to an MWE, I noticed that it also segfaults on the CPU.
The MWE seems to be sensitive to the exact grid: some lat-lon ranges lead to illegal memory accesses and others do not. I could not find a pattern, though.
On the CPU the segfault seems to occur after ~2 iterations; on the GPU, after ~29 iterations.
The particles are initialized within the domain, and without any dynamics they should stay perfectly still. So I'm not sure where the illegal memory access is happening, but it should be easy to debug on the CPU?
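To spell out that expectation: with zero velocity, any explicit particle update should be the identity, so any drift at all points at a coordinate or indexing bug rather than the dynamics. A sanity-check sketch (illustrative forward-Euler step, not the Oceananigans kernel):

```julia
# Sketch (not the Oceananigans kernel): a stationary particle must stay put.
euler_step(x, u, Δt) = x + u * Δt

X₀ = (1.0, -1.5, -10.0)          # (λ, φ, z) as in the debug output above
U  = (0.0, 0.0, 0.0)             # no dynamics: zero velocity everywhere
X₁ = euler_step.(X₀, U, 60.0)

@assert X₁ == X₀                 # any deviation here would be a bug
```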
MWE:
CPU segfault:
GPU illegal memory access: