What if you edit the forcing functions into the discrete form, e.g. invoking Forcing with discrete_form=true?
It helps! But it doesn't solve the problem.
In particular, the MWE above (with two tracers) compiles for me in discrete form. But when I add more tracers (I need at least 6 for my simulations) it fails again, sometimes with a different error:
ERROR: LoadError: Failed to compile PTX code (ptxas exited with code 255)
Invocation arguments: --generate-line-info --verbose --gpu-name sm_60 --output-file /glade/scratch/tomasc/jl_hs9AZo7IJq.cubin /glade/scratch/tomasc/jl_XSJ4P4z47a.ptx
ptxas /glade/scratch/tomasc/jl_XSJ4P4z47a.ptx, line 5136; error : Entry function '_Z23julia_gpu_calculate_Gu_7ContextI14__CUDACtx_Namevv14__PassType_312v12DisableHooksE18_gpu_calculate_Gu_16CompilerMetadataI10StaticSizeI9_8__8__6_E12DynamicCheckvv7NDRangeILi3ES5_I9_1__1__6_ES5_I11_16__16__1_EvvEE11OffsetArrayI7Float64Li3E13CuDeviceArrayIS9_Li3ELi1EEE15RectilinearGridIS9_8PeriodicS12_7BoundedS9_S9_S8_IS9_Li1ES10_IS9_Li1ELi1EEES8_IS9_Li1E12StepRangeLenIS9_14TwicePrecisionIS9_ES15_IS9_E5Int64EES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES10_IS9_Li1ELi1EEEvE4WENOILi3ES9_vv5TupleIS8_IS18_IS9_S9_S9_ELi1ES10_IS18_IS9_S9_S9_ELi1ELi1EEES8_IS18_IS9_S9_S9_ELi1ES10_IS18_IS9_S9_S9_ELi1ELi1EEES8_IS18_IS9_S9_S9_ELi1ES10_IS18_IS9_S9_S9_ELi1ELi1EEES8_IS18_IS9_S9_S9_ELi1ES10_IS18_IS9_S9_S9_ELi1ELi1EEEELitrueEvS17_ILi2ES9_vvS18_IS8_IS18_IS9_S9_ELi1ES10_IS18_IS9_S9_ELi1ELi1EEES8_IS18_IS9_S9_ELi1ES10_IS18_IS9_S9_ELi1ELi1EEES8_IS18_IS9_S9_ELi1ES10_IS18_IS9_S9_ELi1ELi1EEEELitrueEv12UpwindBiasedILi1ES9_vvvv8CenteredILi1ES9_vvvvEES20_ILi1ES9_vvvvEES20_ILi2ES9_vvvS20_ILi1ES9_vvvvEEEvv16SmagorinskyLillyI26ExplicitTimeDiscretizationS9_10NamedTupleI34__b____1____2____3____4____5____6_S18_IS9_S9_S9_S9_S9_S9_S9_EEE17BoundaryConditionI4FluxvEvS23_I23__velocities___tracers_S18_IS23_I12__u___v___w_S18_I9ZeroFieldIS16_Li3EES26_IS16_Li3EES26_IS16_Li3EEEES23_I34__b____1____2____3____4____5____6_S18_I13FunctionFieldI6CenterS28_S28_S23_I27__time___iteration___stage_S18_IS9_S16_S16_EEv5_b_bgS11_IS9_S12_S12_S13_S9_S9_S8_IS9_Li1ES10_IS9_Li1ELi1EEES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES10_IS9_Li1ELi1EEEvES9_ES26_IS16_Li3EES26_IS16_Li3EES26_IS16_Li3EES26_IS16_Li3EES26_IS16_Li3EES26_IS16_Li3EEEEEES23_I12__u___v___w_S18_IS8_IS9_Li3ES10_IS9_Li3ELi1EEES8_IS9_Li3ES10_IS9_Li3ELi1EEES8_IS9_Li3ES10_IS9_Li3ELi1EEEEES23_I34__b____1____2____3____4____5____6_S18_IS8_IS9_Li3ES10_IS9_Li3ELi1EEES8_IS9_Li3ES10_IS9_Li3ELi1EEES8_IS9_Li3ES10_IS9_Li3ELi1EEES8_IS9_Li3ES10_IS9_Li3ELi1EEES8_IS9_Li3ES10_IS9_Li3ELi1EEES8_IS9_Li3ES10_IS9_Li3ELi1EEES8_IS9_Li3ES10_IS9_Li3ELi1EEEEES23_I2__S18_ES23_I10__________S18_IS8_IS9_Li3ES10_IS9_Li3ELi1EEES23_I34__b____1____2____3____4____5____6_S18_I15BinaryOperationIS28_S28_S28_2__S8_IS9_Li3ES10_IS9_Li3ELi1EEES9_10_identity510_identity1S11_IS9_S12_S12_S13_S9_S9_S8_IS9_Li1ES10_IS9_Li1ELi1EEES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES10_IS9_Li1ELi1EEEvES9_ES30_IS28_S28_S28_S31_S8_IS9_Li3ES10_IS9_Li3ELi1EEES9_10_identity210_identity3S11_IS9_S12_S12_S13_S9_S9_S8_IS9_Li1ES10_IS9_Li1ELi1EEES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES10_IS9_Li1ELi1EEEvES9_ES30_IS28_S28_S28_S31_S8_IS9_Li3ES10_IS9_Li3ELi1EEES9_10_identity4S32_S11_IS9_S12_S12_S13_S9_S9_S8_IS9_Li1ES10_IS9_Li1ELi1EEES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES10_IS9_Li1ELi1EEEvES9_ES30_IS28_S28_S28_S31_S8_IS9_Li3ES10_IS9_Li3ELi1EEES9_S33_S34_S11_IS9_S12_S12_S13_S9_S9_S8_IS9_Li1ES10_IS9_Li1ELi1EEES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES10_IS9_Li1ELi1EEEvES9_ES30_IS28_S28_S28_S31_S8_IS9_Li3ES10_IS9_Li3ELi1EEES9_S35_S36_S11_IS9_S12_S12_S13_S9_S9_S8_IS9_Li1ES10_IS9_Li1ELi1EEES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES10_IS9_Li1ELi1EEEvES9_ES30_IS28_S28_S28_S31_S8_IS9_Li3ES10_IS9_Li3ELi1EEES9_S32_S33_S11_IS9_S12_S12_S13_S9_S9_S8_IS9_
Li1ES10_IS9_Li1ELi1EEES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES10_IS9_Li1ELi1EEEvES9_ES30_IS28_S28_S28_S31_S8_IS9_Li3ES10_IS9_Li3ELi1EEES9_S34_S35_S11_IS9_S12_S12_S13_S9_S9_S8_IS9_Li1ES10_IS9_Li1ELi1EEES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES14_IS9_S15_IS9_ES15_IS9_ES16_EES8_IS9_Li1ES10_IS9_Li1ELi1EEEvES9_EEEEES23_I46__u___v___w___b____1____2____3____4____5____6_S18_I15DiscreteForcingIS23_I13______u______S18_IS16_S16_S9_EE9_sponge_uES37_IS23_I13______u______S18_IS16_S16_S9_EE9_sponge_vES37_IS23_I13______u______S18_IS16_S16_S9_EE9_sponge_wES37_IS23_I13______u______S18_IS16_S16_S9_EE9_sponge_bE12_zeroforcingS42_S42_S42_S42_S42_EES8_IS9_Li3ES10_IS9_Li3ELi1EEES23_I27__time___iteration___stage_S18_IS9_S16_S16_EE' uses too much parameter space (0x1a10 bytes, 0x1100 max).
ptxas fatal : Ptx assembly aborted due to errors
If you think this is a bug, please file an issue and attach /glade/scratch/tomasc/jl_XSJ4P4z47a.ptx
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
@ CUDA /glade/work/tomasc/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:428
[3] #224
@ /glade/work/tomasc/.julia/packages/CUDA/Ey3w2/src/compiler/execution.jl:347 [inlined]
[4] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), Nothing, Nothing, KernelAbstractions.var"##PassType#312", Nothing, Cassette.DisableHooks}, typeof(Oceananigans.Models.NonhydrostaticModels.gpu_calculate_Gu!), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.StaticSize{(8, 8, 6)}, KernelAbstractions.NDIteration.DynamicCheck, Nothing, Nothing, KernelAbstractions.NDIteration.NDRange{3, KernelAbstractions.NDIteration.StaticSize{(1, 1, 6)}, KernelAbstractions.NDIteration.StaticSize{(16, 16, 1)}, Nothing, Nothing}}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, WENO{3, Float64, Nothing, Nothing, NTuple{4, OffsetArrays.OffsetVector{Tuple{Float64, Float64, Float64}, CUDA.CuDeviceVector{Tuple{Float64, Float64, Float64}, 1}}}, true, Nothing, WENO{2, Float64, Nothing, Nothing, Tuple{OffsetArrays.OffsetVector{Tuple{Float64, Float64}, CUDA.CuDeviceVector{Tuple{Float64, Float64}, 1}}, OffsetArrays.OffsetVector{Tuple{Float64, Float64}, CUDA.CuDeviceVector{Tuple{Float64, Float64}, 1}}, OffsetArrays.OffsetVector{Tuple{Float64, Float64}, CUDA.CuDeviceVector{Tuple{Float64, Float64}, 1}}}, true, Nothing, UpwindBiased{1, Float64, Nothing, Nothing, Nothing, Nothing, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}, Centered{2, Float64, Nothing, Nothing, Nothing, Centered{1, Float64, Nothing, Nothing, Nothing, Nothing}}}, Nothing, Nothing, SmagorinskyLilly{Oceananigans.TurbulenceClosures.ExplicitTimeDiscretization, Float64, NamedTuple{(:b, :τ1, :τ2, :τ3, :τ4, :τ5, :τ6), NTuple{7, Float64}}}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}, Nothing, NamedTuple{(:velocities, :tracers), Tuple{NamedTuple{(:u, :v, :w), Tuple{Oceananigans.Fields.ZeroField{Int64, 3}, Oceananigans.Fields.ZeroField{Int64, 3}, Oceananigans.Fields.ZeroField{Int64, 3}}}, NamedTuple{(:b, :τ1, :τ2, :τ3, :τ4, :τ5, :τ6), Tuple{Oceananigans.Fields.FunctionField{Center, Center, Center, NamedTuple{(:time, :iteration, :stage), Tuple{Float64, Int64, Int64}}, Nothing, typeof(b_bg), RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, Float64}, Oceananigans.Fields.ZeroField{Int64, 3}, Oceananigans.Fields.ZeroField{Int64, 3}, Oceananigans.Fields.ZeroField{Int64, 3}, Oceananigans.Fields.ZeroField{Int64, 3}, Oceananigans.Fields.ZeroField{Int64, 3}, Oceananigans.Fields.ZeroField{Int64, 3}}}}}, NamedTuple{(:u, :v, :w), Tuple{OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, 
OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}}}, NamedTuple{(:b, :τ1, :τ2, :τ3, :τ4, :τ5, :τ6), NTuple{7, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}}}, NamedTuple{(), Tuple{}}, NamedTuple{(:νₑ, :κₑ), Tuple{OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, NamedTuple{(:b, :τ1, :τ2, :τ3, :τ4, :τ5, :τ6), Tuple{Oceananigans.AbstractOperations.BinaryOperation{Center, Center, Center, typeof(/), OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, typeof(Oceananigans.Operators.identity5), typeof(Oceananigans.Operators.identity1), RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, Float64}, Oceananigans.AbstractOperations.BinaryOperation{Center, Center, Center, typeof(/), OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, typeof(Oceananigans.Operators.identity2), typeof(Oceananigans.Operators.identity3), RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, Float64}, Oceananigans.AbstractOperations.BinaryOperation{Center, Center, Center, typeof(/), OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, typeof(Oceananigans.Operators.identity4), typeof(Oceananigans.Operators.identity5), RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, Float64}, Oceananigans.AbstractOperations.BinaryOperation{Center, Center, Center, typeof(/), OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, typeof(Oceananigans.Operators.identity1), typeof(Oceananigans.Operators.identity2), RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, Float64}, Oceananigans.AbstractOperations.BinaryOperation{Center, Center, Center, typeof(/), OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, 
typeof(Oceananigans.Operators.identity3), typeof(Oceananigans.Operators.identity4), RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, Float64}, Oceananigans.AbstractOperations.BinaryOperation{Center, Center, Center, typeof(/), OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, typeof(Oceananigans.Operators.identity5), typeof(Oceananigans.Operators.identity1), RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, Float64}, Oceananigans.AbstractOperations.BinaryOperation{Center, Center, Center, typeof(/), OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, Float64, typeof(Oceananigans.Operators.identity2), typeof(Oceananigans.Operators.identity3), RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, OffsetArrays.OffsetVector{Float64, CUDA.CuDeviceVector{Float64, 1}}, Nothing}, Float64}}}}}, NamedTuple{(:u, :v, :w, :b, :τ1, :τ2, :τ3, :τ4, :τ5, :τ6), Tuple{Oceananigans.Forcings.DiscreteForcing{NamedTuple{(:σ, :u₀, :α), Tuple{Int64, Int64, Float64}}, typeof(sponge_u)}, Oceananigans.Forcings.DiscreteForcing{NamedTuple{(:σ, :u₀, :α), Tuple{Int64, Int64, Float64}}, typeof(sponge_v)}, Oceananigans.Forcings.DiscreteForcing{NamedTuple{(:σ, :u₀, :α), Tuple{Int64, Int64, Float64}}, typeof(sponge_w)}, Oceananigans.Forcings.DiscreteForcing{NamedTuple{(:σ, :u₀, :α), Tuple{Int64, Int64, Float64}}, typeof(sponge_b)}, typeof(Oceananigans.Forcings.zeroforcing), typeof(Oceananigans.Forcings.zeroforcing), typeof(Oceananigans.Forcings.zeroforcing), typeof(Oceananigans.Forcings.zeroforcing), typeof(Oceananigans.Forcings.zeroforcing), typeof(Oceananigans.Forcings.zeroforcing)}}, OffsetArrays.OffsetArray{Float64, 3, CUDA.CuDeviceArray{Float64, 3, 1}}, NamedTuple{(:time, :iteration, :stage), Tuple{Float64, Int64, Int64}}}}}})
It looks like the function calculate_Gu! is passing too many parameters to the GPU. There is a limit to the number of parameters you can pass. Can you show the functions you are using?
You can try incorporating the background buoyancy field into the forcing functions (formulated using the discrete form). You may also try setting the parameters as globals rather than using the parameters kwarg (not sure if that will help); a rough sketch of that is below. I'd also suggest testing whether the Smagorinsky closure affects the results of the simulation; if you can avoid using it you might be able to compile more complexity.
After that, we may have to either divide up the kernels or pursue https://github.com/JuliaGPU/CUDA.jl/issues/267
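One way to combine the globals and discrete-form suggestions above, as a rough, untested sketch (parameter names and values taken from the MWE further down in this thread):

const σ  = 1.0
const u₀ = 1.0
const α  = 4e-5

# Discrete-form forcing without a `parameters` keyword: the constants above are
# compiled into the function, so they are not carried in the forcing's parameters.
@inline function sponge_u(i, j, k, grid, clock, model_fields)
    @inbounds u_ijk = model_fields.u[i, j, k]
    return -σ * (u_ijk - α * u₀)
end

Fᵤ = Forcing(sponge_u, discrete_form=true)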
Also, just to clarify for @tomchor: the "Entry function... uses too much parameter space (0x1a10 bytes, 0x1100 max)" message in the PTX compilation error refers to parameters in the sense of https://github.com/JuliaGPU/CUDA.jl/issues/267, not the "parameters" keyword of Forcing.
The example below can reproduce the error pretty well for me (it's basically the same as the one above but with discrete forcing and a few more tracers). So if you're trying to debug it, this is probably the way to go.
using Oceananigans

arch = GPU()

z_faces = collect(0:1:6)
grid = RectilinearGrid(arch, size=(8, 8, 6),
                       x=(0, 1), y=(0, 1), z=z_faces)

@inline b_bg(x, y, z, t) = x
B_field = BackgroundField(b_bg)

@inline sponge_u(i, j, k, grid, clock, model_fields, p) = -p.σ * (model_fields.u[i,j,k] - p.α*p.u₀)
@inline sponge_v(i, j, k, grid, clock, model_fields, p) = -p.σ * (model_fields.v[i,j,k] - p.α*p.u₀)
@inline sponge_w(i, j, k, grid, clock, model_fields, p) = -p.σ * (model_fields.w[i,j,k] - p.α*p.u₀)
@inline sponge_b(i, j, k, grid, clock, model_fields, p) = -p.σ * (model_fields.b[i,j,k] - p.α*p.u₀)

Fᵤ = Forcing(sponge_u, field_dependencies = :u, parameters = (; σ=1, u₀=1, α=4e-5), discrete_form=true)
Fᵥ = Forcing(sponge_v, field_dependencies = :v, parameters = (; σ=1, u₀=1, α=4e-5), discrete_form=true)
Fw = Forcing(sponge_w, field_dependencies = :w, parameters = (; σ=1, u₀=1, α=4e-5), discrete_form=true)
Fb = Forcing(sponge_b, field_dependencies = :b, parameters = (; σ=1, u₀=1, α=4e-5), discrete_form=true)

model = NonhydrostaticModel(; grid,
                            advection = WENO(grid=grid, order=5),
                            tracers = (:b, :τ1, :τ2, :τ3), # This runs fine with one fewer tracer now
                            closure = SmagorinskyLilly(C=0.1),
                            background_fields = (b=B_field,),
                            forcing = (u=Fᵤ, v=Fᵥ, w=Fw, b=Fb),
                            )

@info model

simulation = Simulation(model, Δt=1, stop_iteration=10)
run!(simulation)
My actual production code is far too complicated to paste here, but the relevant forcings I'm using there are:
const z₀ = -100
const z₂ = -120
const z₁ = -grid.Lz

@inline function bottom_mask_cos(x, y, z)
    if z₀ >= z > z₁
        return 1/2 * (1 - cos( π*(z-z₀)/(z₁-z₀) ))
    elseif z₁ >= z #> z₂
        return 1.0
    else
        return 0.0
    end
end
@inline sponge_u(x, y, z, t, u, p) = -bottom_mask_cos(x, y, z) * p.σ * u
@inline sponge_v(x, y, z, t, v, p) = -bottom_mask_cos(x, y, z) * p.σ * v
@inline sponge_w(x, y, z, t, w, p) = -bottom_mask_cos(x, y, z) * p.σ * w
@inline sponge_b(x, y, z, t, b, p) = -bottom_mask_cos(x, y, z) * p.σ * (b - b∞(0, 0, z, 0, p))
Fᵤ = Forcing(sponge_u, field_dependencies = :u, parameters = (; params.σ))
Fᵥ = Forcing(sponge_v, field_dependencies = :v, parameters = (; params.σ))
Fw = Forcing(sponge_w, field_dependencies = :w, parameters = (; params.σ))
Fb = Forcing(sponge_b, field_dependencies = :b, parameters = (; params.σ, params.N²∞))
The above are the forcing functions, and below is the background field. There are also parameters passed for the boundary conditions.
@inline b_bg(x, y, z, t, p) = p.M² * x
B_field = BackgroundField(b_bg, parameters = (; params.M²))
Changing things so that the variables currently passed as parameters are instead declared as constants helps (as you can see I did with z₀, etc.). However, that also means I can't run back-to-back simulations where those parameters differ, which is something that keeps my workflow much more streamlined, so I'm trying to avoid it. (Although I'll ultimately have to do that if we can't figure this error out...)
You're missing @inbounds on the forcing function; not sure if that inflates the parameter size.
You don't want to use if statements --- try to write your code using ifelse.
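For example, the bottom_mask_cos above could be written without branches along these lines (just a sketch, untested, reusing the same z₀ and z₁ constants):

@inline function bottom_mask_cos(x, y, z)
    # ifelse evaluates both arguments, so there is no divergent if/else on the GPU
    ramp = 1/2 * (1 - cos(π * (z - z₀) / (z₁ - z₀)))
    return ifelse((z₀ >= z) & (z > z₁), ramp, ifelse(z <= z₁, 1.0, 0.0))
end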
Also, you don't need the field_dependencies argument with discrete_form=true (but I think it has no effect, so this is just a side comment; maybe we should throw an error for that).
Ah yeah, I forgot to change that when I adapted it to discrete form. Thanks for catching that and for the other tips!
@glwagner thanks for all these tips. I've tried them all (including using the discrete form) and the only thing that lets me reach the number of tracers I need is using closure=nothing. However, I don't think that's an option for me, since I will probably need the physical (KE) dissipation at some point in the research, and that doesn't exist without a closure.
Also, I think closure=nothing prevents me from using flux boundary conditions, no?
What would you recommend as the next step?
You can use FluxBoundaryCondition with closure=nothing (it's Value and Gradient that won't work). You can obtain the global dissipation by differencing the globally integrated TKE (and perhaps a pointwise dissipation by evaluating the TKE budget), but I agree that it's probably more difficult. @simone-silvestri may have some tips, as he has been developing an implicit LES scheme for mesoscale turbulence.
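For instance, a flux boundary condition on b can be set up along these lines (a minimal sketch with a hypothetical flux value, reusing the grid from the MWE above; flux BCs are imposed directly as a boundary source and don't need a diffusivity, unlike Value/Gradient BCs):

b_bcs = FieldBoundaryConditions(top = FluxBoundaryCondition(1e-8))

model = NonhydrostaticModel(; grid,
                            advection = WENO(grid=grid, order=5),
                            closure = nothing,
                            tracers = :b,
                            boundary_conditions = (; b = b_bcs))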
Does AnisotropicMinimumDissipation work? Or other closures?
I'm wondering if the problem is the use of a BinaryOperation for the diffusivities with SmagorinskyLilly. It might be possible to avoid that BinaryOperation by instead extending the three diffusivity getter functions.
Nice catch! It does work for AMD. I hadn't tried that before because I assumed it wasn't gonna work. Unfortunately I can't really use AMD because it produces a lot of noise in the stratified regions of my domain, but hopefully adapting Smag isn't too hard?
I think it'll be easy, yeah.
Adapting Smagorinsky seems an easy avenue.
In terms of implicit LES, you could try using just WENO without any closure, but if you are in a true LES regime it would probably be too dissipative. A Smagorinsky viscosity combined with an energy-conserving advection scheme has been found to be less dissipative, although noisier (here is an example applied to the Burgers equation: https://reader.elsevier.com/reader/sd/pii/S0377042717303035?token=83A413B5659B8B16B96E1D0CBDAD5865D8552AE5B2FF2FDFE78FFDEF064F2820B38D1BBFF646D3F7B75D58FE010DF7DB&originRegion=us-east-1&originCreation=20230125193548). You can always try a higher order (maybe 7th?), but the higher the order, the lower the stability (i.e., at a certain order your implicit dissipation will be so low that you will start generating grid-scale noise).
The KE dissipation is there with closure=nothing and a WENO scheme; it is just not strictly physical: it's (roughly) akin to a 4th- to 6th-order hyperviscosity. You can compare this to using UpwindBiased(order = 5), which would give you everywhere a dissipation that converges to a 6th-order hyperviscosity. In general:
$$\partial_x (uu)^{\mathrm{Upwind}_N} \sim \partial_x (uu)^{\mathrm{Centered}_{N+1}} + \partial_x \left( K_{\mathrm{numerical}} \, \partial_x^{N} u \right)$$
where $K_{\mathrm{numerical}} \sim \Delta x \, u$ and $N$ is the order.
The nice thing about using WENO instead of a simple Upwind discretization is that the order of the hyperviscosity adapts to the smoothness of the field. Therefore, where the field is noisier (like in regions of higher gradients) the dissipation is more aggressive.
This procedure not only ensures a smooth field but can also be thought of as mimicking the subgrid-scale dissipation (which likewise increases with the gradient of the resolved-scale variables). As such, people have referred to using particularly diffusive advection schemes (such as WENO) as implicit LES.
I like the idea of implicit LES because it allows you to "fill in" for the subgrid-scale dissipation without committing to a sophisticated formulation derived in a particularly idealized situation (take the example of the Leith closure, derived for homogeneous 2D turbulence but applied to geostrophic eddies in the ocean), and it guarantees (or at least helps) stability. For this reason, it's pretty handy when you have different unresolved processes at different scales that may be characterized by different dissipation characteristics.
@simone-silvestri do you have any formula for computing the local energy dissipation rate due to numerical viscosity? For the global dissipation I think evaluating the time evolution of the total KE is a good route, but I'm less sure about the best method for obtaining the local dissipation rate.
I guess one way to do it might be by evaluating the kinetic energy budget (using a conservative method for the advective fluxes).
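For the global route, a minimal sketch of the "difference the integrated KE" idea might look like this (assuming the Integral reduction available in recent Oceananigans versions; the names and the (1, 1, 1) indexing of the fully reduced field are illustrative, and ε_bulk is only a bulk estimate, not a pointwise dissipation rate):

using Oceananigans, CUDA

u, v, w = model.velocities
∫KE = Field(Integral((u * u + v * v + w * w) / 2))   # volume-integrated kinetic energy

compute!(∫KE)
KE₀, t₀ = CUDA.@allowscalar(∫KE[1, 1, 1]), model.clock.time

# ... time-step the simulation between the two measurements ...

compute!(∫KE)
KE₁, t₁ = CUDA.@allowscalar(∫KE[1, 1, 1]), model.clock.time

ε_bulk = -(KE₁ - KE₀) / (t₁ - t₀)   # bulk (domain-integrated) dissipation estimate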
When running a relatively complex but small simulation, I'm getting errors on the GPU that I can't make sense of. It's hard to come up with a minimum working example that's truly small, because the error seems to pop up only when there's some level of complexity, but here's what I have so far:
This (and way more complex examples) runs fine on the CPU, but when I run it on the GPU I get:
I'm not really sure what to make of this error. Because the simulation I need to run is necessarily complex, it's been hard to get rid of this error in my main simulation. Any ideas?
P.S.: I ran this MWE in particular on an NVIDIA Quadro GP100 GPU, but I have gotten the same error (albeit with a longer MWE) on Tesla V100s.