I'm not sure whether that is a correct assumption or not. Some of them were designed to be used on the GPU. But I don't think anyone has ever tried to run the Kelvin-Helmholtz example on the GPU before (for example). Many of the others have been run on the GPU. But I think to really ensure this is the case in the long run we'll have to use CI.

We actually used to do something like this (including altering selected lines in the test scripts to make them more amenable to CI), so someone could dredge up that testing code to use with our GPU CI to solve this.
I think modifying the examples to run on GPUs easily is a good idea. After putting together the `ShallowWaterModel` example I tried it on a GPU and found out that it failed. I added a bunch of `const`s and then it worked easily enough. This would make it easier for the user to play with the two architectures, which would be a big plus.
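For context, the kind of change involved looks roughly like the sketch below (the parameter name and forcing function are made up for illustration, not taken from the actual example): a problem parameter that a forcing or boundary-condition function picks up from global scope has to be declared `const` before the function can be compiled into a GPU kernel.

```julia
# Illustrative sketch only. `damping_rate` is a hypothetical problem parameter
# captured from global scope by the forcing function below; without `const`,
# the GPU kernel that calls `u_forcing` cannot be compiled.
const damping_rate = 1e-4

@inline u_forcing(x, y, t) = - damping_rate * sin(x)
```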
One thing that I was also thinking about is that defining (global) problem parameters as constants might help out regardless. Every time I do that for my scripts it speeds up compilation times and often also significantly speeds up runtimes. So there might be a benefit to using `const` throughout, even without GPUs.

The downside is that if a user tries to run several examples in the same REPL session they might, without knowing it, try to redefine some constants and get errors and warnings related to that.
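To make that downside concrete, here is roughly what happens (this mirrors the behavior described in the Julia manual's section on constants; `g` is just a hypothetical parameter name shared by two example scripts):

```julia
julia> const g = 9.81      # the first example defines this parameter
9.81

julia> const g = 9.81      # a second example assigns the same value: silent
9.81

julia> const g = 10.0      # same type but a new value: a warning
WARNING: redefinition of constant g. This may fail, cause incorrect answers, or produce other errors.
10.0

julia> const g = 10        # a value of a different type: an error
ERROR: invalid redefinition of constant g
```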
In the examples I have developed, I have tried to always declare variables that are referenced globally in forcing functions or boundary conditions as `const`. Definitely open PRs to fix this if you find places where that isn't the case. I haven't stayed on top of every example that's been added.
I see some stuff like `N2` in the plankton example, which is a fixed problem parameter used in the BC but isn't a `const`. I haven't tried to test whether making this a constant speeds things up, but I guess I should, no? Should I make a PR with those alterations?

P.S.: I'm not super clear on which cases defining things as a `const` helps or not. I just know that the general rule is to use `const` if it really isn't going to change in the problem. That general rule comes from the Julia docs.
I think if you want to run examples on GPUs you need to use `const`, so that's a good reason to use it. I have never done any comparisons of what impact it has on efficiency.

I would suggest making a PR, as the more versatile the examples are, the more helpful they will be to users.
`const` matters when a variable is referenced as a global variable in a function; something like

```julia
const a = 2
f(x) = a * x
```

`a` is not an argument to `f(x)`; its value is taken from the global scope. In that case it needs to be `const`, or `f(x)` cannot be compiled on the GPU.
When numbers are added to explicit data structures --- which is what happens when they are inserted into the `parameters` kwarg in the constructor for `BoundaryCondition` or `Forcing` --- then it's irrelevant whether a variable is declared `const`. This is because in that case the variable is explicitly an argument to the function (e.g. via the argument `p` in `boundary_condition(x, y, t, p)`). In fact, this is the purpose of the `parameters` kwarg --- to avoid having to use `const` (which is annoying or inconvenient, and has compilation / performance pitfalls). However, the API has not developed enough to completely avoid `const` in all cases (e.g. for `BackgroundFields`, or for masking / target functions in things like `Relaxation`). So we still wrestle with it from time to time.
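As a sketch of what the `parameters` route looks like (the function, names, and numbers here are illustrative, following the call signatures described above rather than any particular example):

```julia
# No `const` is needed: the parameters arrive through the last argument `p`
# instead of being captured from global scope.
@inline damping_u(x, y, z, t, p) = - p.rate * exp(z / p.scale)

u_forcing = Forcing(damping_u, parameters=(rate=1e-4, scale=10))
```

The parameter values travel inside the `Forcing` object itself, so nothing about the global bindings matters once the model is built.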
We certainly should not use `const` when we don't need to. That could mislead users into thinking it's important or necessary when it's not.
Also, to be clear, declaring something as `const` and then inserting that variable's value into another data structure does not guarantee that the value in the second data structure is fixed. `const` attaches to a name and does not "propagate" into other data structures like `ContinuousBoundaryFunction.parameters`. So things like the following are valid:
```julia
julia> mutable struct Test{T}
           a :: T
       end

julia> const b = 2
2

julia> t = Test(b)
Test{Int64}(2)

julia> t.a = 3
3

julia> t
Test{Int64}(3)
```
Another non-function example is this piece of code from the wind mixing case:
```julia
Qᵀ = Qʰ / (ρₒ * cᴾ) # K m s⁻¹, surface temperature flux

# Finally, we impose a temperature gradient `dTdz` both initially and at the
# bottom of the domain, culminating in the boundary conditions on temperature,

dTdz = 0.01 # K m⁻¹

T_bcs = FieldBoundaryConditions(top = FluxBoundaryCondition(Qᵀ),
                                bottom = GradientBoundaryCondition(dTdz))
```
This should work on the GPU. The reason is that `Qᵀ` and `dTdz` are not referenced in functions. Instead, they end up inside the data structures `model.tracers.T.boundary_conditions.top.condition` and `model.tracers.T.boundary_conditions.bottom.condition`. Likewise, this code is valid too:
```julia
@inline Qˢ(x, y, t, S, evaporation_rate) = - evaporation_rate * S # [salinity unit] m s⁻¹
nothing # hide

# where `S` is salinity. We use an evaporation rate of 1 millimeter per hour,

evaporation_rate = 1e-3 / hour # m s⁻¹

# We build the `Flux` evaporation `BoundaryCondition` with the function `Qˢ`,
# indicating that `Qˢ` depends on salinity `S` and passing
# the parameter `evaporation_rate`,

evaporation_bc = FluxBoundaryCondition(Qˢ, field_dependencies=:S, parameters=evaporation_rate)
```
because `evaporation_rate` enters `Qˢ` through its 5th argument. It does not need to be, and should not be, `const`.
Here's the Julia docs section on constants, for reference:
https://docs.julialang.org/en/v1/manual/variables-and-scoping/#Constants
This statement in the Julia docs could maybe be clarified:

> It is difficult for the compiler to optimize code involving global variables, since their values (or even their types) might change at almost any time.

because it's ambiguous what "code involving global variables" means. For Oceananigans, it almost always means "functions that capture global variables in their scope". So `f(x) = a * x` is going to be slow unless `a` is `const`; otherwise `f(x)` cannot be inlined properly because the type of `a` is uncertain.
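One quick way to see the difference (a sketch; the full `@code_warntype` output is omitted here):

```julia
julia> a = 2;                # plain, non-const global

julia> f(x) = a * x;

julia> @code_warntype f(3)   # reports the captured `a` (and the return value) as ::Any

julia> const b = 2;

julia> g(x) = b * x;

julia> @code_warntype g(3)   # everything is concretely typed, so `g` can be inlined
```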
I tried running the `ShallowWaterModel` example on a GPU and it failed because of how we compute the norm; see the error message below.

@glwagner, I remember we talked about this but, sadly, I don't know if we had a solution. What would you recommend?
```
ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] assertscalar(op::String)
    @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:53
  [3] getindex(::CUDA.CuArray{Float64, 3}, ::Int64, ::Int64, ::Int64)
    @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:86
  [4] getindex
    @ ./subarray.jl:276 [inlined]
  [5] _getindex
    @ ./abstractarray.jl:1214 [inlined]
  [6] getindex
    @ ./abstractarray.jl:1170 [inlined]
  [7] iterate
    @ ./abstractarray.jl:1096 [inlined]
  [8] iterate
    @ ./abstractarray.jl:1094 [inlined]
  [9] generic_normInf(x::SubArray{Float64, 3, CUDA.CuArray{Float64, 3}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false})
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:465
 [10] normInf
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:556 [inlined]
 [11] generic_norm2(x::SubArray{Float64, 3, CUDA.CuArray{Float64, 3}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false})
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:497
 [12] norm2
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:558 [inlined]
 [13] norm(itr::SubArray{Float64, 3, CUDA.CuArray{Float64, 3}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, p::Int64)
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:627
 [14] norm(itr::SubArray{Float64, 3, CUDA.CuArray{Float64, 3}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false})
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:625
 [15] perturbation_norm(model::ShallowWaterModel{RegularRectilinearGrid{Float64, Periodic, Bounded, Flat, OffsetArrays.OffsetVector{Fl
```
It looks like `perturbation_norm` needs to be defined in a GPU-friendly way. The error message is cut off, so I can't see where that function is defined (the clue is at the bottom of what's posted):

```
[15] perturbation_norm(model::ShallowWaterModel{RegularRectilinearGrid{Float64, Periodic, Bounded, Flat, OffsetArrays.OffsetVector{Fl
```

Happy to help if you point me to where `perturbation_norm` is defined.
The link is here, but the function is copied below:

```julia
function perturbation_norm(model)
    compute!(v)
    return norm(interior(v))
end
```
I think you just need to use `norm(v)` rather than `norm(interior(v))`.

We define `norm` here:

Though it's untested (but probably works, at least right now...)
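For concreteness, the suggested change would look something like this (assuming `v` is the same field the example already computes):

```julia
function perturbation_norm(model)
    compute!(v)
    return norm(v)   # Oceananigans' own `norm` for fields, instead of norm(interior(v))
end
```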
I am happy to try that. I have been using `LinearAlgebra.norm`, which is maybe the wrong norm to use? Instead of saying `using LinearAlgebra: norm`, maybe we should be using this function?
It looks like we need `LinearAlgebra`, and I will make the change right now. That means we have one more example that will work if we change `CPU` to `GPU`.
`Statistics.norm` and `LinearAlgebra.norm` are the same:

```julia
julia> import Statistics

julia> import LinearAlgebra

julia> Statistics.norm === LinearAlgebra.norm
true
```
Ah, good to know.
Thanks! PR has been created.
I'm closing this issue because I'm judging that it's not of current, timely relevance to Oceananigans development. If you would like to make it a higher priority or if you think the issue was closed in error please feel free to re-open.
@glwagner I am trying to run the Kelvin-Helmholtz instability example on a GPU; however, the model fails with some errors. Can someone help me sort out this error? Please find the error below:
```julia
using Random, Statistics

mean_perturbation_kinetic_energy = Field(Average(1/2 * (u^2 + w^2)))

noise(x, z) = randn()
set!(model, u=noise, w=noise, b=noise)

rescale!(simulation.model, mean_perturbation_kinetic_energy, target_kinetic_energy=1e-6)

growth_rates, power_method_data = estimate_growth_rate(simulation, mean_perturbation_kinetic_energy,
                                                       perturbation_vorticity, b)

@info "Power iterations converged! Estimated growth rate: $(growth_rates[end])"
```
```
Error: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations do not execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
```
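For reference, the workaround the error message itself suggests is to allow scalar indexing just around the offending call with `CUDA.@allowscalar`. A sketch (which call is actually responsible depends on the full stacktrace, so the two wrapped below are guesses based on the snippet above):

```julia
using CUDA

# Stopgap only: permit scalar indexing for the calls that appear to trigger it.
CUDA.@allowscalar rescale!(simulation.model, mean_perturbation_kinetic_energy,
                           target_kinetic_energy=1e-6)

growth_rates, power_method_data = CUDA.@allowscalar estimate_growth_rate(
    simulation, mean_perturbation_kinetic_energy, perturbation_vorticity, b)
```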
I haven't had the chance to try all the examples in the docs on GPUs, but am I correct to assume that some of them won't compile on GPUs?
If that's the case, should we make an effort to make all of those examples GPU-compatible?