I'm not sure whether that is a correct assumption or not. Some of them were designed to be used on the GPU. But I don't think anyone has ever tried to run the Kelvin-Helmholtz example on the GPU before (for example). Many of the others have been run on the GPU. But I think to really ensure this is the case in the long run we'll have to use CI.

We actually used to do something like this (including altering selected lines in the test scripts to make them more amenable to CI), so someone could dredge up that testing code to use with our GPU CI to solve this.
I think modifying the examples to run on GPUs easily is a good idea. After putting together the `ShallowWaterModel` example I tried it on a GPU and found out that it failed. I added a bunch of `const`s and then it worked easily enough. This would make it easier for the user to play with the two architectures, which would be a big plus.
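For context, the kind of change involved looks roughly like the sketch below (the parameter name and forcing function are made up for illustration, not taken from the actual example): a problem parameter that a forcing or boundary-condition function picks up from global scope has to be declared `const` before the function can be compiled into a GPU kernel.

```julia
# Illustrative sketch only. `damping_rate` is a hypothetical problem parameter
# captured from global scope by the forcing function below; without `const`,
# the GPU kernel that calls `u_forcing` cannot be compiled.
const damping_rate = 1e-4

@inline u_forcing(x, y, t) = - damping_rate * sin(x)
```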
One thing that I was also thinking about is that defining (global) problem parameters as constants might help out regardless. Every time I do that for my scripts it speeds up compilation times and often also significantly speeds up runtimes. So there might be a benefit to using `const` throughout, even without GPUs.

The downside is that if a user tries to run several examples in the same REPL session they might, without knowing it, try to redefine some constants and get errors and warnings related to that.
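To make that downside concrete, here is roughly what happens (this mirrors the behavior described in the Julia manual's section on constants; `g` is just a hypothetical parameter name shared by two example scripts):

```julia
julia> const g = 9.81      # the first example defines this parameter
9.81

julia> const g = 9.81      # a second example assigns the same value: silent
9.81

julia> const g = 10.0      # same type but a new value: a warning
WARNING: redefinition of constant g. This may fail, cause incorrect answers, or produce other errors.
10.0

julia> const g = 10        # a value of a different type: an error
ERROR: invalid redefinition of constant g
```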
In the examples I have developed, I have tried to always declare variables that are referenced globally in forcing functions or boundary conditions as `const`. Definitely open PRs to fix this if you find places where that isn't the case. I haven't stayed on top of every example that's been added.
I see some stuff like `N2` in the plankton example, which is a fixed problem parameter used in the BC but isn't a `const`. I haven't tried to test whether making this a constant speeds things up, but I guess I should, no? Should I make a PR with those alterations?

P.S.: I'm not super clear on which cases defining things as a `const` helps or not. I just know that the general rule is to use `const` if it really isn't going to change in the problem. That general rule comes from the Julia docs.
I think if you want to run examples on GPUs you need to use `const`, so that's a good reason to use it. I have never done any comparisons of what impact it has on efficiency.

I would suggest making a PR, as the more versatile the examples are, the more helpful they will be to users.
`const` matters when a variable is referenced as a global variable in a function; something like

```julia
const a = 2
f(x) = a * x
```

`a` is not an argument to `f(x)`; its value is taken from the global scope. In that case it needs to be `const`, or `f(x)` cannot be compiled on the GPU.
When numbers are added to explicit data structures --- which is what happens when they are inserted into the `parameters` kwarg in the constructor for `BoundaryCondition` or `Forcing` --- then it's irrelevant whether a variable is declared `const`. This is because in that case the variable is explicitly an argument to the function (e.g. via the argument `p` in `boundary_condition(x, y, t, p)`). In fact, this is the purpose of the `parameters` kwarg --- to avoid having to use `const` (which is annoying or inconvenient, and has compilation / performance pitfalls). However, the API has not developed enough to completely avoid `const` in all cases (e.g. for `BackgroundFields`, or for masking / target functions in things like `Relaxation`). So we still wrestle with it from time to time.
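As a sketch of what the `parameters` route looks like (the function, names, and numbers here are illustrative, following the call signatures described above rather than any particular example):

```julia
# No `const` is needed: the parameters arrive through the last argument `p`
# instead of being captured from global scope.
@inline damping_u(x, y, z, t, p) = - p.rate * exp(z / p.scale)

u_forcing = Forcing(damping_u, parameters=(rate=1e-4, scale=10))
```

The parameter values travel inside the `Forcing` object itself, so nothing about the global bindings matters once the model is built.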
We certainly should not use `const` when we don't need to. That could mislead users into thinking it's important or necessary when it's not.
Also, to be clear, declaring something as `const` and then inserting that variable's value into another data structure does not guarantee that the value in the second data structure is fixed. `const` attaches to a name and does not "propagate" into other data structures like `ContinuousBoundaryFunction.parameters`. So things like the following are valid:
```julia
julia> mutable struct Test{T}
           a :: T
       end

julia> const b = 2
2

julia> t = Test(b)
Test{Int64}(2)

julia> t.a = 3
3

julia> t
Test{Int64}(3)
```
Another non-function example is this piece of code from the wind mixing case:
```julia
Qᵀ = Qʰ / (ρₒ * cᴾ) # K m s⁻¹, surface temperature flux

# Finally, we impose a temperature gradient `dTdz` both initially and at the
# bottom of the domain, culminating in the boundary conditions on temperature,

dTdz = 0.01 # K m⁻¹

T_bcs = FieldBoundaryConditions(top = FluxBoundaryCondition(Qᵀ),
                                bottom = GradientBoundaryCondition(dTdz))
```
This should work on the GPU. The reason is that `Qᵀ` and `dTdz` are not referenced in functions. Instead, they end up inside the data structures `model.tracers.T.boundary_conditions.top.condition` and `model.tracers.T.boundary_conditions.bottom.condition`. Likewise, this code is valid too:
```julia
@inline Qˢ(x, y, t, S, evaporation_rate) = - evaporation_rate * S # [salinity unit] m s⁻¹
nothing # hide

# where `S` is salinity. We use an evaporation rate of 1 millimeter per hour,

evaporation_rate = 1e-3 / hour # m s⁻¹

# We build the `Flux` evaporation `BoundaryCondition` with the function `Qˢ`,
# indicating that `Qˢ` depends on salinity `S` and passing
# the parameter `evaporation_rate`,

evaporation_bc = FluxBoundaryCondition(Qˢ, field_dependencies=:S, parameters=evaporation_rate)
```
because `evaporation_rate` enters `Qˢ` through its 5th argument. It does not need to be, and should not be, `const`.
Here's the Julia docs section on constants, for reference:
https://docs.julialang.org/en/v1/manual/variables-and-scoping/#Constants
This statement in the Julia docs could maybe be clarified:

> It is difficult for the compiler to optimize code involving global variables, since their values (or even their types) might change at almost any time.

because it's ambiguous what "code involving global variables" means. For Oceananigans, it almost always means "functions that capture global variables in their scope". So `f(x) = a * x` is going to be slow unless `a` is `const`; otherwise `f(x)` cannot be inlined properly because the type of `a` is uncertain.
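One quick way to see the difference (a sketch; the full `@code_warntype` output is omitted here):

```julia
julia> a = 2;                # plain, non-const global

julia> f(x) = a * x;

julia> @code_warntype f(3)   # reports the captured `a` (and the return value) as ::Any

julia> const b = 2;

julia> g(x) = b * x;

julia> @code_warntype g(3)   # everything is concretely typed, so `g` can be inlined
```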
I tried running the `ShallowWaterModel` example on a GPU and it failed because of how we compute the norm; see the error message below.

@glwagner, I remember we talked about this but, sadly, I don't know if we had a solution. What would you recommend?
```
ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] assertscalar(op::String)
    @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:53
  [3] getindex(::CUDA.CuArray{Float64, 3}, ::Int64, ::Int64, ::Int64)
    @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:86
  [4] getindex
    @ ./subarray.jl:276 [inlined]
  [5] _getindex
    @ ./abstractarray.jl:1214 [inlined]
  [6] getindex
    @ ./abstractarray.jl:1170 [inlined]
  [7] iterate
    @ ./abstractarray.jl:1096 [inlined]
  [8] iterate
    @ ./abstractarray.jl:1094 [inlined]
  [9] generic_normInf(x::SubArray{Float64, 3, CUDA.CuArray{Float64, 3}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false})
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:465
 [10] normInf
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:556 [inlined]
 [11] generic_norm2(x::SubArray{Float64, 3, CUDA.CuArray{Float64, 3}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false})
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:497
 [12] norm2
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:558 [inlined]
 [13] norm(itr::SubArray{Float64, 3, CUDA.CuArray{Float64, 3}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, p::Int64)
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:627
 [14] norm(itr::SubArray{Float64, 3, CUDA.CuArray{Float64, 3}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false})
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:625
 [15] perturbation_norm(model::ShallowWaterModel{RegularRectilinearGrid{Float64, Periodic, Bounded, Flat, OffsetArrays.OffsetVector{Fl
```
It looks like `perturbation_norm` needs to be defined in a GPU-friendly way. The error message is cut off, so I can't see where that function is defined (the clue is at the bottom of what's posted):

```
[15] perturbation_norm(model::ShallowWaterModel{RegularRectilinearGrid{Float64, Periodic, Bounded, Flat, OffsetArrays.OffsetVector{Fl
```

Happy to help if you point me to where `perturbation_norm` is defined.
The link is here, but the function is copied below:

```julia
function perturbation_norm(model)
    compute!(v)
    return norm(interior(v))
end
```
I think you just need to use `norm(v)` rather than `norm(interior(v))`.

We define `norm` here:

Though it's untested (but probably works, at least right now...)
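For concreteness, the suggested change would look something like this (assuming `v` is the same field the example already computes):

```julia
function perturbation_norm(model)
    compute!(v)
    return norm(v)   # Oceananigans' own `norm` for fields, instead of norm(interior(v))
end
```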
I am happy to try that. I have been using `LinearAlgebra.norm`, which is maybe the wrong norm to use? Instead of saying `using LinearAlgebra: norm`, maybe we should be using this function?
It looks like we need `LinearAlgebra`, and I will make the change right now. That means we have one more example that will work if we change `CPU` to `GPU`.
`Statistics.norm` and `LinearAlgebra.norm` are the same:

```julia
julia> import Statistics

julia> import LinearAlgebra

julia> Statistics.norm === LinearAlgebra.norm
true
```
Ah, good to know.
Thanks! PR has been created.
I'm closing this issue because I'm judging that it's not of current, timely relevance to Oceananigans development. If you would like to make it a higher priority or if you think the issue was closed in error please feel free to re-open.
@glwagner I am trying to run the Kelvin-Helmholtz instability example on a GPU; however, the model fails with some errors. Can someone help me sort out this error? Please find the error below:
```julia
using Random, Statistics

mean_perturbation_kinetic_energy = Field(Average(1/2 * (u^2 + w^2)))

noise(x, z) = randn()
set!(model, u=noise, w=noise, b=noise)

rescale!(simulation.model, mean_perturbation_kinetic_energy, target_kinetic_energy=1e-6)

growth_rates, power_method_data = estimate_growth_rate(simulation, mean_perturbation_kinetic_energy,
                                                       perturbation_vorticity, b)

@info "Power iterations converged! Estimated growth rate: $(growth_rates[end])"
```
```
Error: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations do not execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
```
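For reference, the workaround the error message itself suggests is to allow scalar indexing just around the offending call with `CUDA.@allowscalar`. A sketch (which call is actually responsible depends on the full stacktrace, so the two wrapped below are guesses based on the snippet above):

```julia
using CUDA

# Stopgap only: permit scalar indexing for the calls that appear to trigger it.
CUDA.@allowscalar rescale!(simulation.model, mean_perturbation_kinetic_energy,
                           target_kinetic_energy=1e-6)

growth_rates, power_method_data = CUDA.@allowscalar estimate_growth_rate(
    simulation, mean_perturbation_kinetic_energy, perturbation_vorticity, b)
```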
I haven't had the chance to try all the examples in the docs on GPUs, but am I correct to assume that some of them won't compile on GPUs?
If that's the case, should we make an effort to make all of those examples GPU-compatible?