Open charleskawczynski opened 4 years ago
One issue with this approach is that there will exist potentially many source!
functions, and perhaps dispatch will be slow.
Dynamic dispatch won't be that slow, given that each source term is "costly". Running this on the GPU is trickier since we need to precompute the dispatch there.
@vchuravy what makes precomputing dispatch here tricky?
Can you tuple the source terms and then unroll on the GPU?
I think it's doable, yes.
Can you tuple the source terms and then unroll on the GPU?
This is what I did in #407: I've tried using ntuple
, we can see how that pans out.
@simonbyrne @charleskawczynski awesome! I'm curious how it turns out. If unrolling via ntuple
works inside GPU kernels that will probably be quite useful for Oceananigans
as well.
@vchuravy what makes precomputing dispatch here tricky?
Think about these two cases here:
First Case:
sources = Source[HeldsSuarezSrc1(), DycomsSrc1()]
for src in sources
f(src)
end
Second Case:
sources = (HeldsSuarezSrc1(), DycomsSrc1())
for src in sources
f(src)
end
In the first case you have an array of type Vector{Source}
e.g. the super-type of both source terms. The call to f
therefore will always be a dynamic (virtual in C++) function call. In the second case we use a Tuple instead of an Array so now we have Tuple{HeldsSuarezSrc1, DycomsSrc1}
so we know the specific types, but in the first iteration src
will be of type HeldsSuarezSrc1
and in the second DyComsSrc1
so in a fixed-point approximation src has the type Union{HeldsSuarezSrc1, DycomsSrc1}
(Union is a either). For small unions Julia will do union-splitting and turn your code into:
for src in sources
if src isa HeldsSuarezSrc1
f(src::HeldsSuarezSrc1)
elseif src isa DycomsSrc1
f(src::DycomsSrc1)
else
error()
end
end
Which de-virtualizes each call to f
with the precise type turning dynamic dispatch into static. Still this relies on only using small unions (up to 4 or 5 I think).
Lastly you may use ntuple
:
ntuple(length(sources)) do i
Base.@_inline_meta
f(sources[i])
nothing
end
Which turns into:
f(sources[1])
f(sources[2])
and since you no longer have a for-loop with a variable each invocation of f
is now type-stable since each type of each tuple element is know.
I hope this helps :)
I think this issue, or at least the design of delivering this requirement, is basically closed by #1634 (and subsequent PRs) for the atmosphere. Should we leave it open for other model components?
We need a solution to adding different combinations of source terms. How about we use something like this:
This way, we can specify custom and standard lists of source terms. Plus, we can reuse
Dycoms()
for boundary conditions, or any other necessary parameters.