Closed jagoosw closed 1 year ago
While the kernel for SimpleMultiG sediments
where is this kernel? Link to code?
This is as it currently is in main: https://github.com/OceanBioME/OceanBioME.jl/blob/ac8419a1ce5a06a82be31588583636b212c09598/src/Boundaries/Sediments/simple_multi_G.jl#L159-L234 Which has another bug in it which is fixed in #138
Interesting. I'm surprised this won't compile. Likely you can simplify the arguments to the kernel to get it to compile. Parameter space issues don't have to do with kernel complexity directly, but rather than complexity of the input arguments (ie using Field
rather than OffsetArray
). Reducing the input argument complexity is why, for example, we can't support Field
directly in GPU kernels and are forced to work with OffsetArray
s instead.
One issue might be that the time-stepper is input directly. You only need the named tuple Gⁿ
--- it could be better to use that as an input argument instead.
sediment
may also have more info than you need on the GPU, which you can simplify with an appropriate adapt_structure
method.
In general, its good practice to define adapt_structure
to return the minimal info expected to be useful within a GPU kernel so as much is compilable as possible.
This is the fixed version that does some of those things:
https://github.com/OceanBioME/OceanBioME.jl/blob/8f133dfe5301e9241eddb3a5e5a74b0c0559142e/src/Boundaries/Sediments/simple_multi_G.jl#L175-L246
I think the adapt structure route may be the solution, e.g. at the moment we're unnecessarily passing the G$^{-1}$ tendency fields for the sediment fields, but otherwise this kernel does need the majority of the info in sediment
.
Also, this line does not belong inside the kernel:
if isnan(pₐₙₒₓ)
error("Sediment anoxia has caused model failure")
end
If you want to check for NaNs, you should do that outside the kernel. But note that the NaNChecker
can also be configured to check for NaNs in any field. You may want to use that instead.
Oh yeah that's a good idea. Perhaps that is the better solution to https://github.com/OceanBioME/OceanBioME.jl/issues/144 as well
This is the fixed version that does some of those things:
I think the adapt structure route may be the solution, e.g. at the moment we're unnecessarily passing the G$^{-1}$ tendency fields for the sediment fields, but otherwise this kernel does need the majority of the info in
sediment
.
This worked!
Hmm actually this reduced the parameter size enough on for the test with a rectilinear grid for non hydrostatic and hydrostatic, but still fails on long/lat grid
We're also passing a load of repeated information about the advection schemes, maybe that could be reduced
You want to reduce parameter size to the minimum needed. If you are near the limit, your code could fail when CUDA updates for example. It will also be difficult to develop the code. You should inspect every input argument and ensure that only the minimum is loaded onto the GPU.
We do need all of the entries I think but I think the question is if we need to pass all of model.advection
because it is very large. I will try to work out a way to just take the ones we need
You may need all the entries, but you may be missing adapt_structure methods for them.
You probably want to pass just the tracer advection right? For hydrostatic models, the tracer and momentum schemes are stored separately anyways.
For the advection schemes, I think we can reduce it by just passing the advection schemes for the tracers which sink.
I'm not sure where else to reduce the size though. Currently, we have:
sediment, underlying_biogeochemistry, grid, advection_schemes, tracers, timestepper.Gⁿ, tendencies.Gⁿ
I've written adapt structures for sediment so it just has the parameters and fields, underlying_biogeochemistry which always just has parameters and the sinking velocities, advection schemes which is either a scheme or NamedTuple, tracers and both tendency sets which are NamedTuples. All of these things are required because they're used or modified, or used to work out the flux of the sinking tracers.
Which properties have fields embedded, and can you link to the code for those adapt_structure
so we can see them here?
The sediment models have fields
(and tendencies
) which are excluded in the adapt_structure
and
and the models have sinking_velocities
which are named tuples of named tuples of fields:
and
We also have this:
which I think might be redundant
I guess some of the complexity for SimpleMultiG
is also that the _params
properties are tuples. Do they need to be adapted?
I guess some of the complexity for
SimpleMultiG
is also that the_params
properties are tuples. Do they need to be adapted?
perhaps! It's conservative to propagate the adapt
to every property.
Note that you also want to write your code to be robust to change in the future. When you assume that you don't need to adapt some property, you implicitly prevent someone from improving / extending your model to properties that do need adaptation. By conservatively adapting everything, you grease the wheels for scientific advancement in the future.
If you call adapt
on a tuple, it will call adapt on every one of it's properties:
https://github.com/JuliaGPU/Adapt.jl/blob/df06bcb6936baa7352b8cc7bf5f08f98f2653f25/src/base.jl#L3
The basic structure of any adapt
for a custom struct should follow the same logic. The extra thing that custom structs can do it to completely throw away unneeded properties (ie set them to nothing
). Or other bespoke actions.
Okay this is a good point, I'll update all the adapts to make sure everything gets adapted.
Looking at the error message here:
a lot of the information in lines 2 to 4 is about those named tuples so I'll have a go running it with them changed to tuples
That took 8 bytes off the parameter size so wasn't very successful
Vectors worked a bit better taking 56 bytes off. For some reason forcing it to only pass one of the advection schemes (rather than an NamedTuple of them) doesn't save any
Our problems are solved: https://github.com/JuliaGPU/CUDA.jl/issues/2080 !!
Wow, that's huge. Hopefully there isn't a catastrophic loss of performance...
Are you sure the function that failed is the one we are concerned about? It's not clear from the error.
and the models have sinking_velocities which are named tuples of named tuples of fields:
This nested structure is often the cause for issues. You can try to flatten it, perhaps.
Yeah hopefully, I can't find docs about the ISA change that allows this, but presumably someone down the line has tested the performance change.
And yeah its this https://github.com/OceanBioME/OceanBioME.jl/blob/5abb4217ab984f2cd32f2bee8742201c68f7e59b/src/Boundaries/Sediments/simple_multi_G.jl#L175
method (there was more detail somewhere else in the full error that confirmed it to me).
This nested structure is often the cause for issues. You can try to flatten it, perhaps.
Thinking about it we never have u or v components of slip velocity so we can reduce the complexity to just named tuple. Will test.
So near, with all of the optimisation above and removing the u and v components from sinking velocities its still 8bytes too large.
Out of curiosity (wondering if something I said was wrong) --- does it matter what the body of the kernel is? For example you could comment everything out except perhaps something trivial.
As a fallback solution, you might try separating these calculations into multiple kernels.
It might be a good idea anyways to reorganize the code so that it's easily to toggle back and forth. For example, right now the tendency calculation for the different species are intertwined.
Out of curiosity (wondering if something I said was wrong) --- does it matter what the body of the kernel is? For example you could comment everything out except perhaps something
It does still fail
As a fallback solution, you might try separating these calculations into multiple kernels.
It might be a good idea anyways to reorganize the code so that it's easily to toggle back and forth. For example, right now the tendency calculation for the different species are intertwined.
I could see this being a better idea.
It is now working (by only giving it the tracers and tendencies it needs) but I assume the parameter size is close to the limits,is there a way for me to check?
I think I would rather make restructuring how the sediment tendencies are calculated a different PR since this is getting quite long now?
Of course, no need to solve the world in one PR.
Initial issue closed by #138 and remainder superceeded by #147
While the kernel for
SimpleMultiG
sediments will compile on GPU they will not run due to their complexity leading to a PTX error about parameter size being thrown. This may be fixed in the future if LLVM updates its CUDA version support (https://github.com/JuliaGPU/CUDA.jl/issues/2080). We may also be able to rework the model to use less parameter space so please do get in contact if you would like to use the model on GPU.An error similar to the following will be raised:
Also, note that
InstantRemineralisation
sediment does work on GPU.Discussed in https://github.com/OceanBioME/OceanBioME.jl/pull/138.