I can reproduce it with:
```julia
using ClimaShallowWater, ClimaComms

# Single-GPU context
context = ClimaComms.SingletonCommsContext(ClimaComms.CUDA())
ClimaComms.init(context)

FT = Float32
testcase = ClimaShallowWater.SteadyStateTest()
space = ClimaShallowWater.create_space(
    context,
    testcase;
    float_type = FT,
    panel_size = 128,  # 96 works; 128 triggers the CUDA error
    poly_nodes = 4,
)
Y = ClimaShallowWater.initial_condition(space, testcase)
p = ClimaShallowWater.auxiliary_state(Y, testcase)
ClimaShallowWater.dss!(Y, p)
dY = ClimaShallowWater.similar(Y)
ClimaShallowWater.tendency!(dY, Y, p, zero(FT))
```
Ah, I think I figured it out. We launch the spectral element operators with the element index along the y dimension of the block grid: https://github.com/CliMA/ClimaCore.jl/blob/e7c7e9c245ab55aaf5dd2d2d121eacc5c5de7607/src/Operators/spectralelement.jl#L276 However, the y dimension of a CUDA grid is limited to 65535 blocks (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications), which matches what @bloops saw:
```julia
julia> 128*128*6
98304

julia> 96*96*6
55296
```
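To see exactly where the cutoff falls, this plain-arithmetic Julia sketch (no GPU needed) computes the largest panel size n whose element count 6n² still fits under the 65535-block y limit. `nelements` and `max_panel_size_y` are hypothetical helper names used only for illustration.

```julia
# CUDA limits gridDim.y (and gridDim.z) to 65535 blocks.
const MAX_GRID_Y = 65535

# Number of elements on a cubed sphere with `panel_size` elements per panel edge.
nelements(panel_size) = 6 * panel_size^2

# Largest n with 6n^2 <= MAX_GRID_Y, i.e. n^2 <= fld(MAX_GRID_Y, 6).
max_panel_size_y() = isqrt(fld(MAX_GRID_Y, 6))

println(max_panel_size_y())            # 104
println(nelements(104) <= MAX_GRID_Y)  # true  (64896 elements)
println(nelements(105) <= MAX_GRID_Y)  # false (66150 elements)
```

So with elements on the y dimension, panel sizes up to 104 should launch and anything larger should hit the limit, consistent with 96 working and 128 failing.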
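We can fix this in ClimaCore by changing the order of the block dimensions, e.g. putting the elements in the x dimension of the grid, which allows up to 2^31 - 1 blocks.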
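A minimal sketch of the reordering, assuming one block per element with Nq × Nq threads per block (Nq = 4 here). It only computes launch shapes in plain Julia. `config_elems_in_y`, `config_elems_in_x` and `valid_grid` are hypothetical helpers, not ClimaCore functions, and the actual launch code in `spectralelement.jl` may be organized differently.

```julia
# Per-dimension CUDA grid limits (compute capability 3.0 and later).
const MAX_GRID = (x = 2^31 - 1, y = 65535, z = 65535)

# Hypothetical layout A: element index on the y dimension of the grid.
config_elems_in_y(Nq, Nelem) = (threads = (Nq, Nq), blocks = (1, Nelem))

# Hypothetical layout B: element index on the x dimension of the grid.
config_elems_in_x(Nq, Nelem) = (threads = (Nq, Nq), blocks = (Nelem, 1))

# Check a 2D block shape against the per-dimension grid limits.
function valid_grid(blocks)
    bx, by = blocks
    return bx <= MAX_GRID.x && by <= MAX_GRID.y
end

Nelem = 6 * 128^2  # 98304 elements for panel size 128
println(valid_grid(config_elems_in_y(4, Nelem).blocks))  # false
println(valid_grid(config_elems_in_x(4, Nelem).blocks))  # true
```

With the reordered layout, a CUDA.jl kernel would read the element index from `blockIdx().x` instead of `blockIdx().y`.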
I'm trying to do a scaling analysis on a single GPU, but using `--panel-size=128` leads to a CUDA error. I presume this might be an OOM, since the code works for panel sizes up to 96. However, I think the spectral element code would have arrays of size O(EN²) (or possibly O(EN⁴) intermediate arrays), where E = 6(panel size)² and N = 4. This still amounts to < 100 MiB, so is the OOM expected?
Here's the stack trace: shallowwater_float32_panelsize128_stacktrace.txt