Pressure has extremely high gradients in random chunks of simulation using `NonhydrostaticModel` with `ImmersedBoundaryGrid` and `BuoyancyTracer`

CliMA / Oceananigans.jl

🌊 Julia software for fast, friendly, flexible, ocean-flavored fluid dynamics on CPUs and GPUs

https://clima.github.io/OceananigansDocumentation/stable

MIT License

990 stars 194 forks source link

Pressure has extremely high gradients in random chunks of simulation using `NonhydrostaticModel` with `ImmersedBoundaryGrid` and `BuoyancyTracer` #3593

Open tomchor opened 5 months ago

tomchor commented 5 months ago

In some simulations the pressure values (and more importantly the values of the pressure gradient) are larger than expected by several orders of magnitude. This happens in seemingly random (time) chunks of the simulations, and then everything comes back to normal before happening again in other chunks.

Here's a MWE where I create a grid with channel-like topology which has an immersed boundary at x=400meters that acts as an East wall. I initialize it with a uniform stratification of 6e-6/second and zero velocities everywhere.

using Oceananigans

grid_base = RectilinearGrid(topology = (Bounded, Periodic, Bounded),
                            size = (16, 20, 4), extent = (800, 1000, 100),)

@inline east_wall(x, y, z) = x > 400
grid = ImmersedBoundaryGrid(grid_base, GridFittedBoundary(east_wall))

model = NonhydrostaticModel(grid = grid, timestepper = :RungeKutta3,
                            buoyancy = BuoyancyTracer(), tracers = :b,
                            )

N² = 6e-6
b∞(x, y, z) = N² * z
set!(model, b=b∞)

simulation = Simulation(model, Δt=25, stop_time=1e4,)
simulation.output_writers[:snaps] = NetCDFOutputWriter(model, (; model.pressures.pNHS,),
                                                       filename = "test_pressure.nc",
                                                       schedule = TimeInterval(100),
                                                       overwrite_existing = true,)
run!(simulation)

Here are some snapshots of the pressure (the blue line is where the immersed boundary is):

And here's a plot of the standard deviation of pressure over the whole domain as a function of time where we can visualize when this issue happens:

Here is the code I used to generate these figures in case anyone's interested in playing around with it.

A bit of context: I noticed this issue a long time ago, and @whitleyv before me (I believe @amrapallig also mentioned coming across this issue at some point). It hasn't been an issue in the past, but now that I need to close a KE balance in a domain where the pressure contributions don't vanish I find that this issue prevents me from obtaining any sort of reasonable estimate for the pressure contribution. (Which is odd, since it implies that these "jumps" have no impact on dynamics...)

A few notes:

This only seems to happen when there's a vertical immersed boundary interface and buoyancy isn't nothing (or zero). To clarify, this issue does happen in sloping topographies, but it goes away if instead I make the immersed boundary a "faux" flat bottom (e.g. if east_wall(x, y, z) = z < -50).
I haven't tried branch https://github.com/CliMA/Oceananigans.jl/pull/3188 yet, but it's on my to-do list. I figured even if that branch fixes it, it was worth opening this issue.
In this particular MWE if I switch the time-stepper to Adams-Bashforth the issue seemingly goes away (even though there's an initial jump in the pressure that I wouldn't expect), but it does persist (albeit more mildly) in more complex simulations.
This example was run with the latest version of Oceananigans v0.91.0 which doesn't separate the pressure anymore, but it also to happened in older versions with the pressure separation.

CC @wenegrat

francispoulin commented 5 months ago

Interesting. You are plotting the pressure above. Is it possible to plot the components of the gradient of pressure?

The perssure is determined to within a constant and I am very curious to see of the pressure gradient also changes a lot.

tomchor commented 5 months ago

Interesting. You are plotting the pressure above. Is it possible to plot the components of the gradient of pressure?

The perssure is determined to within a constant and I am very curious to see of the pressure gradient also changes a lot.

Yup, my first thought was that of the constant also. But as you can see in the snapshots, the gradient is also changing by a lot. Plus if it was only the constant that were changing, the std would remain the same, but it's clearly varying over many order of magnitude, which is why I didn't bother plotting the gradient in order to simplify diagnostics. In any case, here is dp/dx for the same simulation for future reference:

glwagner commented 5 months ago

Does it have to do with output? Do you know what the time-step is when this happens? I'm wondering if it has to do with using a very small time-step, leading to a round-off error issue. That would also explain why it doesn't affect dynamics, because huge pressure gradient integrated over a machine epsilon duration may not have an impact.

glwagner commented 5 months ago

The perssure is determined to within a constant and I am very curious to see of the pressure gradient also changes a lot.

Yup, my first thought was that of the constant also.

This is correct, but note that the constant is determined by the solver by setting the mean to 0:

https://github.com/CliMA/Oceananigans.jl/blob/fb2c670626e38e3bffe298c485f95625fb1d83be/src/Solvers/fft_based_poisson_solver.jl#L111

https://github.com/CliMA/Oceananigans.jl/blob/fb2c670626e38e3bffe298c485f95625fb1d83be/src/Solvers/fourier_tridiagonal_poisson_solver.jl#L131-L135

so it isn't arbitrary (it's just that in the physical problem, there is a domain mean and we don't know what that is).

tomchor commented 5 months ago

Does it have to do with output? Do you know what the time-step is when this happens?I'm wondering if it has to do with using a very small time-step, leading to a round-off error issue. That would also explain why it doesn't affect dynamics, because huge pressure gradient integrated over a machine epsilon duration may not have an impact.

Are you referring to the fact that sometimes model has to use a very tiny time-step to bridge the gap between the current time and the output time? If so, that's an interesting possibility that I hadn't considered. Although in the example above I'm fixing the time-step at 25, while the output time interval is 200, so I wouldn't expect any issues there. Also wouldn't that also affect simulations with buoyancy=nothing if the pre-output time-step were to blame?

glwagner commented 5 months ago

Does it have to do with output? Do you know what the time-step is when this happens?I'm wondering if it has to do with using a very small time-step, leading to a round-off error issue. That would also explain why it doesn't affect dynamics, because huge pressure gradient integrated over a machine epsilon duration may not have an impact.

Are you referring to the fact that sometimes model has to use a very tiny time-step to bridge the gap between the current time and the output time? If so, that's an interesting possibility that I hadn't considered. Although in the example above I'm fixing the time-step at 25, while the output time interval is 200, so I wouldn't expect any issues there.

Can you check just to be sure? Because having an output interval that's some multiple of the time step is exactly when we expect to see miniscule time-steps due to round off error.

The pressure source term is the divergence of the predictor velocity divided by time-step. As the time-step vanishes, the divergence of the predictor velocity also vanishes (because the flow has not evolved from its previous, non-divergent solution). We get a situation tending to 0/0.

I think there's a few things we could do to solve this. First of all if we take a very small time-step, I think we can actually just re-set the model time rather than taking a time-step.

Second I am wondering if we want to implement a time type that has finite resolution (ie there is a smallest time increment one can take). For example, datetimes have a smallest unit (micro or nanoseconds). A non-dimensional or dimensional-agnostic time type could also be designed analogously (eg every time is the multiple of an integer by the fundamental unit). This would eliminate round off error but it's a bit of work and also we have to put some thought into how best to accomplish it.

There might also be a simpler solution by adjusting how we increment time. I'm not sure.

Also wouldn't that also affect simulations with buoyancy=nothing if the pre-output time-step were to blame?

I think so, but I also don't think that you can guarantee this problem won't ever occur. The presence of the buoyancy does somehow impact the divergence that accumulates during a time-step / the pressure correction that has to be applied. So it's possible that the buoyancy configuration affects these results. Not sure.

tomchor commented 5 months ago

Does it have to do with output? Do you know what the time-step is when this happens?I'm wondering if it has to do with using a very small time-step, leading to a round-off error issue. That would also explain why it doesn't affect dynamics, because huge pressure gradient integrated over a machine epsilon duration may not have an impact.

Are you referring to the fact that sometimes model has to use a very tiny time-step to bridge the gap between the current time and the output time? If so, that's an interesting possibility that I hadn't considered. Although in the example above I'm fixing the time-step at 25, while the output time interval is 200, so I wouldn't expect any issues there.

Can you check just to be sure? Because having an output interval that's some multiple of the time step is exactly when we expect to see miniscule time-steps due to round off error.

Good point.

Also wouldn't that also affect simulations with buoyancy=nothing if the pre-output time-step were to blame?

I think so, but I also don't think that you can guarantee this problem won't ever occur. The presence of the buoyancy does somehow impact the divergence that accumulates during a time-step / the pressure correction that has to be applied. So it's possible that the buoyancy configuration affects these results. Not sure.

Yeah I see your point. I'll check model.last_Δt and report soon. I think it'll be good news if this is the culprit because it seems like we'd have fairly simple options to fix it.

tomchor commented 5 months ago

Ha! It seems @glwagner has cracked it! Top figure is the pressure std and bottom figure is model.clock.last_Δt:

I still don't fully understand why this seems to affect the simulations that I pointed out but not others, but it's hard to argue against that correlation.

@glwagner to prevent these round-off errors all we need to to is to ensure that first_stage_Δt, second_stage_Δt and third_stage_Δt add up to Δt in https://github.com/CliMA/Oceananigans.jl/blob/fb2c670626e38e3bffe298c485f95625fb1d83be/src/TimeSteppers/runge_kutta_3.jl#L94-L96

correct?

If so, would it be a good idea to work backwards in the last statement and set third_stage_Δt = Δt - first_stage_Δt - second_stage_Δt and use locally defined γ³ (or ζ³) from it? The difference would be either zero or very small by construction.

On a side note, while plotting this I realized that I was confused about model.clock.last_Δt, since I assumed it was the last Δt of the simulation (so in my MWE it would be 25 at all time steps except for maybe eventual round-off errors), but it's the Δt of the time-stepper substeps, which makes more sense but now I'm thinking maybe the name is a bit misleading and should be changed to last_substep_Δt or something. But that's an issue for another issue :)

glwagner commented 5 months ago

which makes more sense but now I'm thinking maybe the name is a bit misleading and should be changed to last_substep_Δt or something. But that's an issue for another issue :)

I think a better fix would be to make it correct... (I think we want the time step, not the substep).

glwagner commented 5 months ago

The difference would be either zero or very small by construction.

The difference is already very small. It's machine epsilon.

We need the difference to be exactly zero.

glwagner commented 5 months ago

@tomchor I think your solution might help RK3, but as you noted there is also a problem with AB2:

In this particular MWE if I switch the time-stepper to Adams-Bashforth the issue seemingly goes away (even though there's an initial jump in the pressure that I wouldn't expect), but it does persist (albeit more mildly) in more complex simulations.

There's an issue somewhere in

https://github.com/CliMA/Oceananigans.jl/blob/fb2c670626e38e3bffe298c485f95625fb1d83be/src/Simulations/run.jl#L41-L56

I'm also wondering if one issue is that we need to change the line

aligned_Δt = schedule_aligned_Δt(sim, aligned_Δt)

and maybe instead have an interface where callbacks return the next time of actuation. TimeInterval may also need to be redesigned... the fact that we compute the next actuation time b

https://github.com/CliMA/Oceananigans.jl/blob/fb2c670626e38e3bffe298c485f95625fb1d83be/src/Utils/schedules.jl#L65

maybe invites round-off error.

To start working on this I think we need an MWE. Mabye that's easy, just a simulation with constant time-step and output on TimeInterval which should, in theory, work perfectly.

glwagner commented 5 months ago

Hmm and there is one more point. Round-off error is the reason we get tiny time-steps, and we should fix that. However, that would still leave open the underlying problem, which is that the pressure correction fails for machine epsilon time-steps. So I'm wondering if in fact we should fix both issues.

glwagner commented 5 months ago

Ok, here's a multi-pronged strategy to address this:

Rather than computing the next time step directly based on the schedules, compute the nearest action time (either output or callback or the simulation is stopping). Then compute the time-step like we have been and as suggested by @tomchor for RK3, by taking the differences between the next action time and the current time.
Change TimeInterval so that, rather than accumulating previous_actuation_time, we instead compute something like the initialization_time and actuation_index. Then we can compute the next actuation time with t0 + T * (i + 1) --- will this reduce round off error when computing the actuation time?
Fix RK3 so it also uses differences to accumulate the substep times as suggested by @tomchor. We can also manually ensure that after the subtseps are complete the the clock time is in fact the current time plus the time-step.
Should we also add a feature to the nonhydrostaticmodel, something like the minimum time step? If the time-step is below that minimum, then rather than doing a time-step + pressure correction, we simply advance the model clock.

I wonder if, after all these changes, whether we still need a "discretized time" like I was suggesting or not.

tomchor commented 5 months ago

The difference would be either zero or very small by construction.

The difference is already very small. It's machine epsilon.

We need the difference to be exactly zero.

I meant the difference between the "original" γ³ and the locally-calculated γ³ that we'd use. The difference between the "intended" and actual Δt would be zero by construction.

tomchor commented 5 months ago

To start working on this I think we need an MWE. Mabye that's easy, just a simulation with constant time-step and output on TimeInterval which should, in theory, work perfectly.

Wouldn't the MWE I posted above work? Or do you mean a MWE that generates round-off errors at predictable times? (I cannot predict when the errors will occur in the MWE above.)

Hmm and there is one more point. Round-off error is the reason we get tiny time-steps, and we should fix that. However, that would still leave open the underlying problem, which is that the pressure correction fails for machine epsilon time-steps. So I'm wondering if in fact we should fix both issues.

I agree with this point, but I feel like I'm also missing something here. Let's say we change the pressure correction so that it works for machine epsilon. Won't the pressure gradient force still depend on Δt? That is, won't it be larger for small Δts and vice-versa? If so, this implies the pressure gradient doesn't converge with Δt and if so, how do we close a budget where the pressure term is important? (Maybe that's a question for another place also... I don't want to derail the discussion from the specific issue at hand.)

tomchor commented 5 months ago

Ok, here's a multi-pronged strategy to address this:

1. Rather than computing the next time step directly based on the schedules, compute the nearest _action time_ (either output or callback or the simulation is stopping). Then compute the time-step like we have been and as suggested by @tomchor for RK3, by taking the differences between the next action time and the current time.

2. Change `TimeInterval` so that, rather than accumulating `previous_actuation_time`, we instead compute something like the `initialization_time` and `actuation_index`. Then we can compute the next actuation time with `t0 + T * (i + 1)` --- will this reduce round off error when computing the actuation time?

I'm confused as to what T would be here. It feels like it should be simulation.Δt, but that wouldn't work for variable Δt.

3. Fix RK3 so it also uses differences to accumulate the substep times as suggested by @tomchor. We can also manually ensure that after the subtseps are complete the the clock time is in fact the current time plus the time-step.

4. Should we also add a feature to the nonhydrostaticmodel, something like the minimum time step? If the time-step is below that minimum, then rather than doing a time-step + pressure correction, we simply advance the model clock.

I think this could work. We probably would need to discuss how that would interact with min_Δt from the TimeStepWizard, no? Since they're putting two different lower boundaries (I think the one in NonhydrostaticModel would be more "powerful", correct?)

I wonder if, after all these changes, whether we still need a "discretized time" like I was suggesting or not.

If we discretize time do we even need the stuff above?

glwagner commented 5 months ago

I'm confused as to what T would be here. It feels like it should be simulation.Δt, but that wouldn't work for variable Δt.

The time interval that user specifies, TimeInterval(T).

We probably would need to discuss how that would interact with min_Δt from the TimeStepWizard, no?

I hadn't thought of that. But min_Δt is not a global constraint, it's only the min_Δt that the TimeStepWizard specifies. The simulation Δt can still be smaller.