ali-ramadhan / Atmosfoolery.jl

Compressible non-hydrostatic model built on top of Oceananigans.jl so it runs on CPUs and GPUs
MIT License

Some concerns about the Klemp (2007) equation set and suggestions for transitioning to a moist model #49

Closed · thabbott closed this issue 4 years ago

thabbott commented 4 years ago

I want to discuss what I see as a potential problem with following the Klemp (2007) paper exactly (which is where JULES.jl currently seems to be heading) as we work on converting from a dry to a moist model: the way Klemp (2007) uses dry density prevents its equation set from working in the limit where the entire atmosphere is made of water. This is admittedly a pretty academic problem, but for the sake of flexibility I would rather use an equation set that doesn't break down for extremely condensible-rich atmospheres.

My suggestion is that, moving forward, we use the Klemp (2007) paper as a useful guide but base our model on a slightly different set of equations. Something like equations 24-27 and 31-32 from Satoh (2003) (plus an appropriate thermodynamic equation) would be a more robust set of governing equations, since they still make sense even with zero dry density. The key differences from Klemp (2007), sketched below, are that Satoh (2003)

  1. tracks the mass of water species as concentrations relative to total mass rather than mixing ratios relative to dry mass, and
  2. defines ρu as the total mass flux, not just the flux of dry mass
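
In symbols, for a single water species with total density ρ = ρ_d + ρ_v and a generic source term S_v, the contrast looks schematically like this (my sketch, not equations copied from either paper):

```latex
% Klemp (2007)-style: mixing ratio r_v = \rho_v / \rho_d, advected by the dry-mass flux.
% Note that r_v \to \infty as \rho_d \to 0.
\frac{\partial (\rho_d r_v)}{\partial t} + \nabla \cdot (\rho_d r_v \, \mathbf{u}) = S_v

% Satoh (2003)-style: concentration q_v = \rho_v / \rho \le 1, advected by the total-mass
% flux \rho \mathbf{u}, which stays well defined even when \rho_d = 0.
\frac{\partial (\rho q_v)}{\partial t} + \nabla \cdot (\rho q_v \, \mathbf{u}) = S_v
```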

One intermediate goal we could work toward that would give us a chance to start playing with equation sets with multiple massive species but without all of the complexity of moist thermodynamics would be to work on writing code that integrates a set of equations for two non-condensible ideal gases (call them a and b):
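
Schematically, I have in mind something like the following (just a sketch to fix ideas, not the final closed set):

```latex
\frac{\partial \rho_a}{\partial t} + \nabla \cdot (\rho_a \mathbf{u}) = -\nabla \cdot \mathbf{D}_a

\frac{\partial \rho_b}{\partial t} + \nabla \cdot (\rho_b \mathbf{u}) = -\nabla \cdot \mathbf{D}_b

\frac{\partial (\rho \mathbf{u})}{\partial t} + \nabla \cdot (\rho \mathbf{u} \otimes \mathbf{u})
    = -\nabla p - \rho g \hat{\mathbf{z}} + \mathbf{F}

\frac{\partial (\rho s)}{\partial t} + \nabla \cdot (\rho s \, \mathbf{u}) = \; ?

\rho = \rho_a + \rho_b, \qquad p = (\rho_a R_a + \rho_b R_b) \, T
```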

(I'm using the notation from the Satoh paper---so the Ds are diffusive fluxes, and the F is viscous dissipation. I've put question marks on the RHS of the entropy equation since I need to think more about what should go there---presumably a source term corresponding to the momentum sink, and probably a mixing term if the two gases aren't identical.)

This would give us a chance to check the robustness of the equation set (e.g., does everything still work with ρ_a = 0?) and play with how best to implement a model with multiple massive species in JULES, all without having to deal with a ton of complicated thermodynamic bookkeeping. If you think this sounds like a reasonable next step I can work on closing this equation set (and replacing entropy with some other thermodynamic variable, if you'd prefer) to give us something you can actually work on implementing.

ali-ramadhan commented 4 years ago

I want to discuss what I see as a potential problem with following the Klemp (2007) paper exactly (which is where JULES.jl currently seems to be heading)

True. I guess I did it that way as it's the only equation set I know of right now and it seems to perform well enough in WRF, but this early stage is probably where we want to make the right design choices so that we can upgrade/modify the equation set in the future.

the way Klemp (2007) uses dry density prevents its equation set from working in the limit where the entire atmosphere is made of water. This is admittedly a pretty academic problem, but for the sake of flexibility I would rather use an equation set that doesn't break down for extremely condensible-rich atmospheres.

Ah I did not realize this. If modeling condensible-rich atmospheres is of interest in other fields (modeling the atmosphere of Venus?) then it seems useful to pursue. It was easy to use Oceananigans.jl to model Europa's ocean so hopefully the same can be done with JULES.jl. Apparently existing models can be pretty hard to adapt to modeling anything other than Earth.

One intermediate goal we could work toward that would give us a chance to start playing with equation sets with multiple massive species but without all of the complexity of moist thermodynamics would be to work on writing code that integrates a set of equations for two non-condensible ideal gases (call them a and b)

Sounds like a good idea! From skimming Satoh (2003) I don't think it'll be a huge change as we can still use the RK3 time-stepping from Klemp et al. (2007). It also sounds like Klemp et al. (2007) and Satoh (2003) take a pretty similar approach to acoustic time stepping.

It looks like the prognostic variables would become ρᵃ, ρᵇ, ρu, ρv, ρw, ρs (no need for qᵃ and qᵇ tracers I'm guessing?).

My only concern is memory usage (which can be quite limiting on GPUs) but exchanging a tracer variable for an extra density variable means it'll probably use the same amount of memory.

If you think this sounds like a reasonable next step I can work on closing this equation set (and replacing entropy with some other thermodynamic variable, if you'd prefer) to give us something you can actually work on implementing.

Sounds like a good plan! I'm not familiar with atmospheric modeling or moist thermodynamics but given an equation set we have a framework for time-stepping it on a staggered grid so I'm happy to implement things.

I have my own goals with the project but I think our goals align in multiple ways. Would be worth discussing what to focus our efforts on.

I'd be in favor of supporting at least two thermodynamic variables so we develop an abstraction for switching between them, which should make the model more flexible if someone wants to add another thermodynamic variable in the future. Potential temperature and entropy seem like a good start.

But we should start with just one. Entropy works for me.
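
Roughly the kind of abstraction I mean, sketched in Julia (names are hypothetical, and the formulas are just the standard dry-ideal-gas relations to make the sketch concrete):

```julia
abstract type AbstractThermodynamicVariable end

struct Entropy              <: AbstractThermodynamicVariable end
struct PotentialTemperature <: AbstractThermodynamicVariable end

# Illustrative constants and reference state for a single dry ideal gas:
const p₀ = 100_000.0      # reference pressure [Pa]
const R  = 287.0          # specific gas constant [J/kg/K]
const cₚ = 1004.5         # specific heat at constant pressure [J/kg/K]
const cᵥ = cₚ - R         # specific heat at constant volume [J/kg/K]
const T₀ = 300.0          # reference temperature [K]
const ρ₀ = p₀ / (R * T₀)  # reference density [kg/m³]

# Kernels dispatch on the thermodynamic variable, e.g. when diagnosing pressure.
# With potential temperature θ:  p = p₀ (ρRθ/p₀)^(cₚ/cᵥ)
diagnose_p(::PotentialTemperature, ρ, θ) = p₀ * (ρ * R * θ / p₀)^(cₚ / cᵥ)

# With specific entropy s (defined so that s = 0 at T₀, ρ₀):  p = ρRT with
# T = T₀ exp((s + R log(ρ/ρ₀)) / cᵥ)
diagnose_p(::Entropy, ρ, s) = ρ * R * T₀ * exp((s + R * log(ρ / ρ₀)) / cᵥ)

# e.g. diagnose_p(PotentialTemperature(), 1.2, 300.0)
```

Adding another thermodynamic variable would then just mean adding a new singleton type and the corresponding methods.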

If there is novelty in combining/extending the Klemp et al. (2007) and Satoh (2003) approaches + running LES efficiently on GPUs (potentially multi-GPUs) with high-order advection schemes then perhaps there are two papers here?

  1. A JOSS paper (software novelty), and
  2. a modeling paper akin to the PyCLES paper (Pressel et al., 2015) (scientific novelty and showing the model is faster/better/more accurate/more user friendly than existing models?).
thabbott commented 4 years ago

True. I guess I did it that way as it's the only equation set I know of right now and it seems to perform well enough in WRF, but this early stage is probably where we want to make the right design choices so that we can upgrade/modify the equation set in the future.

Yeah, that's totally reasonable (and to be fair, it's the paper that Raphael and I pointed you to initially)---just figured that now was a good time to bring up some ways in which we might want to depart from it.

Ah I did not realize this. If modeling condensible-rich atmospheres is of interest in other fields (modeling the atmosphere of Venus?) then it seems useful to pursue. It was easy to use Oceananigans.jl to model Europa's ocean so hopefully the same can be done with JULES.jl. Apparently existing models can be pretty hard to adapt to modeling anything other than Earth.

Venus's atmosphere is mostly non-condensible CO2, but idealized simulations of the Martian atmosphere (mostly CO2 that condenses at the poles) or some icy moons (https://iopscience.iop.org/article/10.3847/1538-4357/aae38c/pdf) might be of interest.

Sounds like a good idea! From skimming Satoh (2003) I don't think it'll be a huge change as we can still use the RK3 time-stepping from Klemp et al. (2007). It also sounds like Klemp et al. (2007) and Satoh (2003) take a pretty similar approach to acoustic time stepping.

It looks like the prognostic variables would become ρᵃ, ρᵇ, ρu, ρv, ρw, ρs (no need for qᵃ and qᵇ tracers I'm guessing?).

My only concern is memory usage (which can be quite limiting on GPUs) but exchanging a tracer variable for an extra density variable means it'll probably use the same amount of memory.

Yes, the changes should be pretty minor. There's not really a difference (beyond notation) between using ρᵃ and qᵃ, since in the latter case the prognostic conservative variable is ρqᵃ = ρᵃ, and memory usage should be the same (since you don't need the concentration variable if you have density). Using ρᵃ instead of ρqᵃ might make diagnosing ρ slightly easier, but either is a perfectly fine choice.

Sounds like a good plan! I'm not familiar with atmospheric modeling or moist thermodynamics but given an equation set we have a framework for time-stepping it on a staggered grid so I'm happy to implement things.

Alright, I'll work on closing the equations.

I have my own goals with the project but I think our goals align in multiple ways. Would be worth discussing what to focus our efforts on.

OK, good! If some of the things that I want to do start to deviate from what you're interested in you can always tell me that you want to focus on other things.

I'd be in favor of supporting at least two thermodynamic variables so we develop an abstraction for switching between them, which should make the model more flexible if someone wants to add another thermodynamic variable in the future. Potential temperature and entropy seem like a good start.

But we should start with just one. Entropy works for me.

Yeah, I agree that's a useful abstraction to have. Entropy vs. internal energy might be a more interesting distinction than entropy vs potential temperature (and Satoh (2003) conveniently gives an internal energy equation) but we can discuss that more later.

If there is novelty in combining/extending the Klemp et al. (2007) and Satoh (2003) approaches + running LES efficiently on GPUs (potentially multi-GPUs) with high-order advection schemes then perhaps there are two papers here?

  1. A JOSS paper (software novelty), and
  2. a modeling paper akin to the PyCLES paper (Pressel et al., 2015) (scientific novelty and showing the model is faster/better/more accurate/more user friendly than existing models?).

I think there's definitely scientific novelty in having a non-hydrostatic model that works for purely-condensible atmospheres and can choose a thermodynamic variable that allows either energy or entropy to be conserved. Another scientific strength that we might think about emphasizing is the ability to re-use model code during postprocessing to close budgets to numerical precision.

thabbott commented 4 years ago

Here's a closed set of two-component equations. Looking forward to the working demo when we meet in 20 minutes 😛

(But actually I'm happy to take a stab at getting them working if you have other things you need to do.)

two-component-equations.pdf

ali-ramadhan commented 4 years ago

That doesn't look too bad! Shouldn't be hard to refactor the existing code to allow for two non-condensible ideal gases. Should even be possible to do n gases, although it might not be super performant for large n. Could be nice to have an abstraction for n gases that works well for 1 <= n <~ 4.
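
Roughly what I'm picturing for the n-gas case (a sketch with hypothetical names, not a final API): keep the per-gas constants in a tuple, so for small n the number of gases is part of the type, and build mixture quantities by summing over it.

```julia
struct IdealGasConstants{FT}
    R::FT    # specific gas constant [J/kg/K]
    cᵥ::FT   # specific heat at constant volume [J/kg/K]
end

# Two non-condensible gases "a" and "b" (values are illustrative):
gases = (IdealGasConstants(287.0, 717.5), IdealGasConstants(296.8, 743.0))

# Mixture pressure from Dalton's law, p = T Σᵢ ρᵢ Rᵢ:
mixture_pressure(gases, ρs, T) = T * sum(g.R * ρ for (g, ρ) in zip(gases, ρs))

# Mass-weighted mixture heat capacity, cᵥ = Σᵢ ρᵢ cᵥᵢ / Σᵢ ρᵢ:
mixture_cᵥ(gases, ρs) = sum(g.cᵥ * ρ for (g, ρ) in zip(gases, ρs)) / sum(ρs)

# e.g. with gas "b" absent (the ρ_b = 0 robustness check):
mixture_pressure(gases, (1.2, 0.0), 300.0)
```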

I'm actually thinking now that it might be better to pair program these changes in, especially as I don't think they'll be huge refactors, and it would give us the opportunity to review/improve the abstractions.

thabbott commented 4 years ago

I have a working n-gas implementation (it was pretty straightforward, like you said) but it seems not to be very performant (much slower than the previous version even for n = 1). I did a little profiling and the code for diagnosing pressure seems like the bottleneck.... I'll see if I can improve it but might come see you if not. (I also played a little bit with cleaning up other abstractions and agree that it would be good to go through them together soon.)


thabbott commented 4 years ago

(Did some more digging and now think that my changes might have introduced some type-stability issues---allocations have gone up from ~25k to ~150M per 50 time steps in the thermal bubble test case.) I pushed the changes to thabbott/multiple_components if you have time to take a look and see if anything suspicious jumps out at you.

ali-ramadhan commented 4 years ago

Ah yeah that sounds like something's being allocated at every grid point at every time step.

Looking at diagnose_p on the thabbott/multiple_components branch, it's possible that memory allocations are occurring inside it and inside the call to diagnose_T because the compiler can't optimize them away.

It probably does not help that they're written to work for an arbitrary number of gases. I don't have a good feel for what performs well here and what doesn't, but even Julia Base sometimes does "special" things when the size of the tuple is known: https://github.com/JuliaLang/julia/blob/master/base/ntuple.jl#L17-L31

I'll try some things out on your branch.

I'd be curious to see whether assuming n = 2 (or just making the loop length more explicit) will allow the compiler to unroll and inline and make the code not allocate any memory.
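
As a toy version of what I mean (hypothetical names, not the actual diagnose_p):

```julia
struct Gas{A}
    ρ::A   # density field for this gas
end

# Loop whose length is only known at run time (e.g. a Vector whose element type
# may be abstract): the compiler may not infer concrete types, and calling this
# at every grid point can allocate.
function total_density(gases::Vector, i, j, k)
    ρ = 0.0
    for gas in gases
        ρ += gas.ρ[i, j, k]
    end
    return ρ
end

# Same computation over a Tuple, unrolled with a Val length in the spirit of the
# special cases in base/ntuple.jl: the number of gases is part of the type, so
# the compiler can unroll and inline, which usually removes the allocations.
@inline total_density(gases::Tuple, i, j, k) =
    sum(ntuple(n -> gases[n].ρ[i, j, k], Val(length(gases))))
```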

Benchmarking a 32³ model shows that on master

 ──────────────────────────────────────────────────────────────────────────────────
    Static atmosphere benchmarks           Time                   Allocations      
                                   ──────────────────────   ───────────────────────
         Tot / % measured:              25.0s / 1.36%           6.73GiB / 0.00%    

 Section                   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────
 32×32×32 [CPU, Float64]       10    340ms   100%  34.0ms    258KiB  100%   25.8KiB
 ──────────────────────────────────────────────────────────────────────────────────

so we should be able to get similar results. But on thabbott/multiple_components it's

 ──────────────────────────────────────────────────────────────────────────────────
    Static atmosphere benchmarks           Time                   Allocations      
                                   ──────────────────────   ───────────────────────
         Tot / % measured:              14.0s / 83.2%           3.68GiB / 83.1%    

 Section                   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────
 32×32×32 [CPU, Float64]       10    11.7s   100%   1.17s   3.06GiB  100%    314MiB
 ──────────────────────────────────────────────────────────────────────────────────

~1.17 s or ~35 times slower with tons of allocations.

Side note: Oceananigans at 32³ on the CPU is ~8 ms, so there may be some performance optimizations we can make, but I think RK3 is the reason JULES takes more time per step (it can take longer time steps, so time to solution is perhaps what we should be comparing, but these benchmarks are mostly to ensure regressions don't occur).

Also, should it be ρ = densities[n].data[i,j,k] instead of ρ = tracers[n].data[i,j,k]?

thabbott commented 4 years ago

Right now the model uses three named tuples to store information about tracers: model.densities stores gas constants, specific heats, etc. for each species; model.thermodynamic_variable keeps track of which thermodynamic variable we're using; and model.tracers stores the actual values for the densities, the thermodynamic variable, and any other tracers.

Using named tuples seemed convenient because it lets us use property names to match information in model.densities and model.thermodynamic_variable to the corresponding fields in model.tracers, but maybe this isn't the best choice for performance. (And maybe it's worth renaming model.densities to something different, e.g. model.species, since that field doesn't actually store densities.)
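
Concretely, the layout looks roughly like this (values are illustrative, and the real entries are model fields rather than bare arrays):

```julia
densities = (a = (R = 287.0, cᵥ = 717.5),   # thermodynamic constants for gas "a"
             b = (R = 296.8, cᵥ = 743.0))   # thermodynamic constants for gas "b"

tracers = (a = zeros(32, 32, 32),           # prognostic density of gas "a"
           b = zeros(32, 32, 32),           # prognostic density of gas "b"
           s = zeros(32, 32, 32))           # entropy (the thermodynamic variable)

# Property names line up, so species constants can be matched to their fields:
for key in keys(densities)
    gas = densities[key]   # constants for this species
    ρ   = tracers[key]     # its prognostic density field
    # ... use gas.R, gas.cᵥ, and ρ here ...
end
```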

Also, is it possible to have the benchmark you quoted above run automatically on Travis and/or AppVeyor? It would be nice to get some warning automatically if code we (or more likely I) write introduces similar performance regressions in the future.

thabbott commented 4 years ago

One issue I noticed: ATM Julia can't fully infer the type of F_C on line 25 or of R_C or F_C on lines 52-53 of time_stepping_kernels.jl. I assume this is because model.slow_forcings and model.right_hand_sides both contain more than one type of field and Julia doesn't know until runtime which type the getproperty call on the preceding lines will return. But this doesn't seem like the source of the performance regression, since the code in master (presumably) has the same issue.

Regarding the new pressure gradient calculations: one call to RU now results in 8 allocations, which goes down to 0 if I comment out the call to ∂p∂x. So you were right about those changes being a source of new memory allocations, and I think this is probably why the new code runs slowly.

We might need to avoid for key in keys(densities) within functions that are called at each grid point---the iterator returns a Union{Nothing, Tuple{Int64, Symbol}} that gets compared to nothing to determine whether to stop iteration, and I'm not sure whether this can be done without allocating memory.
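
A toy version of the pattern and one way to side-step it (illustrative, not the actual JULES code):

```julia
gases = (a = (R = 287.0,), b = (R = 296.8,))

# Iterate the keys and look each entry up by name. How well this optimizes depends
# on what the compiler can prove about the NamedTuple; with heterogeneous entries
# the getproperty call below is hard to infer.
function sum_R_by_keys(gases)
    total = 0.0
    for key in keys(gases)
        total += getproperty(gases, key).R
    end
    return total
end

# Alternative: iterate the values directly (or map/sum over them), which keeps the
# element types visible to the compiler and avoids the Symbol lookup entirely.
sum_R_by_values(gases) = sum(g -> g.R, values(gases))
```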

thabbott commented 4 years ago

I pushed some changes that make the loop length more explicit and, in fact, that is enough to get rid of the allocations in the pressure gradient calculations. Unfortunately this only cuts the total allocations in half---see the new benchmark below---so something else is a partial culprit for the regression. (My benchmark runs much more slowly than yours, but presumably that's just from differences between computers or compiler optimization level or something like that.)

──────────────────────────────────────────────────────────────────────────────────
    Static atmosphere benchmarks           Time                   Allocations
                                   ──────────────────────   ───────────────────────
         Tot / % measured:               282s / 25.4%           2.10GiB / 77.1%

 Section                   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────
 32×32×32 [CPU, Float64]       10    71.6s   100%   7.16s   1.62GiB  100%    166MiB
 ──────────────────────────────────────────────────────────────────────────────────
thabbott commented 4 years ago

... and did a little bit of reorganizing to improve type inference (and to remove one for key in keys(densities) loop), which gets rid of about 80% of the remaining allocations. The remaining allocations all come from computing slow forcings for tracers (line 28 of time_stepping_kernels.jl and line 28 of right_hand_sides.jl), I think because of issues with type inference for Val(tracer_index).

──────────────────────────────────────────────────────────────────────────────────
    Static atmosphere benchmarks           Time                   Allocations
                                   ──────────────────────   ───────────────────────
         Tot / % measured:              18.9s / 16.8%           7.13GiB / 4.25%

 Section                   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────
 32×32×32 [CPU, Float64]       10    3.17s   100%   317ms    310MiB  100%   31.0MiB
 ──────────────────────────────────────────────────────────────────────────────────
thabbott commented 4 years ago

... and finally, replacing Val(tracer_index) with tracer_index gets us back to a master-like number of allocations! (Was there a reason you wanted to dispatch on the value of tracer_index?)

──────────────────────────────────────────────────────────────────────────────────
    Static atmosphere benchmarks           Time                   Allocations
                                   ──────────────────────   ───────────────────────
         Tot / % measured:              15.6s / 2.85%           6.77GiB / 0.00%

 Section                   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────
 32×32×32 [CPU, Float64]       10    444ms   100%  44.4ms    288KiB  100%   28.8KiB
 ──────────────────────────────────────────────────────────────────────────────────
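
For reference, here's a toy illustration of why I think the Val version was problematic (not the actual forcing code): Val(i) has a different type for every value of i, so building it from a run-time index forces dynamic dispatch on each call, which can show up as allocations in a hot loop.

```julia
slow_forcing(::Val{1}, x)         = 2x   # pretend tracer 1 needs special treatment
slow_forcing(::Val{N}, x) where N = x    # generic case for the other tracers

function total_forcing_val(x, ntracers)
    total = 0.0
    for i in 1:ntracers
        total += slow_forcing(Val(i), x)   # Val of a run-time i: dynamic dispatch
    end
    return total
end

# Passing the plain integer keeps the call inferable (at the cost of a branch):
slow_forcing(i::Int, x) = i == 1 ? 2x : x

function total_forcing_int(x, ntracers)
    total = 0.0
    for i in 1:ntracers
        total += slow_forcing(i, x)
    end
    return total
end
```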

Anyway, this seems like reasonable performance, but the code's become a bit fragile (I make assumptions about the way tracers are ordered in model.tracers that aren't enforced or checked), so it's probably a good time to sit down together and think about ways to make the abstractions more robust.