What constitutes the 'state' of the model at a particular timestep

Thanks for the fantastic work on this package!

We are interested in using SpeedyWeather with a package ParticleDA.jl we are developing for performing particle filtering / data assimilation. We have an existing integration with the original FORTRAN SPEEDY implementation but as this is a bit brittle and difficult to deploy, and lacking in flexibility compared to SpeedyWeather, it would be great if we could instead switch to using SpeedyWeather instead.

I've been doing some initial explorations of how we would do this, but I'm not sure if my mental model for how the state of models in SpeedyWeather is encapsulated is correct from digging through the source code a bit more. ParticleDA assumes the model of interest is specified as a state space model , with the state of the model at a particular time step fully determined by a vector of some fixed dimensionality. I have been working on the assumption that the state of SpeedyWeather models is fully defined by the spectral coefficients in prognostic_variables for leapfrog index lf=1, so that a (deterministic) update to a state vector state according to the model could be performed something like

update_prognostic_variables_from_state_vector!(prognostic_variables, state)
run_speedy!(prognostic_variables, diagnostic_variables, model_setup)
update_state_vector_from_prognostic_variables!(state, prognostic_variables)

where update_prognostic_variables_from_state_vector! and update_state_vector_from_prognostic_variables! deal with mapping from the (complex) coefficients in the lower triangular matrices in a PrognosticVariables struct to a flat real vector and vice versa, and prognostic_variables, diagnostic_variables and model_setup correspond to the initialized data structures returned by initialize_speedy. This assumes that the diagnostic_variables and model_setup structures are only used to read constants from, write output to not used in subsequent time steps and for holding buffers used for writing intermediate results to. In particular I am assuming diagnostic_variables is fully determined by the value of prognostic_variables passed to run_speedy!, but is passed as an input to avoid the need for allocating on each call.

The code elements which I've come across so far which make me unsure about the interpretation above:

The current simulation time is tracked in a Clock struct in the model_setup struct, and this is updated as part of the timestepping so that we would need to manually reset the clock data to have run_speedy! perform the same computations for a given state. So the model_setup struct is at least partially stateful.
The leapfrog algorithm used for timestepping requires values of the prognostic variables at two initial time points to be specified, and the PrognosticVariables struct therefore has memory allocated for defining all spectral coefficients at two timesteps. I think based on reading the code in src/dynamics/time_integration.jl and src/dynamics/initial_conditions.jl that only the values corresponding to index lf=1 need to be initialised, with the values for the subsequent lf=2 index bootstrapped by performing an initial pair of forward Euler and (unfiltered) leapfrog steps, so that the lf=2 values in the PrognosticVariables struct are effectively only an internal buffer that is always overwritten at the start of timestepping. Is this correct?
After the initial time steps, subsequent steps use a filtered leapfrog algorithm. As far as I can tell this relies on tendencies stored in the DiagnosticVariables struct for each of the quantities represented in PrognosticVariables. From https://github.com/SpeedyWeather/SpeedyWeather.jl/blob/d00b8017eecf2da25ad1801873d7fab09d6ab221/src/dynamics/time_integration.jl#L173-L191 it looks like the tendencies are zeroed at the beginning of each timestep. Does this mean it's safe to assume that any tendencies values written to a DiagnosticVariables struct in a previous time step will not be accessed / can be safely ignored?

As a related question, is it safe to assume a call to run_speedy! is deterministic for fixed values of the prognostic and diagnostic variables and model setup? The only places I can currently identify where a random number generator is used is in the initial_conditions! method associated with StartWithRandomVorticity. It looks like from #199 this may change at some point in the future?

Hi Matt, thanks for reaching out! I've seen your ParticleDA.jl efforts and would love to collaborate. I think I've talked with one of you also at last year's JuliaCon in London. Great to see that there's progress!!

1) v0.5 vs v0.6: I see that you have been using the current release v0.5, however could I ask you to switch to the main branch? I'll release it soon, just adding some documentation these days. Reason is that with v0.6 the interface to changes, as we want to deprecate the run_speedy!/initialize_speedy! interface. You can see examples of the new interface in How to run SpeedyWeather.jl

2) But good news is that the underlying logic you are worried about regarding prognostic/diagnostic variables and the model struct isn't changing: As you correctly figured out (note to self to document this better!) we have (with small exceptions see below) a) the prognostic variables define the current state of the model, the diagnostic variables are entirely determined by the prognostic variables and the model b) the model is constant at runtime c) Unless the initital conditions are chosen after the model creation the model can reconstruct the prognostic variables

So to your point of "make me unsure" that's indeed as you describe.

However, the exceptions are as follows

1) Clock (which is only in v0.6 btw) sits in model and counts up the time, at the moment this is only used for output. There are conceptually some parameterizations like radiation that would use Clock to determine things like time of day, radiation at that moment and therefore a temperature tendency so that indeed Clock can influence the prognostic variables. This sounds like a good reason to me to move the Clock into the prognostic variables, but if you are worried about this for now: there's currently no active parameterization that uses Clock and so it's only used for output purposes. Going forward, there'll always be a way to disable that, e.g. the radiation code will be able to have daily_cycle=false, seasonal_cycle=false such that the Clock state doesn't matter for the future evolution of the prognostic variables.

2) leapfrogging: Yes I agree with what you say. At the beginning of the model integration only lf=1 matters and lf=2 is overwritten due to the Euler forward step. During the model integration (i.e. for each time step after the first) both lf=1 and lf=2 determine the future state of the prognostic variables.

3) Yes, the idea is that information from the diagnostic variables propagates from one time step to the next only through the prognostic variables. The tendencies are completely overwritten every time. I do want to stick to that logic and there's currently only two places where we have to be careful (both of which can be disabled): a) Horizontal diffusion: At the moment there's an option to have HyperDiffusion(;spectral_grid,adaptive=true) which means that based on the current state of the diagnostic variables, the hyper diffusion "constants" are recalculated to allow for a more selective diffusion which therefore would impact the prognostic variables. But at the moment this does not leave any information left in diagnostic variables/model and is done every time step. b) Semi-implicit time stepping. This requires a vertical temperature profile, which can be chosen as constant ImplicitPrimitiveEq(;spectral_grid,recalculate=typemax(Int)) (reminder to self to allow Inf to be passed on too), or can occasionally be recalculated. When recalculated there would be information propagating from a previous diagnostic state to the model (where the .implicit sits) and then back to the prognostic variables.

On random initial conditions/stochastic parameterizations: At the moment the only random aspect of the model are the StartWithRandomVorticity initial conditions for the barotropic model. There are some plans to make other parameterizations stochastic, that there'll always be an option to disable that.

Thanks for the detailed and very helpful response @milankl :grin:

I think I've talked with one of you also at last year's JuliaCon in London. Great to see that there's progress!!

Yep I think @DanGiles may have chatted to you.

v0.5 vs v0.6: I see that you have been using the current release v0.5, however could I ask you to switch to the main branch? I'll release it soon, just adding some documentation these days. Reason is that with v0.6 the interface to changes, as we want to deprecate the run_speedy!/initialize_speedy! interface. You can see examples of the new interface in How to run SpeedyWeather.jl

Okay yep, will switch to using interface on main branch; I'd noticed this in the process of changing but wasn't sure how imminent a new release would be.

But good news is that the underlying logic you are worried about regarding prognostic/diagnostic variables and the model struct isn't changing: As you correctly figured out (note to self to document this better!) we have (with small exceptions see below) a) the prognostic variables define the current state of the model, the diagnostic variables are entirely determined by the prognostic variables and the model b) the model is constant at runtime c) Unless the initital conditions are chosen after the model creation the model can reconstruct the prognostic variables

So to your point of "make me unsure" that's indeed as you describe.

Thanks - that's all super helpful to confirm. And it's a credit to how readable your code is that I was able to figure some of this out without it being explicitly documented 🙂

However, the exceptions are as follows

Clock (which is only in v0.6 btw) sits in model and counts up the time, at the moment this is only used for output. There are conceptually some parameterizations like radiation that would use Clock to determine things like time of day, radiation at that moment and therefore a temperature tendency so that indeed Clock can influence the prognostic variables. This sounds like a good reason to me to move the Clock into the prognostic variables, but if you are worried about this for now: there's currently no active parameterization that uses Clock and so it's only used for output purposes. Going forward, there'll always be a way to disable that, e.g. the radiation code will be able to have daily_cycle=false, seasonal_cycle=false such that the Clock state doesn't matter for the future evolution of the prognostic variables.

That's helpful. Having time-dependent dynamics like this would not strictly be a problem for us as we allow for the state updates to depend on the current time index, but from an implementation perspective it would ease things for us if we can assume that we can reuse the same model setup struct on multiple states within an ensemble without needing to reset the clock state, so if the Clock struct was moved to the prognostic_variables that would definitely be helpful for us.

leapfrogging: Yes I agree with what you say. At the beginning of the model integration only lf=1 matters and lf=2 is overwritten due to the Euler forward step. During the model integration (i.e. for each time step after the first) both lf=1 and lf=2 determine the future state of the prognostic variables.

👍

Yes, the idea is that information from the diagnostic variables propagates from one time step to the next only through the prognostic variables. The tendencies are completely overwritten every time. I do want to stick to that logic and there's currently only two places where we have to be careful (both of which can be disabled): a) Horizontal diffusion: At the moment there's an option to have HyperDiffusion(;spectral_grid,adaptive=true) which means that based on the current state of the diagnostic variables, the hyper diffusion "constants" are recalculated to allow for a more selective diffusion which therefore would impact the prognostic variables. But at the moment this does not leave any information left in diagnostic variables/model and is done every time step. b) Semi-implicit time stepping. This requires a vertical temperature profile, which can be chosen as constant ImplicitPrimitiveEq(;spectral_grid,recalculate=typemax(Int)) (reminder to self to allow Inf to be passed on too), or can occasionally be recalculated. When recalculated there would be information propagating from a previous diagnostic state to the model (where the .implicit sits) and then back to the prognostic variables.

Thanks - good to know about these cases, will look at either explicitly disabling or adding some assertions to check these options aren't used so we can safely assume there is no flow of model state through diagnostic variables.

On random initial conditions/stochastic parameterizations: At the moment the only random aspect of the model are the StartWithRandomVorticity initial conditions for the barotropic model. There are some plans to make other parameterizations stochastic, that there'll always be an option to disable that.

Again thanks for the clarification!

From my perspective happy to close this issue as you've resolved all my queries, but from labels and comments above looks like you might want to keep this open as a prompt for adding documentation on some of this, so will leave open for now.

I think with #350 closed we can also close this here now! If there's anything else on this topic, feel free to reopen!

SpeedyWeather / SpeedyWeather.jl

What constitutes the 'state' of the model at a particular timestep #347