CliMA / ClimateMachine.jl

Climate Machine: an Earth System Model that automatically learns from data
https://clima.github.io/ClimateMachine.jl/latest/

DiagnosticsMachine #1596

Open yairchn opened 4 years ago

yairchn commented 4 years ago

Description

This issue lists the types of diagnostics, grouped by their algorithms, to guide the development of a small number of functions that can be called for different variables (or combinations of variables) to produce the needed diagnostic outputs. For now, we include only diagnostic functions that are statistical manipulations of the model's prognostic and diagnostic variables. We do not list the diagnostic variables themselves (such as temperature or vorticity).

Additional context

The goal of these diagnostics is to produce processed data at a higher time resolution than that of the full model output, saving storage space while providing statistical information about the state of the model at high temporal resolution.

For CLIMA Developers

yairchn commented 4 years ago

It is useful to partition the diagnostics into levels of increasing complexity:

A. Spatial manipulations at a given time (e.g. compute the mean of a variable along a surface).

B. Temporal and/or spatial-temporal manipulations that depend on previous time steps (e.g. compute a monthly mean).

C. Coherent structures (e.g. identify a storm based on given criteria computed from the model's state vector).

I will list these in separate comments below.

yairchn commented 4 years ago

A. Spatial manipulations

  1. One-dimensional mean: compute the mean along a chosen coordinate (ζ) and return it as data on the plane (Ω) normal to that coordinate. This function receives a coordinate (ζ) and a variable (or combination of variables) as inputs and returns a 2D matrix of data M(Ω, t). For example, in a GCM, compute the zonal mean temperature M=\bar{T}, with Ω=[z, latitude].

  2. Two-dimensional statistics: compute the mean and the second and third central moments over a chosen plane (Ω) as a function of time and of the coordinate normal to that plane (ζ). This function receives a coordinate (ζ) and a variable (or combination of variables) as inputs and returns a 1D profile P(ζ, t). For example, in LES: vertical profiles of the horizontal mean of w, P=\bar{w}, and of the horizontal covariance of w and q_tot, P=\bar{w'q_tot'}. In a GCM: the zonal-vertical mean of the meridional energy flux, P=\bar{v'e'}.

  3. Conditional statistics: compute either of the statistics in 1 or 2, but only over grid boxes where a given variable takes a certain value (or sign). This might be a function that calls 1 or 2 above but also receives as input the variable used in the condition and the condition threshold. Alternatively, this could be split into two functions: one that makes conditional computations for 1 and another that makes conditional computations for 2. This function receives a variable, a condition, and the inputs needed for the statistics above. We might need to compute the threshold from the spatial statistics of the variable in question. Example: compute the horizontal mean of w based only on grid boxes where q_liq is above its 90th percentile.

  4. Planar fractions: count the grid boxes on the plane (Ω) in which a variable has a non-zero value and divide by the total number of grid boxes in Ω. This produces a profile P(ζ, t) of a fraction (between 0 and 1) at each output time. For example, the cloud fraction (cld_frac) in LES: at each height z, count the x,y grid locations with any condensate (q_con>0), divide by the total number of grid boxes in the x,y plane, and return cld_frac(z, t).

  5. Domain integral: compute the 3D integral over the domain at each output time to produce a time series s(t). For example, total energy.

  6. One-dimensional integrals: integrate along the coordinate ζ a statistic of a variable (e.g. mean, variance) or of a product of variables on the Ω plane, producing a time series s(t). For example, the liquid water path is the vertical integral of the density times the horizontal mean of q_liq: the integral over z of (\rho*\bar{q_liq}).

  7. Cover: count the columns on the plane Ω in which a condition is met at one or more locations along ζ (perpendicular to Ω), and divide by the total number of columns. This produces a time series s(t) of a cover (also a fraction between 0 and 1). For example, cloud cover in LES: count the x,y columns with any condensate (q_con>0) at one or more of their z points (i.e. the columns covered by cloud) and divide by the total number of x,y grid boxes in the horizontal plane.
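A minimal Python sketch of the plane statistics A1 to A3 follows. It rests on assumptions that are not part of the proposal: a field is a nested list `f[i][j][k]` over (x, y, z), ζ is fixed to x for A1 and to z for A2 and A3, and every grid box carries equal weight (real model output would need quadrature or area weights, e.g. cos(latitude) on the sphere). All function names are hypothetical, not ClimateMachine.jl API.

```python
import statistics

# Assumption: f[i][j][k] is a nested list over (x, y, z) with equal-weight grid boxes.


def mean_along_x(f):
    """A1: mean along x (e.g. a zonal mean), returned as M[j][k] on the (y, z) plane."""
    nx, ny, nz = len(f), len(f[0]), len(f[0][0])
    return [[sum(f[i][j][k] for i in range(nx)) / nx for k in range(nz)]
            for j in range(ny)]


def horizontal_values(f, k):
    """Flat list of all values on the horizontal (x, y) plane at level k."""
    return [f[i][j][k] for i in range(len(f)) for j in range(len(f[0]))]


def horizontal_moments(f):
    """A2: per-level (mean, variance, third central moment) over the (x, y) plane."""
    profile = []
    for k in range(len(f[0][0])):
        v = horizontal_values(f, k)
        m = sum(v) / len(v)
        var = sum((x - m) ** 2 for x in v) / len(v)
        m3 = sum((x - m) ** 3 for x in v) / len(v)
        profile.append((m, var, m3))
    return profile


def horizontal_covariance(f, g):
    """A2: per-level horizontal covariance mean(f'g'), e.g. mean(w'q_tot')."""
    profile = []
    for k in range(len(f[0][0])):
        a, b = horizontal_values(f, k), horizontal_values(g, k)
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        profile.append(sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a))
    return profile


def percentile_profile(g, pct):
    """Per-level pct-th percentile of g over the (x, y) plane (linear interpolation)."""
    return [statistics.quantiles(horizontal_values(g, k), n=100, method="inclusive")[pct - 1]
            for k in range(len(g[0][0]))]


def conditional_horizontal_mean(f, cond, thresholds):
    """A3: per-level mean of f over boxes where cond > thresholds[k]; None if no box qualifies."""
    profile = []
    for k in range(len(f[0][0])):
        selected = [x for x, c in zip(horizontal_values(f, k), horizontal_values(cond, k))
                    if c > thresholds[k]]
        profile.append(sum(selected) / len(selected) if selected else None)
    return profile
```

The A3 example from the list would then read `conditional_horizontal_mean(w, q_liq, percentile_profile(q_liq, 90))`, with the threshold computed per level from the spatial statistics of q_liq. Moments use the population (1/N) normalization, which is what a plane average implies.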
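The fractions, integrals and cover in A4 to A7 reduce to counting and summing. The Python sketch below assumes, purely for illustration, that a field is a nested list `f[i][j][k]` over (x, y, z) on a uniform grid with hypothetical spacings `dx`, `dy`, `dz`, and that integrals are plain sums of cell value times cell size. The function names are hypothetical, not ClimateMachine.jl API.

```python
# Assumption: f[i][j][k] is a nested list over (x, y, z) on a uniform grid.


def planar_fraction(f, condition=lambda v: v > 0):
    """A4: per-level fraction of (x, y) boxes where condition holds, e.g. cloud fraction."""
    nx, ny, nz = len(f), len(f[0]), len(f[0][0])
    return [sum(1 for i in range(nx) for j in range(ny) if condition(f[i][j][k])) / (nx * ny)
            for k in range(nz)]


def domain_integral(f, dx, dy, dz):
    """A5: integral of f over the domain (sum of cell values times cell volume)."""
    return sum(v for slab in f for column in slab for v in column) * dx * dy * dz


def vertical_integral(profile, dz):
    """A6: integral over z of a 1D profile, e.g. rho * horizontal mean of q_liq for LWP."""
    return sum(profile) * dz


def cover(f, condition=lambda v: v > 0):
    """A7: fraction of (x, y) columns where condition holds at one or more z levels."""
    columns = [column for slab in f for column in slab]
    return sum(1 for column in columns if any(condition(v) for v in column)) / len(columns)
```

For the liquid water path, one would pass the product profile, e.g. `vertical_integral([r * q for r, q in zip(rho, qliq_mean)], dz)`. Computing both `cover` and `planar_fraction` is worthwhile because condensate at different heights in different columns gives a cover larger than the cloud fraction at any single level.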

yairchn commented 4 years ago

B. Temporal manipulations: these will typically be applied to diagnostic variables produced by one of the functions in A.

  1. Temporal statistics: compute the time average (or the deviation from the time average) of a variable (or variables) over a given time interval. For example, compute the monthly mean zonal mean zonal flow in a GCM: use A1 with ζ as longitude (Ω is the latitude-height plane) and the zonal velocity u as the variable to produce \bar{u}=M(Ω, t). Then average this matrix month by month throughout the simulation.

  2. Interannual temporal statistics, mostly for GCMs. For example, compute the interannual monthly mean zonal mean zonal flow: use B1 to obtain the monthly averages of this matrix throughout the simulation, then average these by calendar month (or months) to produce the January mean (or the December-January-February mean) zonal mean zonal flow over a 100-year simulation.
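B1 and B2 amount to grouping output snapshots by (year, month) and then by calendar month. The Python sketch below assumes each snapshot is a `datetime` paired with a flattened copy of an A-level output such as M(Ω, t), and that output times are equally spaced so a plain average is a time average. It weights every month equally in B2 rather than by its number of days. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def elementwise_mean(arrays):
    """Average equally sized flat lists element by element."""
    return [sum(vals) / len(arrays) for vals in zip(*arrays)]


def monthly_means(snapshots):
    """B1: mean of the snapshots falling in each (year, month).

    snapshots: iterable of (datetime, flat list), e.g. a flattened A1 output M(Omega, t).
    """
    groups = defaultdict(list)
    for t, values in snapshots:
        groups[(t.year, t.month)].append(values)
    return {key: elementwise_mean(groups[key]) for key in sorted(groups)}


def calendar_month_mean(monthly, months):
    """B2: average the B1 monthly means over all years for the given calendar months.

    months=(1,) gives the January mean; months=(12, 1, 2) gives the DJF mean.
    """
    selected = [v for (_, month), v in monthly.items() if month in months]
    return elementwise_mean(selected)


# Usage with made-up snapshots of a two-element field.
snapshots = [
    (datetime(2000, 1, 1), [1.0, 2.0]),
    (datetime(2000, 1, 15), [3.0, 4.0]),
    (datetime(2000, 2, 1), [5.0, 6.0]),
    (datetime(2001, 1, 1), [7.0, 8.0]),
]
monthly = monthly_means(snapshots)
january = calendar_month_mean(monthly, months=(1,))  # [4.5, 5.5]
```

In a running model one would more likely keep running sums per month instead of storing snapshots, which gives the same B1 result with constant memory.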

blallen commented 4 years ago

cc: @glwagner @jm-c @christophernhill @sandreza @leios

smarras79 commented 3 years ago

C. For visualization (beyond debugging): output horizontal or vertical 2D slices of selected prognostic and/or diagnostic quantities, e.g. on a given pressure level (such as 850 hPa). Instead of writing very large 3D files and extracting data from them afterwards, the user should decide which slices they are interested in and have the code write those, and only those, at runtime. This reduces the time spent writing to file, especially for very high-resolution LES. Think of the resolution of the tropical cyclone runs that Yassine is doing: a single 3D file is > 1 GB, and he writes one output at intervals of an hour or less.
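One way to produce such pressure-level slices at runtime is to interpolate each model column to the target pressure, here linearly in ln(p), which is a common but not the only choice. The Python sketch below assumes hypothetical nested lists `f[i][j][k]` and `p[i][j][k]` (pressure in Pa) with pressure decreasing as k increases; columns that do not span the target level yield `None`. A vertical slice at a fixed x index needs no interpolation. Function names are hypothetical.

```python
import math


def interp_column_to_pressure(p_col, v_col, p_target):
    """Interpolate one column linearly in ln(p) to p_target (Pa).

    p_col must decrease with index (pressure falls with height).
    Returns None if p_target lies outside the column's pressure range.
    """
    for k in range(len(p_col) - 1):
        p0, p1 = p_col[k], p_col[k + 1]
        if p0 >= p_target >= p1:
            if p0 == p1:
                return v_col[k]
            w = (math.log(p0) - math.log(p_target)) / (math.log(p0) - math.log(p1))
            return v_col[k] + w * (v_col[k + 1] - v_col[k])
    return None


def pressure_level_slice(f, p, p_target):
    """Horizontal 2D slice S[i][j] of field f on the pressure surface p_target (Pa)."""
    return [[interp_column_to_pressure(p[i][j], f[i][j], p_target) for j in range(len(f[0]))]
            for i in range(len(f))]


def vertical_slice(f, i):
    """Vertical 2D slice S[j][k] of field f at a fixed x index i (no interpolation)."""
    return [list(column) for column in f[i]]
```

At each output step, a runtime hook would call something like `pressure_level_slice(T, p, 85000.0)` and write only the resulting 2D array, instead of the full 3D state.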