Weekly seminar on deep learning for climate modeling
2018-09-04: Reynolds averaged turbulence modelling using deep neural networks with embedded invariance
Ling, J., Kurzawski, A., & Templeton, J. (2016).
Reynolds averaged turbulence modelling using deep neural networks with embedded invariance.
Journal of Fluid Mechanics, 807, 155-166. doi:10.1017/jfm.2016.615
Context (from Mu): The Navier-Stokes equations can be solved by
- direct numerical simulation
- large eddy simulation (spectral cascades)
- Reynolds-averaged Navier Stokes (RANS)
- Eddy viscosity models of order 0, 1, and 2 (k-epsilon). The zero-order solutions are based upon the Boussinesq approximation
for turbulence.
- Reynolds stress models that solve the time evolution of the Reynolds stress $\langle u_i' u_j' \rangle$
- Explicitly solving the Reynolds stress evolution is perhaps too complicated → development of algebraic stress models (ASM):
$\langle u_i' u_j' \rangle = \sum_{n=1}^{10} g_n T_n$, in which the $T_n$ form a basis of 10 tensors. This serves as a nice test model for machine learning in fluid dynamics:
conservation laws are satisfied and no physical insight is needed; we simply want the best set of coefficients $g_n$.
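The expansion above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: it keeps the first four tensors of the basis rather than all ten, and `toy_coefficients` is a hypothetical stand-in for the learned coefficients (any function of the scalar invariants would do). It checks that a model built this way rotates consistently with its inputs.

```python
import numpy as np

def invariants(S, R):
    """Five scalar invariants of the strain-rate (S) and rotation-rate (R) tensors."""
    return np.array([
        np.trace(S @ S),
        np.trace(R @ R),
        np.trace(S @ S @ S),
        np.trace(R @ R @ S),
        np.trace(R @ R @ S @ S),
    ])

def basis_tensors(S, R):
    """First four basis tensors only (illustrative truncation of the ten)."""
    I = np.eye(3)
    return np.array([
        S,
        S @ R - R @ S,
        S @ S - I * np.trace(S @ S) / 3.0,
        R @ R - I * np.trace(R @ R) / 3.0,
    ])

def toy_coefficients(lam):
    """Hypothetical coefficients; in the paper these come from the network."""
    return np.array([-0.09, 0.01 * lam[0], 0.02 * np.tanh(lam[1]), 0.005 * lam[2]])

def stress_from_basis(S, R):
    """Sum of coefficient * basis tensor, with coefficients depending on invariants only."""
    g = toy_coefficients(invariants(S, R))
    return np.tensordot(g, basis_tensors(S, R), axes=1)

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))                    # hypothetical non-dimensional velocity gradient
S = 0.5 * (A + A.T)                            # symmetric part
R = 0.5 * (A - A.T)                            # antisymmetric part
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix

b = stress_from_basis(S, R)
b_rot = stress_from_basis(Q @ S @ Q.T, Q @ R @ Q.T)
print(np.allclose(b_rot, Q @ b @ Q.T))         # rotating inputs rotates the output
```

Because the coefficients see only the invariants and the tensors carry the orientation, the invariance does not have to be learned from data.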
Overview (from Tom):
In the article the Navier-Stokes equations are solved using the Reynolds decomposition, which describes a flow as a mean plus a
perturbation. The shear and rotation tensors are non-dimensionalized with the turbulent kinetic energy and the turbulent dissipation rate.
Galilean invariance requires that the solution of the flow not depend on the orientation of the coordinates.
Article discussion:
- The authors make a large assumption in using the 5 invariants, i.e. that the flow is driven by eddies.
- Nine inputs are used in the MLP, exploiting some symmetries in the rotation and shear tensors.
- Did they really need 8 hidden layers for this system? How well could one do with just 2 layers?
Or with a baseline linear or nonlinear model?
- Some discussion of their Bayesian optimization for 3 hyperparameters
(2 architectural -- number of nodes per layer + number of layers, 1 for learning rate)
- They use root mean squared error for their cost function. Is RMSE the right metric?
Could random search do as well or better? Could certain methods simply learn the cost function?
- Could they have obtained the same invariance by simply augmenting their training data, i.e. rotating it many times so that
the network builds up the invariance naturally?
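As a rough illustration of the augmentation idea raised in the last bullet, the sketch below rotates a batch of input tensors and targets by random rotations. The function names, the batch layout `(n, 3, 3)`, and the placeholder target `b = -0.1 * S` are all assumptions made for the example, not anything taken from the paper.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 proper rotation (det = +1) via QR decomposition."""
    Q, Rm = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(Rm))      # fix the sign ambiguity of each column
    if np.linalg.det(Q) < 0:          # turn a reflection into a rotation
        Q[:, 0] = -Q[:, 0]
    return Q

def rotate_batch(Q, X):
    """Apply X -> Q X Q^T to every tensor in a batch of shape (n, 3, 3)."""
    return np.einsum("ij,njk,lk->nil", Q, X, Q)

def augment(S, R, b, n_rot, rng):
    """Return the original samples followed by n_rot rotated copies."""
    S_all, R_all, b_all = [S], [R], [b]
    for _ in range(n_rot):
        Q = random_rotation(rng)
        S_all.append(rotate_batch(Q, S))
        R_all.append(rotate_batch(Q, R))
        b_all.append(rotate_batch(Q, b))   # targets must rotate with the inputs
    return np.concatenate(S_all), np.concatenate(R_all), np.concatenate(b_all)

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3, 3))             # hypothetical velocity gradients
S = 0.5 * (A + A.transpose(0, 2, 1))
R = 0.5 * (A - A.transpose(0, 2, 1))
b = -0.1 * S                               # placeholder target, illustration only

S_aug, R_aug, b_aug = augment(S, R, b, n_rot=3, rng=rng)
print(S_aug.shape)
```

The tensor components change under each rotation while scalars such as the trace of $S^2$ do not, which is why a network fed raw components needs many rotated copies and one fed invariants does not.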
General discussion:
- Generative adversarial networks (GANs) can do better with less data.
- It could be interesting to come up with certain “climate canonical data sets”.
- From the discussion of turbulence modeling, it is important to choose your model cleverly. In “softer machine learning” there is an “embedded conservation”. With sufficient data, however, ML tools seem to be able to learn conservation. A means of accelerating this “conservation learning” could be to heavily penalize non-conservation by the model.
- Spectral energy fluxes could be a useful measure for the cost function. A given architecture may also be more successful in Fourier space.
- It is worthwhile to think carefully about the best data for these tools.
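The idea of penalizing non-conservation can be sketched as an extra term in the cost function. Everything specific here is an assumption for illustration: the conserved quantity is taken to be a weighted sum of the outputs that should vanish (for example a layer-weighted column integral of tendencies), and `weights` and `penalty` are hypothetical.

```python
import numpy as np

def penalized_loss(pred, target, weights, penalty=10.0):
    """Mean squared error plus a penalty on the conservation residual.

    pred, target: arrays of shape (n_samples, n_lev)
    weights: shape (n_lev,), such that pred @ weights should be zero
    """
    mse = np.mean((pred - target) ** 2)
    residual = pred @ weights              # one budget residual per sample
    return mse + penalty * np.mean(residual ** 2)

rng = np.random.default_rng(0)
n_samples, n_lev = 64, 10
weights = np.full(n_lev, 1.0 / n_lev)      # hypothetical equal layer weights

# Build targets that conserve exactly: remove the weighted mean from each sample.
target = rng.normal(size=(n_samples, n_lev))
target -= (target @ weights)[:, None]

# Two predictions with the same pointwise error size (0.1 everywhere):
pattern = np.tile([1.0, -1.0], n_lev // 2)  # sums to zero, so it conserves
pred_conserving = target + 0.1 * pattern
pred_violating = target + 0.1               # uniform offset breaks conservation

loss_conserving = penalized_loss(pred_conserving, target, weights)
loss_violating = penalized_loss(pred_violating, target, weights)
print(loss_conserving, loss_violating)
```

Both predictions have the same plain MSE, so only the penalty term separates them; that is the sense in which the penalty could accelerate "conservation learning".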
2018-09-11: A data driven approach to convective parameterization
Deep learning to represent subgrid processes in climate models
Stephan Rasp, Michael S. Pritchard, Pierre Gentine
Proceedings of the National Academy of Sciences Sep 2018, 201810286; DOI: https://doi.org/10.1073/pnas.1810286115
Pierre Gentine, September 11, 2018
Convective parametrization (based on Arakawa and Schubert’s idea):
Main objective: getting mass flux profiles
- Specify mass flux at cloud base (closure)
- Specify entrainment and detrainment profiles (mixing)
- Use quasi-equilibrium assumption (which is not right sometimes)
- Implies a strong connection between boundary layers and convection
- Include a trigger function for convection
- Problem of convective parametrization in GCMs:
- Incorrect peak in the diurnal cycle of precipitation and cloud cover, which
coincides with the peak surface flux
- Wrong heating sign in heating profiles
- Incorrect timing of shallow cumulus and deep convection
- Trigger function and mass closure are not good
- Quasi-equilibrium is good for longer timescales (CAPE as an
example), but when the forcing is too fast (~ diurnal cycle), the system is far
from equilibrium
- In a diurnal cycle, memory of convective systems is more important
than forcing
- Entrainment depends on environmental conditions; for example, it is
sensitive to the humidity of the environment (higher RH, more mixing)
- Entrainment is stochastic, not deterministic: the
mixing level is random, and the state depends on memory rather than on
cloud base
- A lot of drizzle: not enough moisture (rain is too frequent and too
light), and not enough rainfall extremes
- Mesoscale convective systems (self-organized, a large fraction of
rainfall and key for extreme rainfall) are not represented
- Cold pools: rain evaporation and ice melt generate density currents
that regenerate convection by pushing air back up into the troposphere, but this is
not represented
Too many biases in diurnal cycles, MCS, organization, precipitation
extremes, waves, mass flux and entrainment.
Some ways to improve:
ECMWF:
- Departure from quasi-equilibrium using PBL lag/memory
- Relaxation of PCAPE -> treats the memory
Deep convection
- Cold pools in the PBL are one way to include triggering and closure
- Modify entrainment -> wider plumes with less entrainment
Solution A to substitute convective parametrization: CRMs do a better
job (~10 km), and we can embed CRMs in GCMs (superparameterization, SP)
- Diurnal cycle is better and intensity is better
- Order of magnitude of precipitation is right
Explicit convection improves results dramatically; SPs do well but are too
expensive!
Question: we usually run 1D-2D CRMs rather than full 3D structures to avoid
the expense; will this cause problems?
Yes, momentum budget and so on… Macroscale statistics can be better
represented in 3D models.
Solution B to substitute convective parametrization: a data-driven
approach (machine learning)
How to do machine learning?
- Training data: 3-6 months; testing data: one year. Long LES runs
might be used as training data
- Uses T, q, Ps, H, LH, $SW_{TOA}$ to predict their tendencies,
precipitation, or the TOA radiative flux (traditional way: dynamical
core + advection + turbulence + microphysics schemes... ->
tendency)
- SPs can represent some MCS propagation even with periodic lateral
boundary conditions
- Learning rate was the most important variable. Normalization of
variables was not necessary.
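A minimal sketch of the kind of network described above: a plain multilayer perceptron that maps a stacked column state to tendencies and fluxes. All sizes and names are assumptions for illustration. `N_LEV`, the input/output packing, and the leaky-ReLU slope are hypothetical; the 8 hidden layers of 256 units follow the "256 x 8" figure quoted in the discussion later in these notes. Training is omitted.

```python
import numpy as np

N_LEV = 30  # hypothetical number of vertical levels

def pack_inputs(T, q, ps, shf, lhf, sw_toa):
    """Stack profiles (T, q) and scalars (Ps, H, LH, SW at TOA) into one vector."""
    return np.concatenate([T, q, [ps, shf, lhf, sw_toa]])

def init_mlp(n_in, n_out, width=256, depth=8, seed=0):
    """Random weights for `depth` hidden layers of `width` units each."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [width] * depth + [n_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass with leaky-ReLU hidden layers and a linear output layer."""
    for W, b in params[:-1]:
        z = x @ W + b
        x = np.where(z > 0, z, 0.3 * z)   # leaky ReLU; the slope is an assumption
    W, b = params[-1]
    return x @ W + b

# Hypothetical column state, in arbitrary units.
rng = np.random.default_rng(1)
x = pack_inputs(T=rng.normal(size=N_LEV), q=rng.normal(size=N_LEV),
                ps=1.0, shf=0.1, lhf=0.2, sw_toa=0.5)

n_out = 2 * N_LEV + 2   # dT/dt and dq/dt profiles, precipitation, TOA flux
params = init_mlp(n_in=x.size, n_out=n_out)
y = forward(params, x)
print(x.size, y.shape)
```

In a setup like this the learning rate and the stopping criterion are the main knobs, which matches the emphasis in the notes.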
Good news in symmetric aquaplanet +4K experiment:
- Similar to SPs, but the machine learning model is smoother, with less noise
(it fits the mean state)
- Better cloud radiative effect and precipitation
- The number of extreme precipitation events is increased
- MJO is better than in CAM but not as good as in SPs; the wave
spectra are similar
- Heating rate in vertical levels is more realistic
- Only one peak in ITCZ instead of two peaks
- Integrated heating rate plot: the Walker circulation is in the right place,
even though this is an aqua-planet, which shows some generation is possible
Limits of machine learning:
- Does not do well too far outside the training data; for
example, it cannot simulate future warming based on past
data. If trained with both past and future data, however, it
improves, and under warming conditions extreme precipitation is
captured
- Did not show the variance well
Question: Why do we overestimate extreme precipitation at low frequencies
in the machine learning model?
Discussion
- A question about conservation.
It is very close to being conserved. We plugged the machine learning model into the
climate model, and the model learned to conserve to some extent. It is not
perfect because we do not have liquid water content, but it holds
approximately. Condensation diagnostics are needed.
- Stability of the NN model.
- How much faster is the machine learning model compared with
SPs?
20 times. Training data of 3-6 months could lead to convergence.
- How many hidden layers?
256 x 8
- How many epochs to use? Set a threshold on epochs or a threshold on the error?
When to stop the training?
Stop when convergence is reached; dropout is not used.
- It is not necessary to normalize the data or do dimensional analysis,
because the network can figure it out for you.
- The initial condition for the weights is not so important.
- The learning rate is the most important parameter, and sometimes we
chose a learning rate which still leads to convergence but is
physically wrong.
- Cost function: RMSE from the output vector.
- It could be useful to check the cross-correlation of the input variables.
- Start using noise and shape the noise in some way; ask the model to
learn the distribution, not the deterministic relationship.
Generative model. Gaussian processes might be useful.
- What is the network architecture? The details of the setup.
To be shown…
- May train on data from 3D cloud-resolving models in the future.
- Use a GitHub repository and Google Doc to share information.
Yu Huang
2018-09-18: Data-driven discretization: a method for systematic coarse graining of partial differential equations
Yohai Bar-Sinai, Stephan Hoyer, Jason Hickey, Michael P. Brenner
https://arxiv.org/abs/1808.04930
2018-09-25: Accelerating eulerian fluid simulation with convolutional networks
Tompson, J., Schlachter, K., Sprechmann, P., and Perlin, K. (2016).
Accelerating eulerian fluid simulation with convolutional networks.
arXiv preprint arXiv:1607.03597.
https://cims.nyu.edu/%7Eschlacht/CNNFluids.htm
Tompson et al. propose a machine learning technique to solve the inviscid Euler equations:
$$\frac{\partial u}{\partial t} = - u \cdot \nabla u - \frac{1}{\rho} \nabla p + f$$
$$\nabla \cdot u = 0$$
The motivation for this work is to improve computer graphic animations, but the method is applicable to more complicated forms of the Navier-Stokes equation.
Traditionally, the equations can be solved using the operator splitting method. The method boils down to 2 steps (see algorithm 1 in the paper for more details):
- Ignore pressure gradients and calculate the velocity at the next time step assuming only advection ($\frac{\partial u}{\partial t} = -u\cdot\nabla u$)
- "pressure projection": solve the Poisson equation for $p_t$ and use this to update the velocity field: $u_t = u_{t-1} - \frac{1}{\rho} \nabla p_t$
Exact solutions can be found using iterative methods such as preconditioned conjugate gradient (PCG) or the Jacobi method. Because these methods are iterative, truncating them before convergence leaves residual divergence in the velocity field, leading to bad solutions. The method proposed by Tompson et al. uses unsupervised learning to update the velocity field. A convolutional network is used to estimate $p_t$. Instead of fitting labelled training data, the method minimizes the divergence of the predicted velocity field. This is justified since the problem assumes a non-divergent field ($\nabla \cdot u = 0$).
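To make the pressure projection and the effect of early truncation concrete, here is a small sketch on a 2D periodic grid. It is not the paper's solver: the grid, the damped Jacobi update with `omega=0.8`, and setting $\rho$ and the time step to 1 are assumptions chosen to keep the example short. The correction is applied to the provisional velocity from the advection step, and the mean absolute divergence is the kind of quantity the network is trained to reduce.

```python
import numpy as np

def divergence(u, v, h):
    """Forward-difference divergence on a periodic grid."""
    return (np.roll(u, -1, axis=0) - u) / h + (np.roll(v, -1, axis=1) - v) / h

def gradient(p, h):
    """Backward-difference gradient, so div(grad p) is the 5-point Laplacian."""
    return (p - np.roll(p, 1, axis=0)) / h, (p - np.roll(p, 1, axis=1)) / h

def jacobi_pressure(rhs, h, n_iter, omega=0.8):
    """Damped Jacobi iterations for laplacian(p) = rhs, starting from p = 0."""
    p = np.zeros_like(rhs)
    for _ in range(n_iter):
        neighbours = (np.roll(p, 1, 0) + np.roll(p, -1, 0)
                      + np.roll(p, 1, 1) + np.roll(p, -1, 1))
        p_jacobi = (neighbours - h**2 * rhs) / 4.0
        p = (1.0 - omega) * p + omega * p_jacobi
    return p

def project(u_star, v_star, h, n_iter):
    """Subtract the pressure gradient from the provisional velocity."""
    p = jacobi_pressure(divergence(u_star, v_star, h), h, n_iter)
    px, py = gradient(p, h)
    return u_star - px, v_star - py

def mean_abs_divergence(u, v, h):
    return np.mean(np.abs(divergence(u, v, h)))

n = 32
h = 1.0 / n
rng = np.random.default_rng(0)
u_star = rng.normal(size=(n, n))   # hypothetical velocity after advection
v_star = rng.normal(size=(n, n))

div_before = mean_abs_divergence(u_star, v_star, h)
div_20 = mean_abs_divergence(*project(u_star, v_star, h, 20), h)
div_2000 = mean_abs_divergence(*project(u_star, v_star, h, 2000), h)
print(div_before, div_20, div_2000)
```

Stopping after a few iterations leaves a visibly divergent field, while many iterations drive the divergence toward zero; the learned solver aims for a low divergence at a fixed, small cost.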
3D smoke plumes were simulated using the proposed method and three other methods. PCG, Jacobi, and the proposed method can produce qualitatively similar results. The Jacobi method, when truncated early, produces an elongated plume shape and high-frequency noise. To test the stability of the methods, the authors calculated the mean of the velocity divergence (which should be nearly zero). The proposed method significantly outperforms the Jacobi method.
In summary, the proposed method uses an unsupervised convolutional network to solve the inviscid Euler equations. Although it does not guarantee an exact solution, it significantly outperforms the Jacobi method and produces results similar to PCG, while being orders of magnitude faster.
L. Gloege
2018-10-02: Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach
Jaideep Pathak, Brian Hunt, Michelle Girvan, Zhixin Lu, and Edward Ott
Phys. Rev. Lett. 120, 024102 – Published 12 January 2018
https://doi.org/10.1103/PhysRevLett.120.024102
2018-10-09: Recurrent neural networks and empirical dynamical modeling to study non-linear dynamical systems
D. Sussillo and O. Barak. Opening the Black Box: Low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Comput. 2013 25(3): 626-49. doi: 10.1162/NECO_a_00409.
and
G. Sugihara, R. May, H. Ye, C.-h. Hsieh, E. Deyle, M. Fogarty and S. Munch. Detecting causality in complex ecosystems. Science. 2012 338(6106): 496-500. doi: 10.1126/science.1227079.