CliMA / ClimaCore.jl

CliMA model dycore
https://clima.github.io/ClimaCore.jl/dev
Apache License 2.0

Implicit Solver Interface Updates #1230

Closed: dennisYatunin closed this issue 8 months ago

dennisYatunin commented 1 year ago

Purpose

The interface we use to specify matrices and solve linear systems of equations for the implicit solver needs to be refactored and extended. This interface, which is currently spread across several files in ClimaCore and ClimaAtmos (ClimaCore.jl/src/Operators/stencilcoefs.jl, ClimaCore.jl/src/Operators/pointwisestencil.jl, ClimaCore.jl/src/Operators/operator2stencil.jl, ClimaAtmos.jl/src/tendencies/implicit/wfact.jl, and ClimaAtmos.jl/src/tendencies/implicit/schur_complement_W.jl), is unnecessarily convoluted and involves a large number of hardcoded assumptions, which makes it extremely challenging for new users to experiment with the implicit solver and to add new implicit tendencies. In particular, the next stage of EDMF development will involve adding many implicit tendencies to the atmosphere model, and our approximation of the total implicit tendency's Jacobian matrix will end up with a highly non-trivial sparsity pattern. For more than a single developer to be able to understand and implement the implicit solver for the dycore+EDMF, we will first need to make the changes outlined in this SDI. In addition, the land modeling team has been experimenting with their own implicit solver, and these changes will speed up their development and add the new functionality they require.

There is currently a draft PR that contains a detailed sketch of all the proposed changes: #1190

Cost/benefits/risks

The cost/risk is development time. The benefit will be a significantly reduced complexity of implicit solver implementations, both in ClimaAtmos and in ClimaLSM. There will be a simple, well-documented interface to all of the numerical algorithms required by implicit solvers, which will allow user-facing code to be relatively short and easily extensible.

Producers

@dennisYatunin

Components

Inputs

BandMatrixRow and MultiplyColumnwiseBandMatrixField

The type we currently use to represent an element of a band matrix field is called StencilCoefs, and it is extremely confusing and poorly designed. Upon refactoring, this will become BandMatrixRow, which will have the following improvements:

These improvements will also be reflected in fields of BandMatrixRows, which will be aliased as ColumnwiseBandMatrixFields for dispatch. (The alias name is meant to indicate that every column has its own set of BandMatrixRows, which, when taken together, can be interpreted as a band matrix.) So, for example, users will be able to write (@. LinearAlgebra.I / 2 - tridiagonal_matrix_field + 2 * pentadiagonal_matrix_field) == (@. PentadiagonalMatrixRow(field1, field2, field3, field4, field5)).
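
For example, individual rows will support the same arithmetic, promoting to the widest bandwidth. Here is a minimal sketch, assuming the proposed BandMatrixRow constructors, with made-up entry values:

using LinearAlgebra: I

# Entries are ordered from the lowest diagonal to the highest.
tri = TridiagonalMatrixRow(-1.0, 2.0, -1.0)
penta = PentadiagonalMatrixRow(0.0, 1.0, -2.0, 1.0, 0.0)

# I acts as a diagonal row; narrower rows promote to pentadiagonal.
row = I / 2 - tri + 2 * penta
row == PentadiagonalMatrixRow(0.0, 3.0, -5.5, 3.0, 0.0)  # true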

The operators we currently use for matrix-matrix and matrix-vector multiplication are Operators.ComposeStencils() and Operators.ApplyStencil(), respectively. Again, these are confusing and implemented in a rather roundabout way. Upon refactoring, these will both become ⋅, which is an alias for Operators.MultiplyColumnwiseBandMatrixField(). This will allow users to write something like @. matrix_field1 ⋅ matrix_field2 ⋅ field, instead of needing to write @. apply(compose(matrix_field1, matrix_field2), field). In addition, the amount of code used to implement matrix multiplication can be reduced by roughly a factor of 3 (as shown in the sketch), and this simplified code will be easier to update for GPUs in the near future.
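
Side by side (a sketch with hypothetical matrix fields matrix_field1 and matrix_field2 and a hypothetical vector field field):

# Current interface: compose the two matrix fields, then apply to the vector.
apply = Operators.ApplyStencil()
compose = Operators.ComposeStencils()
result = @. apply(compose(matrix_field1, matrix_field2), field)

# Proposed interface: one associative multiplication operator.
⋅ = Operators.MultiplyColumnwiseBandMatrixField()
result = @. matrix_field1 ⋅ matrix_field2 ⋅ field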

FiniteDifferenceOperatorTermsMatrix

When a finite difference operator is applied to a field (@. op(field)), the result is equivalent to multiplying some matrix by that field (@. matrix_field ⋅ field). The operator we currently use to generate this matrix is Operators.Operator2Stencil(op); in order to clarify what this operator is doing, it will be renamed to Operators.FiniteDifferenceOperatorTermsMatrix(op). If op_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(op) and ones_field = ones(axes(field)), users will be able to confirm that (@. op(field)) == (@. op_matrix(ones_field) ⋅ field). As a quirk of our implementation, it is also the case that (@. op_matrix(ones_field) ⋅ field) == (@. op_matrix(field) ⋅ ones_field), which allows us to somewhat simplify expressions involving products with operator matrices.
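
As a quick sketch of these identities, using the proposed names and a hypothetical scalar field face_field defined on cell faces:

interp = Operators.InterpolateF2C()
interp_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(interp)
ones_field = ones(axes(face_field))

# The operator matrix applied to a field of ones reproduces the operator:
@assert (@. interp(face_field)) == (@. interp_matrix(ones_field) ⋅ face_field)
# The implementation quirk: the operator's argument and the multiplicand commute.
@assert (@. interp_matrix(ones_field) ⋅ face_field) ==
        (@. interp_matrix(face_field) ⋅ ones_field)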

Aside from the name change, there are two new features that we need to add to FiniteDifferenceOperatorTermsMatrix. First, for EDMF development, we need to add support for multi-argument operators, so that FiniteDifferenceOperatorTermsMatrix(op) will always generate the matrix that corresponds to the last argument of op. For example, given a two-argument operator op (such as WeightedInterpolateF2C or Upwind3rdOrderBiasedProductC2F), users will be able to define op_matrix and confirm that (@. op(field1, field2)) == (@. op_matrix(field1, ones_field2) ⋅ field2) == (@. op_matrix(field1, field2) ⋅ ones_field2).
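
For example, here is a sketch with hypothetical face-valued fields weights and face_field, where WeightedInterpolateF2C is one of the two-argument operators mentioned above:

winterp = Operators.WeightedInterpolateF2C()
winterp_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(winterp)
ones_field = ones(axes(face_field))

# The generated matrix always corresponds to the operator's last argument:
@assert (@. winterp(weights, face_field)) ==
        (@. winterp_matrix(weights, ones_field) ⋅ face_field)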

Second, for land model development, we need to add support for specifying the corner elements of matrices by adding special boundary conditions for FiniteDifferenceOperatorTermsMatrix. The simple expression presented earlier, (@. op(field)) == (@. op_matrix(ones_field) ⋅ field), is only true when op has trivial boundary conditions; i.e., when op is a center-to-face operator with boundary conditions that cause it to return 0 on the top and bottom faces, or when op is a face-to-center operator without any boundary conditions (which means that it uses the values at the top and bottom faces as-is). In general, operators are affine transformations at the boundaries, not linear transformations. This means that, for every op and field, there is some boundary_field that is zero everywhere except at the boundaries, and (@. op(field)) == (@. op_matrix(ones_field) ⋅ field + boundary_field).

However, we only use matrices to represent derivatives of operators with respect to their inputs, and, up until now, it has always been the case that boundary_field is a constant that does not depend on field. More specifically, if boundary_matrix_field represents ∂(boundary_field)/∂(field), then ∂(@. op(field))/∂(field) can be expressed as @. op_matrix(ones_field) + boundary_matrix_field. So, if boundary_field is a constant, then boundary_matrix_field is the zero matrix and can be ignored in our computations. This amounts to assuming that the corner elements of our matrices are always zero.

Due to new requirements from the land model, we will no longer be able to make this assumption, so we will need to add support for the Operators.SetValue boundary condition for FiniteDifferenceOperatorTermsMatrix, which will allow users to specify nonzero elements from boundary_matrix_field that should be added to @. op_matrix(ones_field).
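
In equation form, with $M$ denoting the matrix field @. op_matrix(ones_field) and $b$ denoting boundary_field, the statements above read

\text{op}(\text{field}) = M \cdot \text{field} + b
\qquad \text{and} \qquad
\dfrac{\partial\ \text{op}(\text{field})}{\partial\ \text{field}} = M + \dfrac{\partial b}{\partial\ \text{field}},

so the corner elements of the Jacobian vanish exactly when $b$ does not depend on field.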

Additionally, it would be good to refactor how finite difference operators are implemented so that the code for every operator does not need to be duplicated in order to implement the FiniteDifferenceOperatorTermsMatrix for that operator. Currently, every operator implements a method for stencil_interior, stencil_left_boundary, and stencil_right_boundary, and each of these methods returns some value (the value of the operator's result at a particular point in space). The FiniteDifferenceOperatorTermsMatrix for every operator also implements the same three methods, but each of its methods returns a BandMatrixRow (or, rather, a "StencilCoefs") whose entries add up to the value returned by the operator's corresponding method. There are currently almost 700 lines of code required to implement these duplicated methods, and more will need to be added in order to support multi-argument operators. It should be fairly straightforward to refactor things so that:

However, this last change would only improve things "under the hood", without any immediate benefit to users. In order to avoid unnecessarily delaying EDMF and land model development, this change will be the last step of this SDI.

ColumnwiseBlockMatrix

The type we currently use to represent block matrices is the SchurComplementW object, which is hardcoded to only work for the dycore in ClimaAtmos. This can be generalized to a ColumnwiseBlockMatrix, which will be a simple dictionary that maps pairs of field names (one name for the row and another name for the column) to their corresponding matrix blocks, each of which can be a ColumnwiseBandMatrixField or a LinearAlgebra.UniformScaling. The ColumnwiseBlockMatrix will be used as follows:

matrix = ColumnwiseBlockMatrix(
    # Diagonal (ρ, ρ) block: a uniform scaling, so no field storage is needed.
    @block_name(c.ρ, c.ρ) => -LinearAlgebra.I,
    # Off-diagonal blocks coupling ρ and the vertical velocity component,
    # with rows indexed by cell centers and cell faces, respectively.
    @block_name(c.ρ, f.w.components.data.:1) =>
        similar(Y.c, BidiagonalMatrixRow{FT}),
    @block_name(f.w.components.data.:1, c.ρ) =>
        similar(Y.f, BidiagonalMatrixRow{FT}),
    # Diagonal (w, w) block: tridiagonal, with rows indexed by cell faces.
    @block_name(f.w.components.data.:1, f.w.components.data.:1) =>
        similar(Y.f, TridiagonalMatrixRow{FT}),
)
# Blocks can be updated in place with broadcasts over matrix fields:
matrix[@block_name(f.w.components.data.:1, f.w.components.data.:1)] .=
    -LinearAlgebra.I .+ Δtγ .* <Jacobian matrix block approximation>

The only challenge in implementing the ColumnwiseBlockMatrix is ensuring type stability, particularly when it gets used in a BlockMatrixSystemSolver (see the next section); without type stability, we will have long compilation times and unnecessary allocations. Fortunately, this has already been worked out and tested in the sketch. In particular, the @block_name macro will return a type-stable generalization of what we currently call a property_chain for both the row and column names, and the corresponding row and column fields will be accessed by using a type-stable generalization of Fields.single_field.
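
As a rough illustration of the idea (hypothetical code; the actual @block_name implementation in the sketch may differ), a property chain can be lifted into the type domain so that block lookups and field accesses remain type-stable:

# Hypothetical sketch: the property names live in a type parameter, so the
# nested getproperty calls can be resolved at compile time.
struct PropertyChain{names} end
PropertyChain(names::Symbol...) = PropertyChain{names}()

get_nested(obj, ::PropertyChain{names}) where {names} =
    foldl(getproperty, names; init = obj)

# A block name is then a pair of row and column property chains, e.g., the
# generalization of @block_name(c.ρ, f.w.components.data.:1):
block_name =
    (PropertyChain(:c, :ρ), PropertyChain(:f, :w, :components, :data, Symbol(1)))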

BlockMatrixSystemSolver

We are currently solving the linear system of equations specified by the SchurComplementW object by reducing it to a smaller tridiagonal system of equations, solving the reduced problem, and using the reduced problem's solution to compute the original problem's solution. However, this strategy will not work when we add the new implicit EDMF tendencies because the new sparsity pattern of the matrix will not allow us to specify the reduced problem without performing a computationally expensive dense matrix inversion. So, we will need to implement a new algorithm for solving sparse block matrix systems. Per @simonbyrne's advice, this algorithm will work as follows:

To show how this permutation will work, here is a simplified example: permuting the Jacobian of a FieldVector $Y$ with total implicit tendency $Y_t$, defined on a single column with two cells, with a field $c$ defined on cell centers and a field $f$ defined on cell faces:

\text{Original } \dfrac{\partial Y_t}{\partial Y} =
\begin{bmatrix}
\dfrac{\partial Y_t.c}{\partial Y.c} & \dfrac{\partial Y_t.c}{\partial Y.f} \\
\dfrac{\partial Y_t.f}{\partial Y.c} & \dfrac{\partial Y_t.f}{\partial Y.f}
\end{bmatrix} =
\begin{bmatrix}
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[1]}{\partial Y.c[2]} \\
\dfrac{\partial Y_t.c[2]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[2]}{\partial Y.c[2]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[2.5]} \\
\dfrac{\partial Y_t.c[2]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[2.5]}
\end{bmatrix} \\[2 ex]
\begin{bmatrix}
\dfrac{\partial Y_t.f[0.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.c[2]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.c[2]} \\
\dfrac{\partial Y_t.f[2.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.c[2]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.f[0.5]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[2.5]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[2.5]} \\
\dfrac{\partial Y_t.f[2.5]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[2.5]}
\end{bmatrix}
\end{bmatrix}
\text{Permuted } \dfrac{\partial Y_t}{\partial Y} =
\begin{bmatrix}
\dfrac{\partial(\text{cell 1})_t}{\partial(\text{cell 1})} & \dfrac{\partial(\text{cell 1})_t}{\partial(\text{cell 2})} \\
\dfrac{\partial(\text{cell 2})_t}{\partial(\text{cell 1})} & \dfrac{\partial(\text{cell 2})_t}{\partial(\text{cell 2})}
\end{bmatrix} =
\begin{cases}
\begin{bmatrix}
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[0.5]} \\
\dfrac{\partial Y_t.f[0.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[0.5]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[1.5]} \\
\dfrac{\partial Y_t.f[0.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[1.5]}
\end{bmatrix} \\[2 ex]
\begin{bmatrix}
\dfrac{\partial Y_t.c[2]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[0.5]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[0.5]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.c[2]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[1.5]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[1.5]}
\end{bmatrix}
\end{bmatrix}  &
\text{ if }  \dfrac{\partial Y_t.\square}{\partial Y.f[2.5]} = \dfrac{\partial Y_t.f[2.5]}{\partial Y.\square} = C\delta_{\square, f[2.5]} \\[5 ex]
\begin{bmatrix}
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[1.5]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[1.5]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[2.5]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[2.5]}
\end{bmatrix} \\[2 ex]
\begin{bmatrix}
\dfrac{\partial Y_t.c[2]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[1.5]} \\
\dfrac{\partial Y_t.f[2.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[1.5]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.c[2]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[2.5]} \\
\dfrac{\partial Y_t.f[2.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[2.5]}
\end{bmatrix}
\end{bmatrix} &
\text{ if }  \dfrac{\partial Y_t.\square}{\partial Y.f[0.5]} = \dfrac{\partial Y_t.f[0.5]}{\partial Y.\square} = C\delta_{\square, f[0.5]}
\end{cases}

As this example illustrates, the permutation requires us to drop all of the matrix elements that correspond to the top or bottom cell face from the linear solve, which we can do as long as the only nonzero matrix element for that cell face is the "identity element". In ClimaAtmos, we know that this will always be the case for the top cell face; in the example above, the "identity element" for this cell face is $\partial Y_t.f[2.5]/\partial Y.f[2.5]$. In our code, the value of $\partial Y_t.f[2.5]/\partial Y.f[2.5]$ (or, rather, the value of -1 + Δtγ * ∂(Yₜ.f.w.components.data.:1[2.5])/∂(Y.f.w.components.data.:1[2.5]), and the corresponding EDMF updraft velocity terms) will typically be $-1$, so, using the somewhat hand-wavy notation from above, we will typically have that $\partial Y_t.\square/\partial Y.f[2.5] = \partial Y_t.f[2.5]/\partial Y.\square = -\delta_{\square, f[2.5]}$. In ClimaLSM, there are currently no prognostic variables defined on cell faces, but, if there were, this would be the case for the bottom cell face, rather than the top cell face. In general, as long as we deal with the nonzero identity element separately, we can drop all of the matrix elements related to one cell face from the linear solve.

There are also two other types of variables whose corresponding matrix elements we will be able to drop from the linear solve (because the only nonzero elements will be the identity elements): variables that do not have a nonzero implicit tendency, like Y.c.uₕ in ClimaAtmos, and variables that do not lie on cell centers or on cell faces, like those related to the river model in ClimaLSM.

After the matrix is permuted, the new linear system will be solved using a band matrix solver. Both the dycore and the land model will require a tridiagonal solver, and EDMF may also require a pentadiagonal solver. If $N$ denotes the number of cells and $V$ denotes the number of variables, then the band matrix solver will be applied to a matrix of $N \times N$ blocks, where each block is itself a $V \times V$ matrix. To simplify our initial implementation, we can treat each block as a dense matrix, which we will represent using a StaticArrays.SMatrix. Although this means that we will need to perform dense matrix inversions, these shouldn't be too expensive as long as $V$ is relatively small. If we were to instead modify the current "Schur complement solve" algorithm for EDMF, we would end up needing to perform a dense matrix inversion on a large block matrix, where each block is one of the original $N \times N$ blocks that represent pairs of variables. Since it will generally be the case that $N > V$, this means that the new "permuted band matrix solve" algorithm will be significantly more performant for EDMF than the current algorithm.
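
To make the proposed approach concrete, here is a minimal, self-contained sketch (not ClimaCore code; all names are illustrative) of a block-tridiagonal Thomas solve in which each block is a dense $V \times V$ StaticArrays.SMatrix, so that every "dense matrix inversion" is a small, statically sized solve:

using StaticArrays

# A: subdiagonal blocks (length N - 1), B: diagonal blocks (length N),
# C: superdiagonal blocks (length N - 1), b: right-hand side (length N).
function block_thomas_solve(A, B, C, b)
    N = length(B)
    C′ = similar(C)
    b′ = similar(b)
    C′[1] = B[1] \ C[1]
    b′[1] = B[1] \ b[1]
    for i in 2:N
        # Eliminate the subdiagonal block; D is the updated pivot block.
        D = B[i] - A[i - 1] * C′[i - 1]
        i < N && (C′[i] = D \ C[i])
        b′[i] = D \ (b[i] - A[i - 1] * b′[i - 1])
    end
    x = similar(b)
    x[N] = b′[N]
    for i in (N - 1):-1:1  # back substitution
        x[i] = b′[i] - C′[i] * x[i + 1]
    end
    return x
end

# Usage with N = 4 cells and V = 2 variables per cell:
B = [SMatrix{2, 2}(4.0, 1.0, 1.0, 4.0) for _ in 1:4]
A = [SMatrix{2, 2}(-1.0, 0.0, 0.0, -1.0) for _ in 1:3]
C = [SMatrix{2, 2}(-1.0, 0.0, 0.0, -1.0) for _ in 1:3]
b = [SVector(1.0, 2.0) for _ in 1:4]
x = block_thomas_solve(A, B, C, b)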

Unfortunately, it will not be possible to eliminate dense matrix inversions from the new algorithm altogether by specializing on the sparsity structure of the $V \times V$ blocks. Even though each $V \times V$ block in ClimaAtmos will be an arrowhead matrix (both for the dycore and for EDMF), the band matrix solver will need to evaluate linear combinations of products of the blocks and their inverses, which will not have any particularly nice sparsity structure. Specifically, the inverse of an arrowhead matrix is a diagonal-plus-rank-one (DPR1) matrix, and all of the following matrix-matrix products are neither arrowhead nor DPR1 matrices: arrowhead times arrowhead, DPR1 times DPR1, and arrowhead times DPR1 (unless they are inverses of each other). This means that the new algorithm is likely to be slower than the current one for the dycore, since the current one does not involve any dense matrix inversions. So, in addition to implementing the new algorithm for BlockMatrixSystemSolver, we will also need to port over the algorithm currently implemented in ClimaAtmos.jl/src/tendencies/implicit/schur_complement_W.jl. This will require updating the current algorithm to use the new interface for BandMatrixRow and MultiplyColumnwiseBandMatrixField, and it will also require allowing the BlockMatrixSystemSolver to determine which of the two algorithms it should use based on the type of the ColumnwiseBlockMatrix given to it.

Task breakdown

Reviewers

kmdeck commented 1 year ago

If we imagine a system with n prognostic variables, and denote the tendency due to the boundary term as t_{bc}, then t_{bc} is a vector of length n. The derivative is then an object like ∂t_{bc,j}/∂Y_i, where i and j range from 1 to n, and Y_i is (for us) the prognostic state defined on cell centers at the top (or bottom) layer.

Two questions

charleskawczynski commented 1 year ago

Can we:

tapios commented 1 year ago

This describes the current issues well and clearly. The description of the solution could use more detail (though the draft PR helps).

For us to be able to track progress, could you please add milestones with expected completion dates (roughly weekly milestones)?

It'll also be important for @sriharshakandala, @charleskawczynski, and @simonbyrne to review, especially with regard to GPU-friendliness of the proposed solutions.

Wherever this refactoring lands, it'll be important to keep the current Schur complement formulation around (e.g., for convection resolving simulations). It may be nice to allow this as an option that lives within this refactoring, rather than outside of it.

charleskawczynski commented 1 year ago

Also, please do not leave the simple array-based implementation test as an afterthought; it will be helpful for evaluating the design before it is fully implemented, and it will be too late to help reviewers judge the quality of the design if it is the very last thing added. cc @dennisYatunin

dennisYatunin commented 1 year ago

@kmdeck @juliasloan25 After discussing the matter with @simonbyrne, it looks like we will be able to set the derivatives of operator boundary conditions without adding any new operators (and without any assumptions like ∂t_{bc,j}/∂Y_i \propto \delta_{i,j}). However, we will no longer be able to attach boundary conditions to face-to-center operators like DivergenceF2C, and we will instead need to ensure that the arguments of these operators either have well-defined boundary values or are wrapped in a SetBoundaryOperator. This is because, if we allow operators to overwrite the face values of their inputs, then we also need to allow the corresponding operator matrices to overwrite the values of matrices/vectors that they get multiplied by, but there is no good way to distinguish a matrix that overwrites values during multiplication from one that does not (since every value stored in a Field must have the same type, we cannot simply attach additional type information to indicate how values should be handled during multiplication).

Here are some concrete examples:

dennisYatunin commented 1 year ago

@tapios @charleskawczynski I have made the following updates to the SDI based on your comments:

After a lengthy discussion with Simon and Sriharsha today, it has become clear that the "permuted band matrix solve" algorithm will require a fair bit of work to ensure that it is performant and GPU-compatible. However, all of the tasks that come before the implementation of this new algorithm (up to testing the interface in ClimaAtmos) should be good as-is. Per the revised time estimates, these tasks should take 3--4 weeks, which should be enough time for us to settle on a decent implementation of the new algorithm. If I start later this week, this means that the interface will be tested in ClimaAtmos by around this time in June.

dennisYatunin commented 1 year ago

@simonbyrne @sriharshakandala Here is a list of all the matrix sparsity patterns that BlockMatrixSystemSolver will need to support in the near future. For each case, I will specify the sparsity pattern of $J = \partial Y_t/\partial Y$, where $Y_t$ is the implicit tendency of $Y$, though the actual matrix required by the implicit solver will be $W = -I + \Delta t \gamma J$, where $I$ is the identity matrix and $\Delta t \gamma$ is a scalar. I will also use the following shorthands: $\mathbb{D}$ for a diagonal block, $\mathbb{B}$ for a bidiagonal block, $\mathbb{T}$ for a tridiagonal block, $\mathbb{Q}$ for a quaddiagonal block, $\mathbb{P}$ for a pentadiagonal block, and $0$ for a block that is entirely zero; a slash (as in $\mathbb{T}/\mathbb{P}$) indicates that a block takes one of the listed forms, depending on the discretization.

For ClimaLSM, we only need to support the sparsity pattern of the matrix specified in the comment above,

J = \begin{pmatrix} \dfrac{\partial Y_t.c.\theta}{\partial Y.c.\theta} \end{pmatrix} = \begin{pmatrix} \mathbb{T} \end{pmatrix}

In the future, ClimaLSM will also include the prognostic variable $Y.c.\psi$, but this is still several months away.

For the ClimaAtmos dycore, the sparsity pattern of $J$ with $T$ tracers (and no FCT, just upwinding or central differencing) is

J = \begin{pmatrix}
\dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho}{\partial Y.f.u_3\_\text{data}} \\
\dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.f.u_3\_\text{data}} \\
\dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.f.u_3\_\text{data}} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.f.u_3\_\text{data}} \\
\dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho} & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.f.u_3\_\text{data}}
\end{pmatrix} =
\begin{pmatrix}
\mathbb{T}/\mathbb{P} & 0 & 0 & \cdots & 0 & \mathbb{B} \\
\mathbb{T}/\mathbb{P} & \mathbb{T}/\mathbb{P} & 0/\mathbb{T}/\mathbb{P} & \cdots & 0/\mathbb{T}/\mathbb{P} & \mathbb{Q} \\
\mathbb{T}/\mathbb{P} & 0 & \mathbb{T}/\mathbb{P} & \cdots & 0 & \mathbb{B} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\mathbb{T}/\mathbb{P} & 0 & 0 & \cdots & \mathbb{T}/\mathbb{P} & \mathbb{B} \\
\mathbb{B} & \mathbb{B} & 0/\mathbb{B} & \cdots & 0/\mathbb{B} & \mathbb{T}
\end{pmatrix}

However, instead of using the exact value of $J$, we approximate it so that its sparsity pattern simplifies to

J \approx \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\
0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\
0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\
\mathbb{B} & \mathbb{B} & 0 & \cdots & 0 & \mathbb{T}
\end{pmatrix}

For AMIP, we only need to support $T = 1$, with $\chi = q_{tot}$. After AMIP, though, we will need to support $T \gg 1$, with $q_{liq}$, $q_{ice}$, $q_{rai}$, $q_{sno}$, and every aerosol represented by a prognostic variable. So, BlockMatrixSystemSolver needs to perform well for the dycore with large values of $T$. At the moment, the best algorithm we have for this is the "Schur complement solve".

When we add EDMF to ClimaAtmos, we end up with a significantly more complicated Jacobian. We are not entirely certain of how sparse we can afford to make our approximation of $J$, but our best guess at the moment (with only one draft in the EDMF model, which corresponds to only one sub-grid-scale copy of the grid-scale state) is

\begin{gather}
J = \begin{pmatrix}
\dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}} & \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{SGS State}} \\
\dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{GS State}} & \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}}
\end{pmatrix},\ \text{where}\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{GS State}} \approx 0, \\[1 em]
\dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}} \approx \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\
0 & \mathbb{T} & 0 & \cdots & 0 & \mathbb{B} \\
0 & 0 & \mathbb{T} & \cdots & 0 & \mathbb{B} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \mathbb{T} & \mathbb{B} \\
\mathbb{B} & \mathbb{B} & 0 & \cdots & 0 & \mathbb{T}
\end{pmatrix},\ \text{and}\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}} \approx \begin{pmatrix}
\mathbb{D} & 0 & 0 & \cdots & 0 & \mathbb{B} \\
\mathbb{D} & \mathbb{D} & 0 & \cdots & 0 & \mathbb{B} \\
\mathbb{D} & 0 & \mathbb{D} & \cdots & 0 & \mathbb{B} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\mathbb{D} & 0 & 0 & \cdots & \mathbb{D} & \mathbb{B} \\
\mathbb{B} & 0 & 0 & \cdots & 0 & \mathbb{T}
\end{pmatrix}
\end{gather}

Since we plan to approximate the bottom-left matrix block as 0, our strategy for solving the linear problem $W\ \Delta Y = b$ will be to first solve a linear problem that corresponds to the bottom-right block, and to then solve another linear problem that corresponds to the top-left block. That is to say,

\begin{gather}
W\ \Delta Y = b \implies \\[1 em]
\begin{pmatrix}
-I + \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}} & \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{SGS State}} \\
0 & -I + \Delta t\gamma\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}}
\end{pmatrix} \begin{pmatrix} \Delta\text{ GS State} \\ \Delta\text{ SGS State} \end{pmatrix} = \begin{pmatrix} \text{GS RHS} \\ \text{SGS RHS} \end{pmatrix} \implies \\[1 em]
\left(-I + \Delta t\gamma\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}}\right)\ \Delta\text{ SGS State} = \text{SGS RHS}\ \text{and} \\
\left(-I + \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}}\right)\ \Delta\text{ GS State} = \text{GS RHS} - \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{SGS State}}\ \Delta\text{ SGS State}
\end{gather}

This means that our implicit solver will first compute changes at the sub-grid-scale ($\Delta\text{ SGS State}$), and then it will compute changes at the grid-scale ($\Delta\text{ GS State}$). We do not yet know how we will approximate the top-right matrix block of the Jacobian, so we do not know what its sparsity structure will be. Fortunately, this doesn't actually matter, since that block will only be used for computing a matrix-vector product, rather than for solving a linear problem.

In the near future, we only plan to use EDMF with $T = 1$ or $T \approx 1$, since there is still a fair bit of work that needs to be done before we are ready to test it with large numbers of tracers. So, we can afford to have BlockMatrixSystemSolver scale poorly with $T$ when we use EDMF, which justifies using the "permuted band matrix solve" algorithm proposed in this SDI.

tapios commented 1 year ago

I appreciate your thoroughness, @dennisYatunin!

One important simplification, though: In the dycore, we can treat all tracers fully explicitly, so T=0 is fine, or, if you want to include moisture, T=1 (but even that is not strictly necessary). We only need to treat the fast waves (acoustic and gravity) implicitly, and tracers play no role in them.

dennisYatunin commented 1 year ago

@tapios Thanks! We actually stopped making that assumption in ClimaAtmos sometime last year. I think the intent was to ensure that all forms of moisture get treated roughly the same way as energy in the implicit solve, but neither Daniel nor Zhaoyi nor I remember how significant the resulting timestep increase was. Should we revert to making that approximation, or add a flag to toggle between the two options? In either case, the "Schur complement solve" scales very well with $T$, so whether or not the vertical advection of tracers is treated implicitly should not have a significant impact on performance.

tapios commented 1 year ago

@dennisYatunin It's ok to treat the moisture tracers (q_t, q_l, q_i) implicitly and in the same way as energy. But no need to do the same for other tracers.

sriharshakandala commented 1 year ago

@dennisYatunin : Is there a reason we are deviating from the more traditional band storage format (https://www.ibm.com/docs/en/essl/6.2?topic=representation-blas-general-band-storage-mode) for storing band matrices, which essentially stores the bands? We can use an offset array and eliminate storing the "unused data" for off-diagonal bands!

dennisYatunin commented 1 year ago

@sriharshakandala We are indeed storing the matrices in a traditional band storage format, except we are using "row-major" storage instead of "column-major" storage (the link you sent uses "column-major" storage). We are storing the matrices row-by-row in order to encode them as Fields, and we only store the nonzero entries in each row. The top and bottom rows are exceptions to this; since they have fewer nonzero entries than the interior rows, we need to pad them with several "dummy entries" that do not lie in the matrix (these are equivalent to the *s in the link you sent). Also, we want our matrices to be Fields so that we can include them in Field broadcasts, which makes it much easier to specify Jacobian blocks.
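
For example, a 4 × 4 tridiagonal matrix would be stored as the following rows, with $\star$ marking the dummy padding entries in the top and bottom rows:

\begin{pmatrix}
b_1 & c_1 & 0 & 0 \\
a_2 & b_2 & c_2 & 0 \\
0 & a_3 & b_3 & c_3 \\
0 & 0 & a_4 & b_4
\end{pmatrix}
\quad\longrightarrow\quad
(\star, b_1, c_1),\ (a_2, b_2, c_2),\ (a_3, b_3, c_3),\ (a_4, b_4, \star)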

sriharshakandala commented 1 year ago

Thanks, @dennisYatunin. For the quaddiagonal and pentadiagonal matrices, I am assuming we are either losing them in the Jacobian matrix approximation or that their bands are located contiguously!

dennisYatunin commented 1 year ago

@tapios Time estimates have been updated!

tapios commented 1 year ago

Thanks, @dennisYatunin. Looking good!