CliMA / ClimaCore.jl

CliMA model dycore
https://clima.github.io/ClimaCore.jl/dev
Apache License 2.0

Implicit Solver Interface Updates #1230

Closed: dennisYatunin closed this issue 8 months ago

dennisYatunin commented 1 year ago

Purpose

The interface we use to specify matrices and solve linear systems of equations for the implicit solver needs to be refactored and extended. This interface, which is currently spread across several files in ClimaCore and ClimaAtmos (ClimaCore.jl/src/Operators/stencilcoefs.jl, ClimaCore.jl/src/Operators/pointwisestencil.jl, ClimaCore.jl/src/Operators/operator2stencil.jl, ClimaAtmos.jl/src/tendencies/implicit/wfact.jl, and ClimaAtmos.jl/src/tendencies/implicit/schur_complement_W.jl), is unnecessarily convoluted and involves a large number of hardcoded assumptions, which makes it extremely challenging for new users to experiment with the implicit solver and to add new implicit tendencies. In particular, the next stage of EDMF development will involve adding many implicit tendencies to the atmosphere model, and our approximation of the total implicit tendency's Jacobian matrix will end up with a highly non-trivial sparsity pattern. For more than a single developer to be able to understand and implement the implicit solver for the dycore+EDMF, we will first need to make the changes outlined in this SDI. In addition, the land modeling team has been experimenting with their own implicit solver, and these changes will speed up their development and add the new functionality they require.

There is currently a draft PR that contains a detailed sketch of all the proposed changes: #1190

Cost/benefits/risks

The cost/risk is development time. The benefit will be a significantly reduced complexity of implicit solver implementations, both in ClimaAtmos and in ClimaLSM. There will be a simple, well-documented interface to all of the numerical algorithms required by implicit solvers, which will allow user-facing code to be relatively short and easily extensible.

Producers

@dennisYatunin

Components

Inputs

BandMatrixRow and MultiplyColumnwiseBandMatrixField

The type we currently use to represent an element of a band matrix field is called StencilCoefs, and it is extremely confusing and poorly designed. Upon refactoring, this will become BandMatrixRow, which will have the following improvements:

These improvements will also be reflected in fields of BandMatrixRows, which will be aliased as ColumnwiseBandMatrixFields for dispatch. (The alias name is meant to indicate that every column has its own set of BandMatrixRows, which, when taken together, can be interpreted as a band matrix.) So, for example, users will be able to write (@. LinearAlgebra.I / 2 - tridiagonal_matrix_field + 2 * pentadiagonal_matrix_field) == (@. PentadiagonalMatrixRow(field1, field2, field3, field4, field5)).
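
For example, individual rows will support the same arithmetic, promoting to the widest bandwidth. Here is a minimal sketch, assuming the proposed BandMatrixRow constructors, with made-up entry values:

using LinearAlgebra: I

# Entries are ordered from the lowest diagonal to the highest.
tri = TridiagonalMatrixRow(-1.0, 2.0, -1.0)
penta = PentadiagonalMatrixRow(0.0, 1.0, -2.0, 1.0, 0.0)

# I acts as a diagonal row; narrower rows promote to pentadiagonal.
row = I / 2 - tri + 2 * penta
row == PentadiagonalMatrixRow(0.0, 3.0, -5.5, 3.0, 0.0)  # true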

The operators we currently use for matrix-matrix and matrix-vector multiplication are Operators.ComposeStencils() and Operators.ApplyStencil(), respectively. Again, these are confusing and implemented in a rather roundabout way. Upon refactoring, these will both become ⋅, which is an alias for Operators.MultiplyColumnwiseBandMatrixField(). This will allow users to write something like @. matrix_field1 ⋅ matrix_field2 ⋅ field, instead of needing to write @. apply(compose(matrix_field1, matrix_field2), field). In addition, the amount of code used to implement matrix multiplication can be reduced by roughly a factor of 3 (as shown in the sketch), and this simplified code will be easier to update for GPUs in the near future.
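
Side by side (a sketch with hypothetical matrix fields matrix_field1 and matrix_field2 and a hypothetical vector field field):

# Current interface: compose the two matrix fields, then apply to the vector.
apply = Operators.ApplyStencil()
compose = Operators.ComposeStencils()
result = @. apply(compose(matrix_field1, matrix_field2), field)

# Proposed interface: one associative multiplication operator.
⋅ = Operators.MultiplyColumnwiseBandMatrixField()
result = @. matrix_field1 ⋅ matrix_field2 ⋅ field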

FiniteDifferenceOperatorTermsMatrix

When a finite difference operator is applied to a field (@. op(field)), the result is equivalent to multiplying some matrix by that field (@. matrix_field ⋅ field). The operator we currently use to generate this matrix is Operators.Operator2Stencil(op); in order to clarify what this operator is doing, it will be renamed to Operators.FiniteDifferenceOperatorTermsMatrix(op). If op_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(op) and ones_field = ones(axes(field)), users will be able to confirm that (@. op(field)) == (@. op_matrix(ones_field) ⋅ field). As a quirk of our implementation, it is also the case that (@. op_matrix(ones_field) ⋅ field) == (@. op_matrix(field) ⋅ ones_field), which allows us to somewhat simplify expressions involving products with operator matrices.
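
As a quick sketch of these identities, using the proposed names and a hypothetical scalar field face_field defined on cell faces:

interp = Operators.InterpolateF2C()
interp_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(interp)
ones_field = ones(axes(face_field))

# The operator matrix applied to a field of ones reproduces the operator:
@assert (@. interp(face_field)) == (@. interp_matrix(ones_field) ⋅ face_field)
# The implementation quirk: the operator's argument and the multiplicand commute.
@assert (@. interp_matrix(ones_field) ⋅ face_field) ==
        (@. interp_matrix(face_field) ⋅ ones_field)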

Aside from the name change, there are two new features that we need to add to FiniteDifferenceOperatorTermsMatrix. First, for EDMF development, we need to add support for multi-argument operators, so that FiniteDifferenceOperatorTermsMatrix(op) will always generate the matrix that corresponds to the last argument of op. For example, given a two-argument operator op (such as WeightedInterpolateF2C or Upwind3rdOrderBiasedProductC2F), users will be able to define op_matrix and confirm that (@. op(field1, field2)) == (@. op_matrix(field1, ones_field2) ⋅ field2) == (@. op_matrix(field1, field2) ⋅ ones_field2).
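
For example, here is a sketch with hypothetical face-valued fields weights and face_field, where WeightedInterpolateF2C is one of the two-argument operators mentioned above:

winterp = Operators.WeightedInterpolateF2C()
winterp_matrix = Operators.FiniteDifferenceOperatorTermsMatrix(winterp)
ones_field = ones(axes(face_field))

# The generated matrix always corresponds to the operator's last argument:
@assert (@. winterp(weights, face_field)) ==
        (@. winterp_matrix(weights, ones_field) ⋅ face_field)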

Second, for land model development, we need to add support for specifying the corner elements of matrices by adding special boundary conditions for FiniteDifferenceOperatorTermsMatrix. The simple expression presented earlier, (@. op(field)) == (@. op_matrix(ones_field) ⋅ field), is only true when op has trivial boundary conditions; i.e., when op is a center-to-face operator with boundary conditions that cause it to return 0 on the top and bottom faces, or when op is a face-to-center operator without any boundary conditions (which means that it uses the values at the top and bottom faces as-is). In general, operators are affine transformations at the boundaries, not linear transformations. This means that, for every op and field, there is some boundary_field that is zero everywhere except at the boundaries, and (@. op(field)) == (@. op_matrix(ones_field) ⋅ field + boundary_field).

However, we only use matrices to represent derivatives of operators with respect to their inputs, and, up until now, it has always been the case that boundary_field is a constant that does not depend on field. More specifically, if boundary_matrix_field represents ∂(boundary_field)/∂(field), then ∂(@. op(field))/∂(field) can be expressed as @. op_matrix(ones_field) + boundary_matrix_field. So, if boundary_field is a constant, then boundary_matrix_field is the zero matrix and can be ignored in our computations. This amounts to assuming that the corner elements of our matrices are always zero.

Due to new requirements from the land model, we will no longer be able to make this assumption, so we will need to add support for the Operators.SetValue boundary condition for FiniteDifferenceOperatorTermsMatrix, which will allow users to specify nonzero elements from boundary_matrix_field that should be added to @. op_matrix(ones_field).
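
In equation form, with $M$ denoting the matrix field @. op_matrix(ones_field) and $b$ denoting boundary_field, the statements above read

\text{op}(\text{field}) = M \cdot \text{field} + b
\qquad \text{and} \qquad
\dfrac{\partial\ \text{op}(\text{field})}{\partial\ \text{field}} = M + \dfrac{\partial b}{\partial\ \text{field}},

so the corner elements of the Jacobian vanish exactly when $b$ does not depend on field.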

Additionally, it would be good to refactor how finite difference operators are implemented so that the code for every operator does not need to be duplicated in order to implement the FiniteDifferenceOperatorTermsMatrix for that operator. Currently, every operator implements a method for stencil_interior, stencil_left_boundary, and stencil_right_boundary, and each of these methods returns some value (the value of the operator's result at a particular point in space). The FiniteDifferenceOperatorTermsMatrix for every operator also implements the same three methods, but each of its methods returns a BandMatrixRow (or, rather, a "StencilCoefs") whose entries add up to the value returned by the operator's corresponding method. There are currently almost 700 lines of code required to implement these duplicated methods, and more will need to be added in order to support multi-argument operators. It should be fairly straightforward to refactor things so that:

However, this last change would only improve things "under the hood", without any immediate benefit to users. In order to avoid unnecessarily delaying EDMF and land model development, this change will be the last step of this SDI.

ColumnwiseBlockMatrix

The type we currently use to represent block matrices is the SchurComplementW object, which is hardcoded to only work for the dycore in ClimaAtmos. This can be generalized to a ColumnwiseBlockMatrix, which will be a simple dictionary that maps pairs of field names (one name for the row and another name for the column) to their corresponding matrix blocks, each of which can be a ColumnwiseBandMatrixField or a LinearAlgebra.UniformScaling. The ColumnwiseBlockMatrix will be used as follows:

matrix = ColumnwiseBlockMatrix(
    # Diagonal (ρ, ρ) block: a uniform scaling, so no field storage is needed.
    @block_name(c.ρ, c.ρ) => -LinearAlgebra.I,
    # Off-diagonal blocks coupling ρ and the vertical velocity component,
    # with rows indexed by cell centers and cell faces, respectively.
    @block_name(c.ρ, f.w.components.data.:1) =>
        similar(Y.c, BidiagonalMatrixRow{FT}),
    @block_name(f.w.components.data.:1, c.ρ) =>
        similar(Y.f, BidiagonalMatrixRow{FT}),
    # Diagonal (w, w) block: tridiagonal, with rows indexed by cell faces.
    @block_name(f.w.components.data.:1, f.w.components.data.:1) =>
        similar(Y.f, TridiagonalMatrixRow{FT}),
)
# Blocks can be updated in place with broadcasts over matrix fields:
matrix[@block_name(f.w.components.data.:1, f.w.components.data.:1)] .=
    -LinearAlgebra.I .+ Δtγ .* <Jacobian matrix block approximation>

The only challenge in implementing the ColumnwiseBlockMatrix is ensuring type stability, particularly when it gets used in a BlockMatrixSystemSolver (see the next section); without type stability, we will have long compilation times and unnecessary allocations. Fortunately, this has already been worked out and tested in the sketch. In particular, the @block_name macro will return a type-stable generalization of what we currently call a property_chain for both the row and column names, and the corresponding row and column fields will be accessed by using a type-stable generalization of Fields.single_field.
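
As a rough illustration of the idea (hypothetical code; the actual @block_name implementation in the sketch may differ), a property chain can be lifted into the type domain so that block lookups and field accesses remain type-stable:

# Hypothetical sketch: the property names live in a type parameter, so the
# nested getproperty calls can be resolved at compile time.
struct PropertyChain{names} end
PropertyChain(names::Symbol...) = PropertyChain{names}()

get_nested(obj, ::PropertyChain{names}) where {names} =
    foldl(getproperty, names; init = obj)

# A block name is then a pair of row and column property chains, e.g., the
# generalization of @block_name(c.ρ, f.w.components.data.:1):
block_name =
    (PropertyChain(:c, :ρ), PropertyChain(:f, :w, :components, :data, Symbol(1)))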

BlockMatrixSystemSolver

We are currently solving the linear system of equations specified by the SchurComplementW object by reducing it to a smaller tridiagonal system of equations, solving the reduced problem, and using the reduced problem's solution to compute the original problem's solution. However, this strategy will not work when we add the new implicit EDMF tendencies because the new sparsity pattern of the matrix will not allow us to specify the reduced problem without performing a computationally expensive dense matrix inversion. So, we will need to implement a new algorithm for solving sparse block matrix systems. Per @simonbyrne's advice, this algorithm will work as follows:

To show how this permutation will work, here is a simplified example: permuting the Jacobian of a FieldVector $Y$ with total implicit tendency $Y_t$, defined on a single column with two cells, with a field $c$ defined on cell centers and a field $f$ defined on cell faces:

\text{Original } \dfrac{\partial Y_t}{\partial Y} =
\begin{bmatrix}
\dfrac{\partial Y_t.c}{\partial Y.c} & \dfrac{\partial Y_t.c}{\partial Y.f} \\
\dfrac{\partial Y_t.f}{\partial Y.c} & \dfrac{\partial Y_t.f}{\partial Y.f}
\end{bmatrix} =
\begin{bmatrix}
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[1]}{\partial Y.c[2]} \\
\dfrac{\partial Y_t.c[2]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[2]}{\partial Y.c[2]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[2.5]} \\
\dfrac{\partial Y_t.c[2]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[2.5]}
\end{bmatrix} \\[2 ex]
\begin{bmatrix}
\dfrac{\partial Y_t.f[0.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.c[2]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.c[2]} \\
\dfrac{\partial Y_t.f[2.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.c[2]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.f[0.5]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[2.5]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[2.5]} \\
\dfrac{\partial Y_t.f[2.5]}{\partial Y.f[0.5]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[1.5]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[2.5]}
\end{bmatrix}
\end{bmatrix}
\text{Permuted } \dfrac{\partial Y_t}{\partial Y} =
\begin{bmatrix}
\dfrac{\partial(\text{cell 1})_t}{\partial(\text{cell 1})} & \dfrac{\partial(\text{cell 1})_t}{\partial(\text{cell 2})} \\
\dfrac{\partial(\text{cell 2})_t}{\partial(\text{cell 1})} & \dfrac{\partial(\text{cell 2})_t}{\partial(\text{cell 2})}
\end{bmatrix} =
\begin{cases}
\begin{bmatrix}
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[0.5]} \\
\dfrac{\partial Y_t.f[0.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[0.5]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[1.5]} \\
\dfrac{\partial Y_t.f[0.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[0.5]}{\partial Y.f[1.5]}
\end{bmatrix} \\[2 ex]
\begin{bmatrix}
\dfrac{\partial Y_t.c[2]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[0.5]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[0.5]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.c[2]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[1.5]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[1.5]}
\end{bmatrix}
\end{bmatrix}  &
\text{ if }  \dfrac{\partial Y_t.\square}{\partial Y.f[2.5]} = \dfrac{\partial Y_t.f[2.5]}{\partial Y.\square} = C\delta_{\square, f[2.5]} \\[5 ex]
\begin{bmatrix}
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[1.5]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[1.5]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.c[1]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[1]}{\partial Y.f[2.5]} \\
\dfrac{\partial Y_t.f[1.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[1.5]}{\partial Y.f[2.5]}
\end{bmatrix} \\[2 ex]
\begin{bmatrix}
\dfrac{\partial Y_t.c[2]}{\partial Y.c[1]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[1.5]} \\
\dfrac{\partial Y_t.f[2.5]}{\partial Y.c[1]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[1.5]}
\end{bmatrix} &
\begin{bmatrix}
\dfrac{\partial Y_t.c[2]}{\partial Y.c[2]} & \dfrac{\partial Y_t.c[2]}{\partial Y.f[2.5]} \\
\dfrac{\partial Y_t.f[2.5]}{\partial Y.c[2]} & \dfrac{\partial Y_t.f[2.5]}{\partial Y.f[2.5]}
\end{bmatrix}
\end{bmatrix} &
\text{ if }  \dfrac{\partial Y_t.\square}{\partial Y.f[0.5]} = \dfrac{\partial Y_t.f[0.5]}{\partial Y.\square} = C\delta_{\square, f[0.5]}
\end{cases}

As this example illustrates, the permutation requires us to drop all of the matrix elements that correspond to the top or bottom cell face from the linear solve, which we can do as long as the only nonzero matrix element for that cell face is the "identity element". In ClimaAtmos, we know that this will always be the case for the top cell face; in the example above, the "identity element" for this cell face is $\partial Y_t.f[2.5]/\partial Y.f[2.5]$. In our code, the value of $\partial Y_t.f[2.5]/\partial Y.f[2.5]$ (or, rather, the value of -1 + Δtγ * ∂(Yₜ.f.w.components.data.:1[2.5])/∂(Y.f.w.components.data.:1[2.5]), and the corresponding EDMF updraft velocity terms) will typically be $-1$, so, using the somewhat hand-wavy notation from above, we will typically have that $\partial Y_t.\square/\partial Y.f[2.5] = \partial Y_t.f[2.5]/\partial Y.\square = -\delta_{\square, f[2.5]}$. In ClimaLSM, there are currently no prognostic variables defined on cell faces, but, if there were, this would be the case for the bottom cell face, rather than the top cell face. In general, as long as we deal with the nonzero identity element separately, we can drop all of the matrix elements related to one cell face from the linear solve.

There are also two other types of variables whose corresponding matrix elements we will be able to drop from the linear solve (because the only nonzero elements will be the identity elements): variables that do not have a nonzero implicit tendency, like Y.c.uₕ in ClimaAtmos, and variables that do not lie on cell centers or on cell faces, like those related to the river model in ClimaLSM.

After the matrix is permuted, the new linear system will be solved using a band matrix solver. Both the dycore and the land model will require a tridiagonal solver, and EDMF may also require a pentadiagonal solver. If $N$ denotes the number of cells and $V$ denotes the number of variables, then the band matrix solver will be applied to a matrix of $N \times N$ blocks, where each block is itself a $V \times V$ matrix. To simplify our initial implementation, we can treat each block as a dense matrix, which we will represent using a StaticArrays.SMatrix. Although this means that we will need to perform dense matrix inversions, these shouldn't be too expensive as long as $V$ is relatively small. If we were to instead modify the current "Schur complement solve" algorithm for EDMF, we would end up needing to perform a dense matrix inversion on a large block matrix, where each block is one of the original $N \times N$ blocks that represent pairs of variables. Since it will generally be the case that $N > V$, this means that the new "permuted band matrix solve" algorithm will be significantly more performant for EDMF than the current algorithm.
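
To make the proposed approach concrete, here is a minimal, self-contained sketch (not ClimaCore code; all names are illustrative) of a block-tridiagonal Thomas solve in which each block is a dense $V \times V$ StaticArrays.SMatrix, so that every "dense matrix inversion" is a small, statically sized solve:

using StaticArrays

# A: subdiagonal blocks (length N - 1), B: diagonal blocks (length N),
# C: superdiagonal blocks (length N - 1), b: right-hand side (length N).
function block_thomas_solve(A, B, C, b)
    N = length(B)
    C′ = similar(C)
    b′ = similar(b)
    C′[1] = B[1] \ C[1]
    b′[1] = B[1] \ b[1]
    for i in 2:N
        # Eliminate the subdiagonal block; D is the updated pivot block.
        D = B[i] - A[i - 1] * C′[i - 1]
        i < N && (C′[i] = D \ C[i])
        b′[i] = D \ (b[i] - A[i - 1] * b′[i - 1])
    end
    x = similar(b)
    x[N] = b′[N]
    for i in (N - 1):-1:1  # back substitution
        x[i] = b′[i] - C′[i] * x[i + 1]
    end
    return x
end

# Usage with N = 4 cells and V = 2 variables per cell:
B = [SMatrix{2, 2}(4.0, 1.0, 1.0, 4.0) for _ in 1:4]
A = [SMatrix{2, 2}(-1.0, 0.0, 0.0, -1.0) for _ in 1:3]
C = [SMatrix{2, 2}(-1.0, 0.0, 0.0, -1.0) for _ in 1:3]
b = [SVector(1.0, 2.0) for _ in 1:4]
x = block_thomas_solve(A, B, C, b)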

Unfortunately, it will not be possible to eliminate dense matrix inversions from the new algorithm altogether by specializing on the sparsity structure of the $V \times V$ blocks. Even though each $V \times V$ block in ClimaAtmos will be an arrowhead matrix (both for the dycore and for EDMF), the band matrix solver will need to evaluate linear combinations of products of the blocks and their inverses, which will not have any particularly nice sparsity structure. Specifically, the inverse of an arrowhead matrix is a diagonal-plus-rank-one (DPR1) matrix, and all of the following matrix-matrix products are neither arrowhead nor DPR1 matrices: arrowhead times arrowhead, DPR1 times DPR1, and arrowhead times DPR1 (unless they are inverses of each other). This means that the new algorithm is likely to be slower than the current one for the dycore, since the current one does not involve any dense matrix inversions. So, in addition to implementing the new algorithm for BlockMatrixSystemSolver, we will also need to port over the algorithm currently implemented in ClimaAtmos.jl/src/tendencies/implicit/schur_complement_W.jl. This will require updating the current algorithm to use the new interface for BandMatrixRow and MultiplyColumnwiseBandMatrixField, and it will also require allowing the BlockMatrixSystemSolver to determine which of the two algorithms it should use based on the type of the ColumnwiseBlockMatrix given to it.

Task breakdown

Reviewers

kmdeck commented 1 year ago

If we imagine a system with n prognostic variables, and denote the tendency due to the boundary term as t_{bc}, then t_{bc} is a vector of length n. The derivative is then an object like ∂t_{bc,j}/∂Y_i, where i and j range from 1 to n, and Y_i is (for us) the prognostic state defined on cell centers at the top (or bottom) layer.

Two questions

charleskawczynski commented 1 year ago

Can we:

tapios commented 1 year ago

This describes the current issues well and clearly. The description of the solution could use more detail (though the draft PR helps).

For us to be able to track progress, could you please add milestones with expected completion dates (roughly weekly milestones)?

It'll also be important for @sriharshakandala, @charleskawczynski, and @simonbyrne to review, especially with regard to GPU-friendliness of the proposed solutions.

Wherever this refactoring lands, it'll be important to keep the current Schur complement formulation around (e.g., for convection resolving simulations). It may be nice to allow this as an option that lives within this refactoring, rather than outside of it.

charleskawczynski commented 1 year ago

Also, please do not leave the simple array-based implementation test as an afterthought; it will be helpful for evaluating the design before it is fully implemented, and it will be too late to help reviewers judge the quality of the design if it is the very last thing added. cc @dennisYatunin

dennisYatunin commented 1 year ago

@kmdeck @juliasloan25 After discussing the matter with @simonbyrne, it looks like we will be able to set the derivatives of operator boundary conditions without adding any new operators (and without any assumptions like ∂t_{bc,j}/∂Y_i \propto \delta_{i,j}). However, we will no longer be able to attach boundary conditions to face-to-center operators like DivergenceF2C, and we will instead need to ensure that the arguments of these operators either have well-defined boundary values or are wrapped in a SetBoundaryOperator. This is because, if we allow operators to overwrite the face values of their inputs, then we also need to allow the corresponding operator matrices to overwrite the values of matrices/vectors that they get multiplied by, but there is no good way to distinguish a matrix that overwrites values during multiplication from one that does not (since every value stored in a Field must have the same type, we cannot simply attach additional type information to indicate how values should be handled during multiplication).

Here are some concrete examples:

dennisYatunin commented 1 year ago

@tapios @charleskawczynski I have made the following updates to the SDI based on your comments:

After a lengthy discussion with Simon and Sriharsha today, it has become clear that the "permuted band matrix solve" algorithm will require a fair bit of work to ensure that it is performant and GPU-compatible. However, all of the tasks that come before the implementation of this new algorithm (up to testing the interface in ClimaAtmos) should be good as-is. Per the revised time estimates, these tasks should take 3--4 weeks, which should be enough time for us to settle on a decent implementation of the new algorithm. If I start later this week, this means that the interface will be tested in ClimaAtmos by around this time in June.

dennisYatunin commented 1 year ago

@simonbyrne @sriharshakandala Here is a list of all the matrix sparsity patterns that BlockMatrixSystemSolver will need to support in the near future. For each case, I will specify the sparsity pattern of $J = \partial Y_t/\partial Y$, where $Y_t$ is the implicit tendency of $Y$, though the actual matrix required by the implicit solver will be $W = -I + \Delta t \gamma J$, where $I$ is the identity matrix and $\Delta t \gamma$ is a scalar. I will also use the following shorthands: $\mathbb{D}$ for a diagonal block, $\mathbb{B}$ for a bidiagonal block, $\mathbb{T}$ for a tridiagonal block, $\mathbb{Q}$ for a quaddiagonal block, $\mathbb{P}$ for a pentadiagonal block, and $0$ for a block that is entirely zero; a slash (as in $\mathbb{T}/\mathbb{P}$) indicates that a block takes one of the listed forms, depending on the discretization.

For ClimaLSM, we only need to support the sparsity pattern of the matrix specified in the comment above,

J = \begin{pmatrix} \dfrac{\partial Y_t.c.\theta}{\partial Y.c.\theta} \end{pmatrix} = \begin{pmatrix} \mathbb{T} \end{pmatrix}

In the future, ClimaLSM will also include the prognostic variable $Y.c.\psi$, but this is still several months away.

For the ClimaAtmos dycore, the sparsity pattern of $J$ with $T$ tracers (and no FCT, just upwinding or central differencing) is

J = \begin{pmatrix}
\dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho}{\partial Y.f.u_3\_\text{data}} \\
\dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho e_{tot}}{\partial Y.f.u_3\_\text{data}} \\
\dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho\chi_1}{\partial Y.f.u_3\_\text{data}} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho} & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.c.\rho\chi_T}{\partial Y.f.u_3\_\text{data}} \\
\dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho} & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho e_{tot}} & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho\chi_1} & \cdots & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.c.\rho\chi_T} & \dfrac{\partial Y_t.f.u_3\_\text{data}}{\partial Y.f.u_3\_\text{data}}
\end{pmatrix} =
\begin{pmatrix}
\mathbb{T}/\mathbb{P} & 0 & 0 & \cdots & 0 & \mathbb{B} \\
\mathbb{T}/\mathbb{P} & \mathbb{T}/\mathbb{P} & 0/\mathbb{T}/\mathbb{P} & \cdots & 0/\mathbb{T}/\mathbb{P} & \mathbb{Q} \\
\mathbb{T}/\mathbb{P} & 0 & \mathbb{T}/\mathbb{P} & \cdots & 0 & \mathbb{B} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\mathbb{T}/\mathbb{P} & 0 & 0 & \cdots & \mathbb{T}/\mathbb{P} & \mathbb{B} \\
\mathbb{B} & \mathbb{B} & 0/\mathbb{B} & \cdots & 0/\mathbb{B} & \mathbb{T}
\end{pmatrix}

However, instead of using the exact value of $J$, we approximate it so that its sparsity pattern simplifies to

J \approx \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\
0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\
0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\
\mathbb{B} & \mathbb{B} & 0 & \cdots & 0 & \mathbb{T}
\end{pmatrix}

For AMIP, we only need to support $T = 1$, with $\chi = q_{tot}$. After AMIP, though, we will need to support $T \gg 1$, with $q_{liq}$, $q_{ice}$, $q_{rai}$, $q_{sno}$, and every aerosol represented by a prognostic variable. So, BlockMatrixSystemSolver needs to perform well for the dycore with large values of $T$. At the moment, the best algorithm we have for this is the "Schur complement solve".

When we add EDMF to ClimaAtmos, we end up with a significantly more complicated Jacobian. We are not entirely certain of how sparse we can afford to make our approximation of $J$, but our best guess at the moment (with only one draft in the EDMF model, which corresponds to only one sub-grid-scale copy of the grid-scale state) is

\begin{gather}
J = \begin{pmatrix}
\dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}} & \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{SGS State}} \\
\dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{GS State}} & \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}}
\end{pmatrix},\ \text{where}\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{GS State}} \approx 0, \\[1 em]
\dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}} \approx \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & \mathbb{B} \\
0 & \mathbb{T} & 0 & \cdots & 0 & \mathbb{B} \\
0 & 0 & \mathbb{T} & \cdots & 0 & \mathbb{B} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \mathbb{T} & \mathbb{B} \\
\mathbb{B} & \mathbb{B} & 0 & \cdots & 0 & \mathbb{T}
\end{pmatrix},\ \text{and}\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}} \approx \begin{pmatrix}
\mathbb{D} & 0 & 0 & \cdots & 0 & \mathbb{B} \\
\mathbb{D} & \mathbb{D} & 0 & \cdots & 0 & \mathbb{B} \\
\mathbb{D} & 0 & \mathbb{D} & \cdots & 0 & \mathbb{B} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\mathbb{D} & 0 & 0 & \cdots & \mathbb{D} & \mathbb{B} \\
\mathbb{B} & 0 & 0 & \cdots & 0 & \mathbb{T}
\end{pmatrix}
\end{gather}

Since we plan to approximate the bottom-left matrix block as 0, our strategy for solving the linear problem $W\ \Delta Y = b$ will be to first solve a linear problem that corresponds to the bottom-right block, and to then solve another linear problem that corresponds to the top-left block. That is to say,

\begin{gather}
W\ \Delta Y = b \implies \\[1 em]
\begin{pmatrix}
-I + \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}} & \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{SGS State}} \\
0 & -I + \Delta t\gamma\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}}
\end{pmatrix} \begin{pmatrix} \Delta\text{ GS State} \\ \Delta\text{ SGS State} \end{pmatrix} = \begin{pmatrix} \text{GS RHS} \\ \text{SGS RHS} \end{pmatrix} \implies \\[1 em]
\left(-I + \Delta t\gamma\ \dfrac{\partial\ \text{SGS Tendency}}{\partial\ \text{SGS State}}\right)\ \Delta\text{ SGS State} = \text{SGS RHS}\ \text{and} \\
\left(-I + \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{GS State}}\right)\ \Delta\text{ GS State} = \text{GS RHS} - \Delta t\gamma\ \dfrac{\partial\ \text{GS Tendency}}{\partial\ \text{SGS State}}\ \Delta\text{ SGS State}
\end{gather}

This means that our implicit solver will first compute changes at the sub-grid-scale ($\Delta\text{ SGS State}$), and then it will compute changes at the grid-scale ($\Delta\text{ GS State}$). We do not yet know how we will approximate the top-right matrix block of the Jacobian, so we do not know what its sparsity structure will be. Fortunately, this doesn't actually matter, since that block will only be used for computing a matrix-vector product, rather than for solving a linear problem.

In the near future, we only plan to use EDMF with $T = 1$ or $T \approx 1$, since there is still a fair bit of work that needs to be done before we are ready to test it with large numbers of tracers. So, we can afford to have BlockMatrixSystemSolver scale poorly with $T$ when we use EDMF, which justifies using the "permuted band matrix solve" algorithm proposed in this SDI.

tapios commented 1 year ago

I appreciate your thoroughness, @dennisYatunin!

One important simplification, though: In the dycore, we can treat all tracers fully explicitly, so T=0 is fine, or, if you want to include moisture, T=1 (but even that is not strictly necessary). We only need to treat the fast waves (acoustic and gravity) implicitly, and tracers play no role in them.

dennisYatunin commented 1 year ago

@tapios Thanks! We actually stopped making that assumption in ClimaAtmos sometime last year. I think the intent was to ensure that all forms of moisture get treated roughly the same way as energy in the implicit solve, but neither Daniel nor Zhaoyi nor I remember how significant the resulting timestep increase was. Should we revert to making that approximation, or add a flag to toggle between the two options? In either case, the "Schur complement solve" scales very well with $T$, so whether or not the vertical advection of tracers is treated implicitly should not have a significant impact on performance.

tapios commented 1 year ago

@dennisYatunin It's ok to treat the moisture tracers (q_t, q_l, q_i) implicitly and in the same way as energy. But no need to do the same for other tracers.

sriharshakandala commented 1 year ago

@dennisYatunin : Is there a reason we are deviating from the more traditional band storage format (https://www.ibm.com/docs/en/essl/6.2?topic=representation-blas-general-band-storage-mode) for storing band matrices, which essentially stores the bands? We can use an offset array and eliminate storing the "unused data" for off-diagonal bands!

dennisYatunin commented 1 year ago

@sriharshakandala We are indeed storing the matrices in a traditional band storage format, except we are using "row-major" storage instead of "column-major" storage (the link you sent uses "column-major" storage). We are storing the matrices row-by-row in order to encode them as Fields, and we only store the nonzero entries in each row. The top and bottom rows are exceptions to this; since they have fewer nonzero entries than the interior rows, we need to pad them with several "dummy entries" that do not lie in the matrix (these are equivalent to the *s in the link you sent). Also, we want our matrices to be Fields so that we can include them in Field broadcasts, which makes it much easier to specify Jacobian blocks.
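
For example, a 4 × 4 tridiagonal matrix would be stored as the following rows, with $\star$ marking the dummy padding entries in the top and bottom rows:

\begin{pmatrix}
b_1 & c_1 & 0 & 0 \\
a_2 & b_2 & c_2 & 0 \\
0 & a_3 & b_3 & c_3 \\
0 & 0 & a_4 & b_4
\end{pmatrix}
\quad\longrightarrow\quad
(\star, b_1, c_1),\ (a_2, b_2, c_2),\ (a_3, b_3, c_3),\ (a_4, b_4, \star)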

sriharshakandala commented 1 year ago

Thanks, @dennisYatunin. For the quaddiagonal and pentadiagonal matrices, I am assuming we are either losing them in the Jacobian matrix approximation or that their bands are located contiguously!

dennisYatunin commented 1 year ago

@tapios Time estimates have been updated!

tapios commented 1 year ago

Thanks, @dennisYatunin. Looking good!