geodynamics / aspect

A parallel, extensible finite element code to simulate convection in both 2D and 3D models.
https://aspect.geodynamics.org/
Other
223 stars 235 forks source link

Improve matrix-free performance marginally. #5570

Open bangerth opened 8 months ago

bangerth commented 8 months ago

@tjhei This is one for you, pointed out to me by @marcfehling .

In stokes_matrix_free.cc, we apply the block preconditioner in the following way (abridged):

    void
    BlockSchurGMGPreconditioner<StokesMatrixType, ABlockMatrixType, SchurComplementMatrixType,
                                ABlockPreconditionerType, SchurComplementPreconditionerType>::
                                vmult (dealii::LinearAlgebra::distributed::BlockVector<double>       &dst,
                                       const dealii::LinearAlgebra::distributed::BlockVector<double>  &src) const
    {
      dst = 0.0;    // set all components of dst to zero

      // either solve with the Schur complement matrix (if do_solve_Schur_complement==true)
      // or just apply one preconditioner sweep (for the first few
      // iterations of our two-stage outer GMRES iteration)
      if (do_solve_Schur_complement)
        {
             ...
             dst.block(1) = S^{-1} src.block(1);
             ...
        }
      else
        {
          ...same as above, but only one preconditioner application instead of S^{-1}...
        }

      dst.block(1) *= -1.0;

      {
        // the u-block of dst only contains zeros
        stokes_matrix.vmult(utmp, dst); // B^T
        utmp.block(0) *= -1.0;
        utmp.block(0) += src.block(0);
      }

The key piece is that at the beginning of this last block, dst.block(0) is zero and so what ends up in utmp is really just the product of the top right and bottom right block of the Stokes matrix with dst.block(1). The top left and bottom left blocks of the matrix are multiplied by zeros.

The inefficiency now is that the vast majority of the work done in the stokes matrix (or stokes operator) is associated with the top left block, which is unnecessary here. It would be nice to either restrict operation to individual blocks, or to create a separate matrix-free object that only represents the top right and bottom right blocks.

tjhei commented 8 months ago

Yes, that is correct. I was aware of it when Conrad implemented it, but forgot afterwards. This might make a noticeable improvement, who knows.

I'll let one of my students try it.