AMReX-Codes / amrex

AMReX: Software Framework for Block Structured AMR
https://amrex-codes.github.io/amrex

Considerable amount of time spent setting FABs to a value #3593

Open ankithadas opened 11 months ago

ankithadas commented 11 months ago

Hey everyone

I am currently writing a steady-state Navier-Stokes solver using AMReX. I am using MLABecLaplacian to solve the pressure-correction equation and a modified version of MLABecLaplacian to solve the advection equation. Running a simple lid-driven cavity example (512x512, Re = 400) shows that a considerable amount of time is spent setting FABs to zero (or some other constant value):

Iteration 24290
--------------------------
Momentum Residual 9.999500545711633e-07
Pressure Correction Residual 7.237848774032368e-08
Writing plotfile plt24290 at iteration 24290

PROGRAM END

Total Timers     =     604.53 seconds.

Total times -----------------------------------------------------------------------------------------------------------------------
                                Function Name      NCalls         Min         Avg         Max      StdDev    CoeffVar     Percent %
                MLAConBecLaplacian::Fsmooth()     1554560    183.7289    185.6640    188.2162      1.6723      0.9007       30.71 %
                   MLABecLaplacian::Fsmooth()     6018912     54.8052     55.5461     56.8950      0.8069      1.4527        9.19 %
                    MLABecLaplacian::Fapply()     2184252     50.1850     50.4435     50.7884      0.2447      0.4850        8.34 %
                           FabArray::setVal()     4062522     47.4373     48.2097     48.6335      0.4605      0.9553        7.97 %
                           average_down_faces     3157700     40.9653     41.2552     41.5962      0.2263      0.5486        6.82 %
                                amrex::Copy()     1975265     24.1153     24.2010     24.3882      0.1090      0.4506        4.00 %
    NavierStokes::computePressureCorrection()       24290     23.4314     23.6842     24.1424      0.2734      1.1542        3.92 %
                             FabArray::Xpay()     1608669     15.5913     15.6465     15.7702      0.0732      0.4677        2.59 %
                        FillBoundary_nowait()     8970897     14.2315     15.0656     15.6218      0.5346      3.5482        2.49 %
                MLAConBecLaplacian::getDiag()       24290     14.6383     14.7253     14.7666      0.0518      0.3518        2.44 %
                          FabArray::norminf()     1118956     13.4672     13.5213     13.6034      0.0511      0.3781        2.24 %
        NavierStokes::calculateFaceVelocity()       24290     12.7456     12.9070     13.0002      0.1014      0.7854        2.14 %
                          amrex::average_down     1308196     11.6049     11.6502     11.7574      0.0622      0.5337        1.93 %
                        FillBoundary_finish()     8970897      8.0292     11.0914     14.3701      2.2508     20.2934        1.83 %
    NavierStokes::calculatePressureGradient()       24290     10.3765     10.4440     10.5954      0.0885      0.8472        1.73 %
              FabArray::ParallelCopy_finish()     3041781      1.5244      8.9432     11.7171      4.2911     47.9815        1.48 %
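
For context, the per-function timers above are the report AMReX's TinyProfiler prints at amrex::Finalize() when the code is built with TINY_PROFILE=TRUE (GNU Make) or AMReX_TINY_PROFILE=ON (CMake). A user-level region such as NavierStokes::computePressureCorrection() shows up by wrapping the function body in a BL_PROFILE scope; the code below is only a hypothetical sketch of that instrumentation, not the actual solver.

#include <AMReX_BLProfiler.H>

// Hypothetical class standing in for the user's solver; only the profiling
// scope matters here.
struct NavierStokes
{
    void computePressureCorrection ();
};

void NavierStokes::computePressureCorrection ()
{
    // The string below is what appears in the "Function Name" column of the report.
    BL_PROFILE("NavierStokes::computePressureCorrection()");
    // ... compute divergence, set up MLABecLaplacian, MLMG::solve, update fields ...
}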

Upon review, it looks like MLMG repeatedly sets FABs to zero even when it is not required. Removing some of the setVal calls in MLMG does reduce the total call count (tested with a Debug build with NaN fill).

My overall solver structure is

Solve Advection
         Set up MLAConBecLaplacian (modified advection solver)
         Compute diagonal of matrix
         Compute advection source terms
         Under-relax matrix
         MLMG::solve
Update face velocities
Solve Pressure Correction
         amrex::computeDivergence
         Set up MLABecLaplacian
         MLMG::solve
         Update pressure field
         Update velocity field
Repeat

I hope that gives a brief idea of my code structure (happy to elaborate more).
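
To make the mapping to AMReX calls concrete, here is a minimal single-level sketch of the pressure-correction portion of that loop using the standard MLABecLaplacian/MLMG interface. The function name, the field names (pcorr, umac, bcoefs), and the all-Neumann boundary conditions are placeholders of mine, not the actual solver code.

#include <AMReX_MultiFab.H>
#include <AMReX_MultiFabUtil.H>
#include <AMReX_MLABecLaplacian.H>
#include <AMReX_MLMG.H>

using namespace amrex;

void solvePressureCorrection (const Geometry& geom,
                              const BoxArray& grids,
                              const DistributionMapping& dmap,
                              MultiFab& pcorr,                              // cell-centered pressure correction
                              const Array<MultiFab,AMREX_SPACEDIM>& umac,   // intermediate face velocities
                              const Array<MultiFab,AMREX_SPACEDIM>& bcoefs) // face-centered coefficients
{
    // Right-hand side: divergence of the intermediate face velocity field.
    MultiFab rhs(grids, dmap, 1, 0);
    computeDivergence(rhs, GetArrOfConstPtrs(umac), geom);

    MLABecLaplacian mlabec({geom}, {grids}, {dmap}, LPInfo());
    mlabec.setDomainBC({AMREX_D_DECL(LinOpBCType::Neumann,
                                     LinOpBCType::Neumann,
                                     LinOpBCType::Neumann)},
                       {AMREX_D_DECL(LinOpBCType::Neumann,
                                     LinOpBCType::Neumann,
                                     LinOpBCType::Neumann)});
    mlabec.setLevelBC(0, &pcorr);

    mlabec.setScalars(0.0, 1.0);                      // alpha = 0, beta = 1: Poisson-like operator
    mlabec.setBCoeffs(0, GetArrOfConstPtrs(bcoefs));

    MLMG mlmg(mlabec);
    mlmg.solve({&pcorr}, {&rhs}, 1.e-10, 0.0);        // relative and absolute tolerances
}

The advection step would look similar, with the modified MLAConBecLaplacian operator in place of MLABecLaplacian and the under-relaxation applied to the matrix before the solve.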

A few areas where we could cut down the number of times setVal is called in MLMG are:

  1. MLMG::prepareForSolve(). Here the residual and correction fields are set to zero. I believe this is unnecessary, as these fields are either filled in or set to zero elsewhere in the MLMG solver.
  2. MLCGSolverT::solve_bicgstab(). Here the MFs ph and sh are set to zero. However, these fields are filled using LocalCopy later in the iteration loop. Also, the MF sol is already set to zero in MLMGT::actualBottomSolve().

Making these modifications reduces the overall call count considerably (I will put up a performance comparison soon). I believe there are still many cases where setVal is called unnecessarily.

Please let me know your thoughts

eebasso commented 9 months ago

Hi Ankith,

I have been working on optimizing MLCGSolverT::solve_bicgstab() and I found that the initial setVal(RT(0.0)) calls are necessary to avoid errors. I believe this is because setVal fills the grow/ghost cells as well, whereas the LocalCopy operations may not. As an example, MF p = Lp.make(amrlev, mglev, sol.nGrowVect()) will create p with the same grow/ghost cells as sol. If we then do p.LocalCopy(r, 0, ncomp, IntVect(0)), it fills p with values from r everywhere except the grow/ghost cells, so when sol.nGrowVect() is nonzero those boundary cells are left undefined.
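
A small standalone test illustrates this point. The MultiFab::Copy with nghost = 0 below plays the role of a LocalCopy with IntVect(0), and the -1.0 fill stands in for whatever happens to be in memory when the ghost cells are never initialized; this is my own sketch, not code from MLCGSolver.

#include <AMReX.H>
#include <AMReX_MultiFab.H>
#include <AMReX_Print.H>

using namespace amrex;

int main (int argc, char* argv[])
{
    Initialize(argc, argv);
    {
        // One box of 32 cells in each direction, one component, one ghost cell.
        BoxArray ba(Box(IntVect(0), IntVect(31)));
        DistributionMapping dm(ba);
        const int ncomp = 1, ngrow = 1;

        MultiFab src(ba, dm, ncomp, ngrow);
        MultiFab dst(ba, dm, ncomp, ngrow);

        src.setVal(2.0);   // valid and ghost cells of src
        dst.setVal(-1.0);  // stands in for uninitialized ghost-cell data

        // Copy only the valid region (last argument nghost = 0): the ghost
        // cells of dst are left untouched, like LocalCopy with IntVect(0).
        MultiFab::Copy(dst, src, 0, 0, ncomp, 0);

        Print() << "min over valid cells:      " << dst.min(0, 0)     << "\n"; // 2.0
        Print() << "min including ghost cells: " << dst.min(0, ngrow) << "\n"; // still -1.0
    }
    Finalize();
    return 0;
}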

ankithadas commented 9 months ago

Hey eebasso,

That makes sense. I believe your last PR has already cut down a considerable number of setVal calls. However, I think there are still areas we could optimize further.

  1. The MLMG class calls bottomSolveWithCG after setting x to zero (line 1450). That means there is no need to store sorig or to compute the initial residual r in MLCGSolver::solve_bicgstab and solve_cg.
  2. The next one is a rather big change in the linear solver class. We could avoid making copies of vectors if MLLinOp::apply took the input vector x as const. This would require changing how the matrix multiplication operations are performed when boundary conditions are inhomogeneous; one way is to modify the b vector for inhomogeneous boundary conditions.

Again, thank you for your response. Looking forward to hearing your thoughts.