ddemidov / amgcl

C++ library for solving large sparse linear systems with algebraic multigrid method
http://amgcl.readthedocs.org/
MIT License
728 stars 111 forks source link

Pressure projection solver for Incompressible Navier-Stokes FEM #151

Closed xgarnaud closed 4 years ago

xgarnaud commented 4 years ago

Hi,

I am trying to use the pressure projection solver for a matrix coming from a SUPG-PSPG FEM discretization of the incompressible Navier Stokes equations (3D). The DOFs are (u0, v0, w0, p0, u1, v1, w1, p1, ...).

I am struggling to find the correct setup for the solver in this case. Here are the matrix & RHS after some iterations, as the solver converges to a steady solution:

https://drive.google.com/file/d/1giG4ETr4rbc2PoJ56zWbrf71trsgYMY0/view?usp=sharing https://drive.google.com/file/d/1LkEHaOg5_WV7672qXS8oFaQvfTJRzJua/view?usp=sharing

For this matrix & RHS, Eigen's BiCGStab/ILU converges in 12 iterations and 0.8s while AMCGL takes 15 iterations and 7s. I must be doing something wrong! I tried using the settings from https://github.com/ddemidov/amgcl_benchmarks/blob/master/shared_mem/navier-stokes/schur.cpp (see below) which brings the time down to 2.1s but the number of iterations up to 25. I must still be doing something wrong... Could you please point me to the correct setup?

Code:

#include "Eigen/Sparse"
#include <iostream>
#include <unsupported/Eigen/SparseExtra>

#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/adapter/eigen.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/backend/builtin.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/preconditioner/schur_pressure_correction.hpp>
#include <amgcl/relaxation/as_preconditioner.hpp>
#include <amgcl/relaxation/damped_jacobi.hpp>
#include <amgcl/relaxation/ilu0.hpp>
#include <amgcl/solver/bicgstab.hpp>
#include <amgcl/solver/fgmres.hpp>

#include <amgcl/coarsening/runtime.hpp>
#include <amgcl/preconditioner/runtime.hpp>
#include <amgcl/relaxation/runtime.hpp>
#include <amgcl/solver/runtime.hpp>

#include <boost/property_tree/json_parser.hpp>

#include <chrono>

typedef Eigen::SparseMatrix<double, Eigen::RowMajor> SparseMatrix;
typedef Eigen::VectorXd Vector;

typedef Eigen::BiCGSTAB<SparseMatrix, Eigen::IncompleteLUT<double>>
    EigenBiCGStab;

typedef amgcl::backend::crs<double> AMGCLSparseMatrix;

typedef amgcl::backend::builtin<double> Backend;

typedef amgcl::make_solver<amgcl::runtime::preconditioner<Backend>,
                           amgcl::runtime::solver::wrapper<Backend>>
    _PSolver;

typedef amgcl::make_solver<amgcl::runtime::preconditioner<Backend>,
                           amgcl::runtime::solver::wrapper<Backend>>
    _USolver;

typedef amgcl::make_solver<
    amgcl::preconditioner::schur_pressure_correction<_USolver, _PSolver>,
    amgcl::runtime::solver::wrapper<Backend>>
    AMGCLSolver;

typedef std::chrono::time_point<std::chrono::system_clock> timer_t;

AMGCL_USE_EIGEN_VECTORS_WITH_BUILTIN_BACKEND();

void solve_eigen(const SparseMatrix &mat, const Vector &rhs, double tol) {

  Vector sol(mat.rows()), res;
  sol.setZero();

  EigenBiCGStab eigen_solver;
  eigen_solver.setTolerance(tol);
  eigen_solver.preconditioner().setFillfactor(3);
  eigen_solver.compute(mat);

  sol.setZero();
  sol = eigen_solver.solveWithGuess(rhs, sol);

  int niter = eigen_solver.iterations();
  double error = eigen_solver.error();

  res = mat * sol - rhs;
  std::cout << "Eigen: niter = " << niter << ", error = " << error
            << ", res = " << res.norm() / rhs.norm() << std::endl;

  Eigen::Map<Eigen::VectorXd, 0, Eigen::InnerStride<4>> res_p(res.data() + 3,
                                                              mat.rows() / 4);
  std::cout << res_p.norm() << std::endl;
}

void solve_amgcl(const SparseMatrix &mat, const Vector &rhs, double tol) {

  Vector sol(mat.rows()), res;
  sol.setZero();
  const int *ptr = mat.outerIndexPtr();
  const int *col = mat.innerIndexPtr();
  const double *val = mat.valuePtr();

  AMGCLSparseMatrix amgcl_mat(std::make_tuple(
      mat.rows(), amgcl::make_iterator_range(ptr, ptr + mat.rows() + 1),
      amgcl::make_iterator_range(col, col + ptr[mat.rows()]),
      amgcl::make_iterator_range(val, val + ptr[mat.rows()])));
  boost::property_tree::ptree prm;
  // read_json("extra_params.json", prm);
  prm.put("solver.tol", tol);
  prm.put("precond.pmask_size", mat.rows());
  prm.put("precond.pmask_pattern", "%3:4");
  AMGCLSolver amgcl_solver(amgcl_mat, prm);
  sol.setZero();
  int niter;
  double error;
  std::tie(niter, error) = amgcl_solver(rhs, sol);

  res = mat * sol - rhs;
  std::cout << "AMGCL: niter = " << niter << ", error = " << error
            << ", res = " << res.norm() / rhs.norm() << std::endl;

  Eigen::Map<Eigen::VectorXd, 0, Eigen::InnerStride<4>> res_p(res.data() + 3,
                                                              mat.rows() / 4);
  std::cout << res_p.norm() << std::endl;
}

int main(int argc, const char *argv[]) {

  const timer_t t0 = std::chrono::system_clock::now();
  SparseMatrix mat;
  Eigen::loadMarket(mat, "taylor_couette_mat.mm");
  Vector rhs, sol, res;
  Eigen::loadMarketVector(rhs, "taylor_couette_rhs.mm");

  const double tol = 1e-10;

  const timer_t t1 = std::chrono::system_clock::now();
  solve_eigen(mat, rhs, tol);
  const timer_t t2 = std::chrono::system_clock::now();
  solve_amgcl(mat, rhs, tol);
  const timer_t t3 = std::chrono::system_clock::now();

  const double t_read =
      std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
  const double t_eigen =
      std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
  const double t_amgcl =
      std::chrono::duration_cast<std::chrono::milliseconds>(t3 - t2).count();

  std::cout << "read : " << t_read << " ms" << std::endl;
  std::cout << "eigen : " << t_eigen << " ms" << std::endl;
  std::cout << "amgcl : " << t_amgcl << " ms" << std::endl;

  return 1;

Extra parameters


{
    "solver": {
        "type": "fgmres",
        "M": "50",
        "maxiter": "25"
    },
    "precond": {
        "pmask_pattern": "%3:4",
        "usolver": {
            "solver": {
                "type": "bicgstab",
                "tol": "1e-9",
                "maxiter": "25"
            },
            "precond": {
                "class": "relaxation",
                "type": "damped_jacobi"
            }
        },
        "psolver": {
            "solver": {
                "type": "fgmres",
                "tol": "1e-9",
                "maxiter": "25"
            },
            "precond": {
                "class": "amg",
                "relax": {
                    "type": "spai0"
                },
                "coarsening": {
                    "type": "smoothed_aggregation"
                }
            }
        }
    }
}
ddemidov commented 4 years ago

I'll have a look at the matrix. Meanwhile, #144 could be of help to you.

xgarnaud commented 4 years ago

Thank you. Contrary to that case, here the diagonal entries are non zero due to the stabilization terms.

Based on #144 I used

{
    "solver": {
        "type": "fgmres",
        "M": "50",
        "maxiter": "50"
    },
    "precond": {
        "pmask_pattern": "%3:4",
        "usolver": {
            "solver": {
                "type": "bicgstab",
                "tol": "1e-9",
                "maxiter": "1"
            },
            "precond": {
                "class": "relaxation",
                "type": "ilu0"
            }
        },
        "psolver": {
            "solver": {
                "type": "bicgstab",
                "tol": "1e-9",
                "maxiter": "1"
            },
            "precond": {
                "class": "amg",
                "relax": {
                    "type": "ilu0"
                },
                "coarsening": {
                    "type": "smoothed_aggregation"
                }
            }
        }
    }
}

Convergence takes .4s and 36 iterations, which is much better.

ddemidov commented 4 years ago

Ok, so it looks like you could just use the monolithic approach here (simply setup the amg preconditioner for the whole system):

$ ./solver -A taylor_couette_mat.mm -f taylor_couette_rhs.mm
Solver
======
Type:             BiCGStab
Unknowns:         15504
Memory footprint: 847.88 K

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.02
Grid complexity:     1.02
Memory footprint:    14.15 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        15504         765504     13.86 M (98.38%)
    1          331          12643    294.49 K ( 1.62%)

Iterations: 38
Error:      9.71585e-09

[Profile:       1.063 s] (100.00%)
[  reading:     0.833 s] ( 78.38%)
[  setup:       0.017 s] (  1.60%)
[  solve:       0.212 s] ( 19.99%)

the performance is a bit better if you tell it that the system has 4x4 block structure:

$ ./solver -A taylor_couette_mat.mm -f taylor_couette_rhs.mm -p precond.coarsening.aggr.block_size=4
Solver
======
Type:             BiCGStab
Unknowns:         15504
Memory footprint: 847.88 K

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.03
Grid complexity:     1.03
Memory footprint:    14.78 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        15504         765504     14.31 M (97.15%)
    1          420          22448    482.22 K ( 2.85%)

Iterations: 25
Error:      9.12836e-09

[Profile:       0.999 s] (100.00%)
[  reading:     0.847 s] ( 84.70%)
[  setup:       0.018 s] (  1.77%)
[  solve:       0.135 s] ( 13.48%)

You can also use the Schur pressure correction. Here I am using the mixed-precision one:

./schurpc_mixed -A taylor_couette_mat.mm -f taylor_couette_rhs.mm -P prm.json 
Solver
======
Type:             FGMRES(30)
Unknowns:         15504
Memory footprint: 7.34 M

Preconditioner
==============
Schur complement (two-stage preconditioner)
  Unknowns: 15504(3876)
  Nonzeros: 765504
  Memory:  24.92 M

[ U ]
Solver
======
Type:             BiCGStab
Unknowns:         11628
Memory footprint: 317.95 K

Preconditioner
==============
Relaxation as preconditioner
  Unknowns: 11628
  Nonzeros: 430596
  Memory:   10.27 M

[ P ]
Solver
======
Type:             BiCGStab
Unknowns:         3876
Memory footprint: 105.98 K

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.23
Grid complexity:     1.10
Memory footprint:    1.07 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0         3876          47844    943.90 K (81.14%)
    1          384          11120    154.64 K (18.86%)

Iterations: 5
Error:      7.67882e-09

[Profile:                1.103 s] (100.00%)
[  reading:              0.901 s] ( 81.71%)
[  schur_complement:     0.201 s] ( 18.26%)
[    setup:              0.025 s] (  2.30%)
[    solve:              0.176 s] ( 15.93%)

Again, the performance may be improved if you use 3x3 block values for the U subsystem:

./schurpc_mixed -A taylor_couette_mat.mm -f taylor_couette_rhs.mm -P prm.json --ub 3
Solver
======
Type:             FGMRES(30)
Unknowns:         15504
Memory footprint: 7.34 M

Preconditioner
==============
Schur complement (two-stage preconditioner)
  Unknowns: 15504(3876)
  Nonzeros: 765504
  Memory:  18.81 M

[ U ]
Solver
======
Type:             BiCGStab
Unknowns:         3876
Memory footprint: 317.95 K

Preconditioner
==============
Relaxation as preconditioner
  Unknowns: 3876
  Nonzeros: 47844
  Memory:   4.15 M

[ P ]
Solver
======
Type:             BiCGStab
Unknowns:         3876
Memory footprint: 105.98 K

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.23
Grid complexity:     1.10
Memory footprint:    1.07 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0         3876          47844    943.90 K (81.14%)
    1          384          11120    154.64 K (18.86%)

Iterations: 5
Error:      7.67883e-09

[Profile:                0.963 s] (100.00%)
[  reading:              0.873 s] ( 90.73%)
[  schur_complement:     0.089 s] (  9.25%)
[    setup:              0.011 s] (  1.19%)
[    solve:              0.077 s] (  8.03%)

The important numbers to look at in all the tests above are setup and solve times. Looks like the last test has the best performance with 0.011s for setup and 0.077s for solve (or 0.089s total solution time).

ddemidov commented 4 years ago

Forgot to attach the prm.json with the SchurPC parameters:

{
    "solver": {
        "type": "fgmres"
    },
    "precond": {
        "pmask_size": 15504,
        "pmask_pattern": "%3:4",
        "approx_schur": false,
        "adjust_p": 1,
        "psolver": {
            "solver": {
                "type": "bicgstab",
                "maxiter": 8,
                "tol": 1e-1
            }
        },
        "usolver": {
            "solver": {
                "type": "bicgstab",
                "maxiter": 2,
                "tol": 1e-1
            },
            "precond": {
                "class": "relaxation",
                "type": "ilu0"
            }
        }
    }
}
ddemidov commented 4 years ago

Contrary to that case, here the diagonal entries are non zero due to the stabilization terms.

(#144 eventually also converged to non-zero pressure diagonal)

ddemidov commented 4 years ago

Just to note, that the solution time of the monolithic solver is better with ilu0 relaxation, but the setup time suffers, so it is basically a draw with the second test above:

$ ./solver -A taylor_couette_mat.mm -f taylor_couette_rhs.mm \
     -p precond.coarsening.aggr.block_size=4 precond.relax.type=ilu0
Solver
======
Type:             BiCGStab
Unknowns:         15504
Memory footprint: 847.88 K

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.03
Grid complexity:     1.03
Memory footprint:    26.77 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        15504         765504     26.30 M (97.15%)
    1          420          22448    482.22 K ( 2.85%)

Iterations: 7
Error:      8.92549e-09

[Profile:       0.965 s] (100.00%)
[  reading:     0.832 s] ( 86.24%)
[  setup:       0.051 s] (  5.30%)
[  solve:       0.081 s] (  8.42%)
ddemidov commented 4 years ago

And for the problem of this size, single level ilu0 (with 4x4 block values) seems to be the winner:

$ ./solver -A taylor_couette_mat.mm -f taylor_couette_rhs.mm -1 -b4 precond.type=ilu0
Solver
======
Type:             BiCGStab
Unknowns:         3876
Memory footprint: 847.88 K

Preconditioner
==============
Relaxation as preconditioner
  Unknowns: 3876
  Nonzeros: 47844
  Memory:   12.55 M

Iterations: 16
Error:      8.8943e-09

[Profile:       0.880 s] (100.00%)
[  reading:     0.825 s] ( 93.80%)
[  setup:       0.008 s] (  0.88%)
[  solve:       0.046 s] (  5.26%)
xgarnaud commented 4 years ago

Thank you very much.