Possibility of block solver for CPR?

moyner commented 5 years ago

Hi,

This is more of a feature request than an actual issue, but I was wondering if it is possible to combine the CPR solvers with the block-solvers. Say that I have a block system where the entries are amgcl::static_matrix<double, B, B> and I can currently solve that with e.g. block ILU as relaxation with AMGCL. If the problem is a mixed variable type, I would like to use CPR with AMG for the pressure part, but that would only be a scalar problem of double. We have a few solvers which produce zero pivots when treated as a scalar system, but where the blocks themselves are invertible and can be solved by a block solver.

A related issue that would be a useful would be to have the possibility of leaving the AMG hierarchy intact between calls to CPR, while updating the second stage solver with an updated matrix.

Thanks for all the great work on AMGCL - it really is a fantastic library!

ddemidov commented 5 years ago

Hello Olav,

Both requests make sense to me. I think using block solver for the top-level (second-stage) preconditioner should be possible after minor modifications, along the same line it is done in schur_pressure_correction solver. I'll try to make it work in CPR.

Changing the top-level preconditioner should just be a matter of adding an interface call. Something like template <class Matrix> void update_sprecond(const Matrix &K), which should just do this:

https://github.com/ddemidov/amgcl/blob/15df1d7be117fcd86ade3e9e7328417674401af7/amgcl/preconditioner/cpr.hpp#L310

Would you like to create a pull request with update_sprecond? And, more importantly, to see if it helps with your problems?

moyner commented 5 years ago

Hi Denis,

That sounds great. I will make an attempt at implementing update_sprecond. Since I'm calling AMGCL via MATLAB I will have to do a bit of work on my end to make the calls to the interface persistent and verify that it is working.

ddemidov commented 5 years ago

Olav, I've made some progress with block-valued CPR. Can you share a reasonably sized system matrix that works well with CPR preconditioner? I'd like to test the correctness of the changes in the block-cpr branch.

moyner commented 5 years ago

Here are two example matrices for which CPR should work well. These systems have a block size of 3 and should be ordered correctly. The first has 360 degrees of freedom (120 elements) and the second has 27000 dof (9000 elements). mtx_spe.zip

ddemidov commented 5 years ago

Here are some preliminary results for cpr preconditioner from the block-cpr branch with your spe9 matrix:

Using runtime blocks. This is the 'current' approach when matrices at both stages are assumed to have scalar values:

$ ./examples/cpr -A spe9.mtx -f spe9_rhs.mtx -p precond.sprecond.type=ilu0 -b3
CPR (two-stage preconditioner)
### Pressure preconditioner:
Number of levels:    2
Operator complexity: 2.22
Grid complexity:     1.24
Memory footprint:    11.78 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0         9000          60380      2.22 M (45.12%)
    1         2125          73451      9.55 M (54.88%)

### Global preconditioner:
Relaxation as preconditioner
  unknowns: 27000
  nonzeros: 289550
  memory:   9.68 M

Iterations: 20
Error:      8.64196e-09

[Profile:       1.252 s] (100.00%)
[ self:         0.384 s] ( 30.63%)
[  CPR:         0.511 s] ( 40.84%)
[   self:       0.355 s] ( 28.33%)
[    solve:     0.157 s] ( 12.51%)
[  reading:     0.357 s] ( 28.53%)

Using compile-time blocks. This is the new approach when the top-level (second stage) preconditioner is constructed for a block-valued matrix:

$ ./examples/cpr -A spe9.mtx -f spe9_rhs.mtx -p precond.sprecond.type=ilu0 -c3
CPR (two-stage preconditioner)
### Pressure preconditioner:
Number of levels:    2
Operator complexity: 2.22
Grid complexity:     1.24
Memory footprint:    11.78 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0         9000          60380      2.22 M (45.12%)
    1         2125          73451      9.55 M (54.88%)

### Global preconditioner:
Relaxation as preconditioner
  unknowns: 9000
  nonzeros: 60380
  memory:   9.50 M

Iterations: 11
Error:      9.8151e-09

[Profile:       1.176 s] (100.00%)
[ self:         0.368 s] ( 31.29%)
[  CPR:         0.412 s] ( 35.07%)
[   self:       0.332 s] ( 28.22%)
[    solve:     0.081 s] (  6.85%)
[  reading:     0.395 s] ( 33.64%)

In this particular case, the new approach seems to work better (has fewer iterations). Also, the setup step for the CPR is actually much simpler and easier to write. I'll update cpr_drs along the same lines and merge the commits to master.

ddemidov commented 5 years ago

The results for cpr_drs are similar:

Runtime blocks:

./examples/cpr_drs -A spe9.mtx -f spe9_rhs.mtx -p precond.sprecond.type=ilu0 -b3
CPR_DRS (two-stage preconditioner)
### Pressure preconditioner:
Number of levels:    2
Operator complexity: 2.04
Grid complexity:     1.23
Memory footprint:    10.40 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0         9000          60380      2.09 M (49.01%)
    1         2050          62826      8.31 M (50.99%)

### Global preconditioner:
Relaxation as preconditioner
  unknowns: 27000
  nonzeros: 289550
  memory:   9.68 M

Iterations: 12
Error:      6.08527e-09

[Profile:       0.671 s] (100.00%)
[  CPR:         0.307 s] ( 45.73%)
[   self:       0.217 s] ( 32.28%)
[    solve:     0.090 s] ( 13.45%)
[  reading:     0.364 s] ( 54.23%)

Compile-time blocks:

./examples/cpr_drs -A spe9.mtx -f spe9_rhs.mtx -p precond.sprecond.type=ilu0 -c3
CPR_DRS (two-stage preconditioner)
### Pressure preconditioner:
Number of levels:    2
Operator complexity: 2.04
Grid complexity:     1.23
Memory footprint:    10.40 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0         9000          60380      2.09 M (49.01%)
    1         2050          62826      8.31 M (50.99%)

### Global preconditioner:
Relaxation as preconditioner
  unknowns: 9000
  nonzeros: 60380
  memory:   9.50 M

Iterations: 6
Error:      8.60058e-09

[Profile:       0.663 s] (100.00%)
[  CPR:         0.259 s] ( 39.06%)
[   self:       0.215 s] ( 32.39%)
[    solve:     0.044 s] (  6.67%)
[  reading:     0.404 s] ( 60.91%)

I have merged the changes to the master branch.

ddemidov commented 5 years ago

While we are looking at CPR, the latest commits allow to mix not only scalar/block valued backends for P and S preconditioners, but also single/double precision valued backends.

So something like this is possible:

typedef amgcl::backend::builtin<float>  PBackend;
typedef amgcl::backend::builtin<double> SBackend;

typedef
    amgcl::amg<
        PBackend,
        amgcl::runtime::coarsening::wrapper,
        amgcl::runtime::relaxation::wrapper
        > PPrecond;

typedef
    amgcl::relaxation::as_preconditioner<
        SBackend,
        amgcl::runtime::relaxation::wrapper
        > SPrecond;

typedef
    amgcl::make_solver<
        amgcl::preconditioner::cpr_drs<PPrecond, SPrecond>,
        amgcl::runtime::solver::wrapper<SBackend>
        > Solver;

This would use single precision for the pressure preconditioner, which would require less memory and should be also faster because of reduced memory bandwidth. For example, the following change in examples/cpr_drs.cpp:

diff --git a/examples/cpr_drs.cpp b/examples/cpr_drs.cpp
index ebf8c13..deb5b02 100644
--- a/examples/cpr_drs.cpp
+++ b/examples/cpr_drs.cpp
@@ -121,10 +121,10 @@ void solve_block_cpr(const Matrix &K, const std::vector<double> &rhs, boost::pro

 #if defined(SOLVER_BACKEND_BUILTIN)
     typedef amgcl::backend::builtin<val_type>  SBackend;
-    typedef amgcl::backend::builtin<double>    PBackend;
+    typedef amgcl::backend::builtin<float>     PBackend;
 #elif defined(SOLVER_BACKEND_VEXCL)
     typedef amgcl::backend::vexcl<val_type>  SBackend;
-    typedef amgcl::backend::vexcl<double>    PBackend;
+    typedef amgcl::backend::vexcl<float>     PBackend;
 #endif

     typedef

results in the following profile for the spe9 problem:

$ ./cpr_drs -B -A spe9.bin -f spe9_rhs.bin -p precond.sprecond.type=ilu0 \
    precond.pprecond.coarse_enough=500 -c3
CPR_DRS (two-stage preconditioner)
### Pressure preconditioner:
Number of levels:    3
Operator complexity: 2.24
Grid complexity:     1.25
Memory footprint:    2.62 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0         9000          60380      1.54 M (44.62%)
    1         2050          62826    972.85 K (46.43%)
    2          225          12107    132.46 K ( 8.95%)

### Global preconditioner:
Relaxation as preconditioner
  unknowns: 9000
  nonzeros: 60380
  memory:   9.50 M

Iterations: 9
Error:      2.93972e-09

[Profile:       0.064 s] (100.00%)
[ self:         0.000 s] (  0.25%)
[  CPR:         0.060 s] ( 93.77%)
[   self:       0.002 s] (  2.85%)
[    setup:     0.024 s] ( 37.25%)
[    solve:     0.034 s] ( 53.67%)
[  reading:     0.004 s] (  5.98%)

Compare this with the original result of using double precision for both stages:

$ ./cpr_drs_ref -B -A spe9.bin -f spe9_rhs.bin -p precond.sprecond.type=ilu0 \
    precond.pprecond.coarse_enough=500 -c3
CPR_DRS (two-stage preconditioner)
### Pressure preconditioner:
Number of levels:    3
Operator complexity: 2.24
Grid complexity:     1.25
Memory footprint:    3.62 M  -- !!!

level     unknowns       nonzeros      memory
---------------------------------------------
    0         9000          60380      2.09 M (44.62%)
    1         2050          62826      1.28 M (46.43%) 
    2          225          12107    263.16 K ( 8.95%)

### Global preconditioner:
Relaxation as preconditioner
  unknowns: 9000
  nonzeros: 60380
  memory:   9.50 M

Iterations: 9
Error:      2.94145e-09

[Profile:       0.100 s] (100.00%)
[ self:         0.000 s] (  0.27%)
[  CPR:         0.093 s] ( 93.14%)
[   self:       0.002 s] (  1.98%)
[    setup:     0.051 s] ( 50.91%)  -- !!!
[    solve:     0.040 s] ( 40.25%)  -- !!!
[  reading:     0.007 s] (  6.59%)

moyner commented 5 years ago

I have implemented the interface to the block CPR in MRST and I did some preliminary testing. Just simulating a few steps of SPE9 gives me a significant improvement in runtime. Here, with tolerance of 1e-3: block_timing If I tighten the tolerance to 1e-6, the block solver pulls further ahead relative to the scalar solver. block_timing_strict I also ran a few other test cases where the previous solver ran into pivot issues and had no problems with the block solver. The general slowness of the benchmarks I report here reflect that I'm working on an ancient laptop at the moment. Once I am back in office, I will run further benchmarks on more modern hardware.

I have not tried to mix single/double precision for backends yet, but I think that makes a lot of sense since the pressure decoupling is an approximation. Thank you for the effort - I'm happy to close this issue as solved if you agree.

ddemidov commented 5 years ago

One note re spe9 test: the system is small enough for the coarse level solver setup (sparse LU decomposition) at the default coarse_enough=3000 to have considerable relative cost. If you reduce precond.pprecond.coarse_enough to 500, then on my machine the setup time goes down from 0.2s to 0.05s in case of double precision and 0.02s in case of single precision pressure preconditioner (see my results above).

ddemidov / amgcl

Possibility of block solver for CPR? #115