I do not know how to tune the amgcl to make solver faster

qzxuhui commented 3 years ago

Sir, thanks for your great job of amgcl.

I was using Eigen CG to solve the linear system, it not fast enough so I come to amgcl.

My matrix A and vector f comes from 3D solid finite element method simulation. The A is 219852x219852. It is positive definite.

I have a code to compare performance using EigenCG and using amgcl.

Solving my problem, amgcl takes 70s and EigenCG takes 40s and even Eigen CG have a better accuracy. I believe that amgcl should have a better performance. I see in other disscussions that is is always 8-10 second to established grid hierarchy and 0-4 second to solve the problem. I do not know why my situation so slow. I have no idea how to tune it.

The matrix data, vector data, code, script are attached in follow links

https://drive.google.com/drive/folders/1vnJ7fdxqG4AdLOm3EwY0Ndj5F3t-JQvC?usp=sharing

I would greatly appreciate any suggestions you may have regarding optimal parameters and combination of options to help solve it faster.

Thanks for your time sir.

code

#include <iostream>

#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <Eigen/IterativeLinearSolvers>
#include <Eigen/SparseLU>

#include <amgcl/backend/eigen.hpp>
#include <amgcl/coarsening/ruge_stuben.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/solver/bicgstab.hpp>
#include <amgcl/solver/cg.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/profiler.hpp>

#include <vector>
#include <fstream>
#include <iostream>
#include <cstdlib>
#include <ctime>
#include "omp.h"

using namespace Eigen;
Eigen::SparseMatrix<double, RowMajor> createSparseMatrixFromCSR(int col_size, const std::vector<int> &row_offset, const std::vector<int> &col_index, const std::vector<double> &value);
Eigen::SparseMatrix<double, RowMajor> readSparseMatrixFromFile(const std::string &filename);
Eigen::VectorXd readVectorFromFile(const std::string &filename);
void usingAMGCL();
void usingEigenCG();
int main(int argc, char *argv[])
{
    Eigen::initParallel();
    usingAMGCL();
    usingEigenCG();
    return 0;
}
void usingAMGCL()
{
    amgcl::profiler<> prof;

    // Read sparse matrix from MatrixMarket format.
    // In general this should come pre-assembled.
    const std::string A_file = "matrix.txt";
    SparseMatrix<double, RowMajor> A = readSparseMatrixFromFile(A_file);

    const std::string f_file = "prd_15.txt";
    Eigen::VectorXd f = readVectorFromFile(f_file);

    // Zero initial approximation:
    Eigen::VectorXd x = Eigen::VectorXd::Constant(A.rows(), 0.0);
    // Setup the solver:
    typedef amgcl::backend::eigen<double> Backend;
    typedef amgcl::make_solver<
        amgcl::amg<
            Backend,
            amgcl::coarsening::smoothed_aggregation,
            amgcl::relaxation::spai0>,
        amgcl::solver::bicgstab<Backend>>
        Solver;

    Solver::params prm;
    prm.solver.maxiter = 500;
    prm.solver.tol = 10e-9;
    prm.precond.coarsening.aggr.eps_strong = 0.003;

    prof.tic("setup");
    Solver solve(A, prm);
    prof.toc("setup");
    std::cout << solve << std::endl;

    // Solve the system for the given RHS:
    int iters;
    double error;
    prof.tic("solve");
    std::tie(iters, error) = solve(f, x);
    prof.toc("solve");

    std::cout << iters << " " << error << std::endl
              << prof << std::endl;
}
void usingEigenCG()
{
    double start, end;

    const std::string A_file = "matrix.txt";
    SparseMatrix<double, RowMajor> A = readSparseMatrixFromFile(A_file);

    const std::string f_file = "prd_15.txt";
    Eigen::VectorXd f = readVectorFromFile(f_file);

    Eigen::ConjugateGradient<Eigen::SparseMatrix<double, RowMajor>, Eigen::Lower | Eigen::Upper> cg_solver;
    cg_solver.setTolerance(1E-15);
    cg_solver.compute(A);

    start = omp_get_wtime();
    Eigen::VectorXd x = cg_solver.solve(f);
    end = omp_get_wtime();

    std::cout << "iteration is " << cg_solver.iterations() << std::endl;
    std::cout << "error is " << cg_solver.error() << std::endl;
    std::cout << "time in solve is          " << (double)(end - start) << " ms.\n";
}

Eigen::SparseMatrix<double, RowMajor> readSparseMatrixFromFile(const std::string &filename)
{
    std::ifstream fin;
    fin.open(filename.c_str());
    if ((bool)fin == false)
    {
        std::cout << "can not open file" << filename << std::endl;
    }
    int row_size;
    int col_size;
    int nonzeros;
    fin >> row_size;
    fin >> col_size;
    fin >> nonzeros;
    std::vector<int> row_offset_array(row_size + 1);
    for (int i = 0; i < row_offset_array.size(); i++)
    {
        fin >> row_offset_array[i];
    }
    std::vector<int> col_index_array(nonzeros);
    for (int i = 0; i < col_index_array.size(); i++)
    {
        fin >> col_index_array[i];
    }
    std::vector<double> value_array(nonzeros);
    for (int i = 0; i < value_array.size(); i++)
    {
        fin >> value_array[i];
    }
    fin.close();
    return createSparseMatrixFromCSR(col_size, row_offset_array, col_index_array, value_array);
}

Eigen::SparseMatrix<double, RowMajor> createSparseMatrixFromCSR(int col_size, const std::vector<int> &row_offset, const std::vector<int> &col_index, const std::vector<double> &value_array)
{
    const int row_size = row_offset.size() - 1;
    SparseMatrix<double, RowMajor> sparse_matrix(row_size, col_size);
    const int nonzeros = col_index.size();

    std::vector<Triplet<double>> triplet(nonzeros);
    for (int row = 0; row < row_size; row++)
    {
        const int start = row_offset[row];
        const int end = row_offset[row + 1];
        for (int offset_index = start; offset_index < end; offset_index++)
        {
            const int col = col_index[offset_index];
            const double value = value_array[offset_index];
            triplet[offset_index] = Triplet<double>(row, col, value);
        }
    }
    sparse_matrix.setFromTriplets(triplet.begin(), triplet.end());
    sparse_matrix.makeCompressed();
    return sparse_matrix;
}
Eigen::VectorXd readVectorFromFile(const std::string &filename)
{
    std::ifstream fin;
    fin.open(filename.c_str());
    if ((bool)fin == false)
    {
        std::cout << "can not open file" << filename << std::endl;
    }
    int size;
    fin >> size;
    VectorXd vector(size);
    for (int i = 0; i < size; i++)
    {
        fin >> vector(i);
    }
    fin.close();
    return vector;
}

script

set -x
eigen_include_path=/opt/Eigen/eigen-3.3.7
amgcl_include_path=/home/xuhui/amgcl-master
g++ main_2.cpp -O3 -I ${eigen_include_path} -I ${amgcl_include_path} -fopenmp -DNDEBUG
export OMP_NUM_THREADS=1;
./a.out

result

Solver
======
Type:             BiCGStab
Unknowns:         219852
Memory footprint: 0.00 B

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.01
Grid complexity:     1.01
Memory footprint:    8.51 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0       219852        8989578      0.00 B (98.98%)
    1         2071          92823      8.51 M ( 1.02%)

494 9.552e-09

[Profile:    73.792 s] (100.00%)
[ self:       5.797 s] (  7.86%)
[  setup:     0.715 s] (  0.97%)
[  solve:    67.281 s] ( 91.18%)

iteration is 3049
error is 9.76464e-16
time in solve is          43.3314 ms.

ddemidov commented 3 years ago

Can you save the matrix and the RHS in the matrix market format? For example, with sparse amgcl::io::mm_write() and dense amgcl::io::mm_write()?

qzxuhui commented 3 years ago

Sorry, sir. I have updated the matrix and vector file using MatrixMarket in the link. The result is shown as follow.

xuhui@xuhui-Office:~/amgcl-master/build/examples$ ./solver --matrix /home/xuhui/Temp/matrix.mtx
./solver --matrix /home/xuhui/Temp/matrix.mtx
Solver
======
Type:             BiCGStab
Unknowns:         219852
Memory footprint: 11.74 M

Preconditioner
==============
Number of levels:    3
Operator complexity: 1.25
Grid complexity:     1.08
Memory footprint:    209.15 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0       219852        8989578    170.93 M (80.07%)
    1        16467        2141353     35.60 M (19.07%)
    2          777          96595      2.61 M ( 0.86%)

Iterations: 100
Error:      0.00502124

[Profile:      14.810 s] (100.00%)
[  reading:     8.414 s] ( 56.81%)
[  setup:       0.310 s] (  2.09%)
[  solve:       6.085 s] ( 41.08%)

xuhui@xuhui-Office:~/amgcl-master/build/examples$ ./solver --matrix /home/xuhui/Temp/matrix.mtx --rhs /home/xuhui/Temp/prd_15.mtx
./solver --matrix /home/xuhui/Temp/matrix.mtx --rhs /home/xuhui/Temp/prd_15.mtx
Solver
======
Type:             BiCGStab
Unknowns:         219852
Memory footprint: 11.74 M

Preconditioner
==============
Number of levels:    3
Operator complexity: 1.25
Grid complexity:     1.08
Memory footprint:    209.15 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0       219852        8989578    170.93 M (80.07%)
    1        16467        2141353     35.60 M (19.07%)
    2          777          96595      2.61 M ( 0.86%)

Iterations: 100
Error:      6.84824e-06

[Profile:      14.805 s] (100.00%)
[  reading:     8.374 s] ( 56.57%)
[  setup:       0.319 s] (  2.15%)
[  solve:       6.110 s] ( 41.27%)

Sir, is there any rule which can help improve performance?

ddemidov commented 3 years ago

You matrix has block-structure with 3x3 blocks, we can exploit it by using block-valued backend (amgcl::backend::builtin<amgcl::static_matrix<double,3,3>>). This improves things already:

./solver -A matrix.mtx -f prd_15.mtx -b3
Solver
======
Type:             BiCGStab
Unknowns:         73284
Memory footprint: 11.74 M

Preconditioner
==============
Number of levels:    3
Operator complexity: 1.23
Grid complexity:     1.07
Memory footprint:    146.40 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        73284         998886    124.28 M (81.41%)
    1         5063         217965     19.94 M (17.76%)
    2          242          10102      2.18 M ( 0.82%)

Iterations: 93
Error:      8.19976e-09

[Profile:      13.918 s] (100.00%)
[  reading:     6.983 s] ( 50.17%)
[  setup:       0.217 s] (  1.56%)
[  solve:       6.717 s] ( 48.26%)

Further, switching to damped_jacobi for smoothing, and estimating the matrix spectral radius to improve the quality of smoothed aggregation:

./solver -A matrix.mtx -f prd_15.mtx -b3 \
    precond.relax.type=damped_jacobi \
    precond.coarsening.estimate_spectral_radius=true precond.coarsening.power_iters=5
Solver
======
Type:             BiCGStab
Unknowns:         73284
Memory footprint: 11.74 M

Preconditioner
==============
Number of levels:    3
Operator complexity: 1.23
Grid complexity:     1.07
Memory footprint:    146.29 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        73284         998886    124.28 M (81.42%)
    1         5063         217965     19.96 M (17.77%)
    2          241          10019      2.04 M ( 0.82%)

Iterations: 67
Error:      8.17358e-09

[Profile:      11.650 s] (100.00%)
[  reading:     6.763 s] ( 58.05%)
[  setup:       0.253 s] (  2.17%)
[  solve:       4.633 s] ( 39.77%)

qzxuhui commented 3 years ago

Thanks for your reply sir. Among the sea of all options that I can apply, is there any basic rule I can refer to? I mean other than try the combinations of all options one by one, is there any thumb we can refer to? Any reference or links would be a great help.

ddemidov commented 3 years ago

Using block backend usually helps if you have the block-structured matrix, more often with speed of the setup and the solution than with convergence though. This means it does not improve the convergence by itself. Here it helped because the coarsening became implicitly aware of the block structure of the matrix. We can also tell amgcl that the matrix has a block structure with precond.coarsening.aggr.block_size=3 parameter:

./solver -A matrix.mtx -f prd_15.mtx \
    precond.relax.type=damped_jacobi \
    precond.coarsening.estimate_spectral_radius=1 precond.coarsening.power_iters=5 \
    precond.coarsening.aggr.block_size=3
Solver
======
Type:             BiCGStab
Unknowns:         219852
Memory footprint: 11.74 M

Preconditioner
==============
Number of levels:    3
Operator complexity: 1.22
Grid complexity:     1.07
Memory footprint:    250.61 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0       219852        8989578    214.08 M (81.63%)
    1        14784        1939392     34.68 M (17.61%)
    2          702          82962      1.84 M ( 0.75%)

Iterations: 82
Error:      7.75492e-09

[Profile:      15.126 s] (100.00%)
[  reading:     6.906 s] ( 45.66%)
[  setup:       0.554 s] (  3.66%)
[  solve:       7.665 s] ( 50.67%)

Note that the number of iterations only slightly larger in this case, but the setup time and the solution time are about twice slower.

Next, it usually just a matter of trying out different options for smoother and (less importantly) solver.

By the way, here is your main_2.cpp file updated to use the above options: https://gist.github.com/ddemidov/027a525d30afdab2fb2a19da22516d76

The output is:

$ g++ -o main_2 main_2.cpp -O3 -I /usr/include/eigen3/ -I ~/work/amgcl -fopenmp
$ ./main_2 
Solver
======
Type:             BiCGStab
Unknowns:         73284
Memory footprint: 11.74 M

Preconditioner
==============
Number of levels:    3
Operator complexity: 1.23
Grid complexity:     1.07
Memory footprint:    146.29 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        73284         998886    124.28 M (81.42%)
    1         5063         217965     19.96 M (17.77%)
    2          241          10019      2.04 M ( 0.82%)

67 8.17358e-09

[Profile:     9.623 s] (100.00%)
[ self:       4.742 s] ( 49.28%)
[  setup:     0.256 s] (  2.66%)
[  solve:     4.625 s] ( 48.07%)

iteration is 3049
error is 9.76464e-16
time in solve is          23.3425 ms.

qzxuhui commented 3 years ago

You are so kind sir. Thanks a lot~ I now can have a better understanding now~

qzxuhui commented 3 years ago

Sir, I found that if the I using the ./solver --matrix /home/xuhui/Temp/matrix.mtx -r, the system can be solved.

./solver --matrix /home/xuhui/Temp/matrix.mtx -r
Solver
======
Type:             BiCGStab
Unknowns:         219852
Memory footprint: 11.74 M

Preconditioner
==============
Number of levels:    3
Operator complexity: 1.32
Grid complexity:     1.09
Memory footprint:    222.33 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0       219852        8989578    172.15 M (75.66%)
    1        18941        2740169     45.48 M (23.06%)
    2         1054         151388      4.70 M ( 1.27%)

Iterations: 100
Error:      0.0233907

[Profile:      14.700 s] (100.00%)
[  reading:     8.399 s] ( 57.14%)
[  reorder:     0.039 s] (  0.27%)
[  setup:       0.366 s] (  2.49%)
[  solve:       5.892 s] ( 40.08%)

But we combine the -b3 with -r the Segmentation fault can happen.

./solver --matrix /home/xuhui/Temp/matrix.mtx -b3 -r
./solver --matrix /home/xuhui/Temp/matrix.mtx -b3 -r
Segmentation fault (core dumped)

Do I did something wrong? options -b can not combine with -r ?

ddemidov commented 3 years ago

-r tells solver to reorder the matrix, and it rarely helps. In your case above, the solver did not converge: it just hit the default limit of 100 iterations. The error of 2e-2 is far from the default tolerance of 1e-8. Not sure why the segfault is happening (I can reproduce it too), but most probably you don't need reordering anyway.

ddemidov / amgcl