ddemidov / amgcl

C++ library for solving large sparse linear systems with algebraic multigrid method
http://amgcl.readthedocs.org/
MIT License

Get NaN on floating point precision solver #275

Open MakotoChiba opened 3 months ago

MakotoChiba commented 3 months ago

When I use the exact same RHS and matrix, amgcl of course gives the same result. But with the single-precision (float) solver I sometimes get NaN, even with identical input data; the problem does not happen with the double-precision solver. I tried both the CPU solver and the CUDA solver: they usually return the same answer, but both return NaN once in a few dozen runs, and in that case the NaN is always produced in the first iteration. Once a NaN appears, repeating the solve in the same for loop with the same data always returns NaN, so it seems to be an initialization issue or something similar.

Below are the setups I use. CUDA:

typedef amgcl::make_solver<
    amgcl::relaxation::as_preconditioner<
        Backend,
        amgcl::relaxation::damped_jacobi
        >,
    amgcl::solver::cg<Backend>
    > Solver;

CPU1:

typedef amgcl::make_solver<
    amgcl::amg<
        Backend,
        amgcl::coarsening::smoothed_aggregation,
        amgcl::relaxation::damped_jacobi
        >,
    amgcl::solver::cg<Backend>
    > Solver;

CPU2:

typedef amgcl::make_solver<
    amgcl::relaxation::as_preconditioner<
        Backend,
        amgcl::relaxation::damped_jacobi
    >,
    amgcl::solver::cg<Backend>
    > Solver;
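Since the failure persists when the loop is rerun with the same data, one thing worth ruling out (an assumption on my part, not confirmed in the thread): amgcl's iterative solvers use the incoming solution vector as the initial guess, so if the same x buffer is reused across solves, a NaN left behind by one failed solve will poison every solve that follows. A minimal sketch with hypothetical helper functions for guarding the reused vector:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helpers (not part of amgcl): guard the solution vector
// that is reused across solves, since it doubles as the initial guess.
bool has_nonfinite(const std::vector<float> &x) {
    return std::any_of(x.begin(), x.end(),
                       [](float v) { return !std::isfinite(v); });
}

// Start the next solve from a clean zero guess.
void reset_guess(std::vector<float> &x) {
    std::fill(x.begin(), x.end(), 0.0f);
}
```

Checking `has_nonfinite(x)` after each solve, and calling `reset_guess(x)` before retrying, would distinguish "the solver diverges on this data" from "a stale NaN in the initial guess keeps propagating".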

thanks

ddemidov commented 3 months ago

This sounds like a badly conditioned matrix, which could mean you do need the double precision.

Do you get NaNs with both CPU1 and CPU2? If the solution fails only when preconditioned with amg (CPU1) and works with CPU2, you could try to use relaxation as the coarsest-level solver instead of SparseLU (see amg.params.direct_coarse):

prm.precond.direct_coarse = false;
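For context, a sketch of where that parameter fits when constructing the solver, assuming the CPU1 typedef from above (`A` stands for the user's assembled matrix):

```cpp
// Sketch: disable the direct (SparseLU) solver at the coarsest level,
// so relaxation is used there instead. Assumes the CPU1 Solver typedef.
Solver::params prm;
prm.precond.direct_coarse = false; // amg::params::direct_coarse
Solver solve(A, prm);
```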

MakotoChiba commented 3 months ago

Thank you for your advice. However, both CPU1 and CPU2 have the same issue, and so does the GPU.

What is very strange is that sometimes the problem appears and sometimes it does not, even though the matrix and RHS are the same. In particular, it happens only very rarely on the CPU.

In practice this is mostly not a problem on the CPU, because float and double run at almost the same speed there; on the GPU, however, the difference in memory transfer time makes float significantly faster.
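Not stated in this thread, but amgcl's documentation describes a mixed-precision setup that may capture most of the GPU bandwidth win while keeping the Krylov iteration stable: build the preconditioner in single precision and run the solver in double. A sketch with the builtin backends (the CUDA backends would be substituted analogously):

```cpp
#include <amgcl/backend/builtin.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/damped_jacobi.hpp>
#include <amgcl/solver/cg.hpp>

// Preconditioner in single precision (cheap, bandwidth-friendly),
// outer CG solver in double precision (keeps the iteration stable).
typedef amgcl::backend::builtin<float>  PBackend;
typedef amgcl::backend::builtin<double> SBackend;

typedef amgcl::make_solver<
    amgcl::amg<
        PBackend,
        amgcl::coarsening::smoothed_aggregation,
        amgcl::relaxation::damped_jacobi
        >,
    amgcl::solver::cg<SBackend>
    > MixedSolver;
```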