JuliaLinearAlgebra / IterativeSolvers.jl

Iterative algorithms for solving linear systems, eigensystems, and singular value problems
MIT License

Confused about stopping criterion in cg! #292

Closed by mancolric 3 years ago

mancolric commented 3 years ago

Hello,

Sorry if this is not the correct place to post questions. I am confused about the stopping criterion in cg!.

1) The help of cg! says the method stops when |r_k| / |r_0| ≤ max(reltol * resnorm, abstol). What is resnorm, |r_k| or |r_0|? Apart from that, I do not think that criterion makes sense. Imagine, for example, that our first iterate is already very close to the solution, i.e., |r_0| = 1e-7. If we use a stopping criterion of the form |r_k| / |r_0| < some tolerance, then the required |r_k| may be too small for machine precision. It would make more sense to me to employ something like |r_k| < reltol * |b| + abstol, which does not depend on the initial guess. Note that Matlab uses this criterion with abstol = 0.

2) In issue #280, it is said that the stopping criterion is in reality norm(residual) <= max(abstol, reltol * initial_residual_norm). However, the problem would be the same as before: a small initial residual may lead to a tolerance that is too small. Since the solution is (theoretically) the same irrespective of the initial guess, the stopping criterion should be independent of it as well. It does not make sense to me to stop the method at one point if the initial guess is x_1 and at a different point if the initial guess is x_2 ≠ x_1.

3) Having a quick look at the code, it seems to me that the stopping criterion is actually norm(residual) <= max(abstol, reltol * initial_residual_norm), but I'm not really sure.

I would very much appreciate it if you could let me know what the current stopping criterion for CG is, and whether you borrowed it from a specific reference or why you chose it.

Apart from that, I suggest changing the stopping criterion to |r_k| < reltol * |b| + abstol for the reasons discussed above, and/or to |Δx| < reltol * |x| + abstol (which seems to be a more robust option to me, as it provides more direct control over the variable we are trying to find). A sketch contrasting the |b|-based criterion with the current one follows below.
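
For concreteness, here is a minimal sketch of the two residual-based criteria under discussion. The helper names are illustrative only and not part of IterativeSolvers.jl:

using LinearAlgebra

# Hypothetical helpers contrasting the two criteria (illustrative only):
# Matlab-style criterion, independent of the initial guess (Matlab uses abstol = 0):
stop_on_rhs(r_k, b; reltol, abstol = 0.0) = norm(r_k) <= reltol * norm(b) + abstol
# Criterion apparently used by cg! according to issue #280:
stop_on_r0(r_k, r_0; reltol, abstol = 0.0) = norm(r_k) <= max(abstol, reltol * norm(r_0))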

haampie commented 3 years ago

Sorry if this is not the correct place to post questions.

This is a good place :)

Yeah, that must be a typo. The criterion used to be relative only, i.e. ||r_k|| / ||r_0|| < tol, and then it was changed to also support an absolute tolerance, i.e. ||r_k|| < max(abstol, reltol * ||r_0||), but the / ||r_0|| was probably not removed from the docstring when that change was made.

However, the problem would be the same as before: a small initial residual may lead to a tolerance that is too small.

Not sure what you mean; it's controlled by max(something absolute and potentially large, something relative and potentially small).

I don't see why you would want to use ||r_k|| < abstol + reltol * ||r_0|| over ||r_k|| < max(abstol, reltol * ||r_0||)

mancolric commented 3 years ago

Thanks @haampie.

For me, using ||r_k|| < abstol + reltol * ||r_0|| or ||r_k|| < max(abstol, reltol * ||r_0||) is more or less the same numerically, since max(a, b) ≤ a + b ≤ 2 max(a, b) for nonnegative a, b (I wrote the former while thinking of the latter). My concern is about the term reltol * ||r_0||. It would make more sense to me to use reltol * ||b||, because (assume abstol = 0):

1) With reltol * ||r_0||, you may get more or less accuracy in the final solution depending on the initial guess. With reltol * ||b||, the expected accuracy is the same regardless of it.

2) If the initial guess is very close to the solution (which may happen, e.g., if the system is being solved inside a nonlinear fixed-point iteration), then reltol * ||r_0|| will be very small (e.g., 1e-16) and the method will need many iterations to converge, even though the initial guess is already very good. It may also happen that the method never reaches that tolerance for ill-conditioned and/or very large matrices. In contrast, using reltol * ||b|| should make the method stop after a few iterations, as I would expect.

Please try the following MWE:

using LinearAlgebra, SparseArrays
using IterativeSolvers

function TestIterativeSolvers(N::Int, RelTol::Float64)

    #Random sparse SPD matrix (CG requires a symmetric positive definite A):
    B       = sprand(N, N, 0.2)
    A       = sparse(1:N, 1:N, Float64.(1:N)) + (N/2)*transpose(B)*B

    #Random solution:
    xsol    = rand(N)
    b       = A*xsol

    #-------------------------------------------
    #Solve with random initial guess:

    println("Random initial guess")

    #Initial guess:
    x0      = rand(N)
    r0_norm = norm(A*x0-b)
    b_norm  = norm(b)
    println("|r_0|=$(r0_norm)")
    println("|b|=$(b_norm)")

    #Solve with |r_k| < tol * |r_0|:
    xv              = copy(x0)
    solver_output   = cg!(xv, A, b, abstol=0.0, reltol=RelTol, log=true, verbose=false)
    println("nIters=$(solver_output[2].iters), |r|=$(solver_output[2][:resnorm][end]), |x-xsol|/|xsol|=$(norm(xv-xsol)/norm(xsol))")

    #Solve with |r_k| < tol * |b|:
    xv              .= x0
    solver_output   = cg!(xv, A, b, abstol=0.0, reltol=RelTol*b_norm/r0_norm, log=true, verbose=false)
    println("nIters=$(solver_output[2].iters), |r|=$(solver_output[2][:resnorm][end]), |x-xsol|/|xsol|=$(norm(xv-xsol)/norm(xsol))")

    #-------------------------------------------
    #Solve again with initial guess very close to the solution:

    println("\nInitial guess close to the exact solution")

    #Initial guess:
    x0      = xsol + 100*RelTol * rand(N)
    r0_norm = norm(A*x0-b)
    b_norm  = norm(b)
    println("|r_0|=$(r0_norm)")
    println("|b|=$(b_norm)")

    #Solve with |r_k| < tol * |r_0|:
    xv              = copy(x0)
    solver_output   = cg!(xv, A, b, abstol=0.0, reltol=RelTol, log=true, verbose=false)
    println("nIters=$(solver_output[2].iters), |r|=$(solver_output[2][:resnorm][end]), |x-xsol|/|xsol|=$(norm(xv-xsol)/norm(xsol))")

    #Solve with |r_k| < tol * |b|:
    xv              .= x0
    solver_output   = cg!(xv, A, b, abstol=0.0, reltol=RelTol*b_norm/r0_norm, log=true, verbose=false)
    println("nIters=$(solver_output[2].iters), |r|=$(solver_output[2][:resnorm][end]), |x-xsol|/|xsol|=$(norm(xv-xsol)/norm(xsol))")

end

Then

julia> TestIterativeSolvers(5000, 1e-10)
Random initial guess
|r_0|=9.01977918157597e7
|b|=2.22888144955814e10
nIters=375, |r|=0.00874023139937135, |x-xsol|/|xsol|=1.1989921080773163e-8
nIters=258, |r|=2.1205394486976847, |x-xsol|/|xsol|=2.724872584351992e-6

Initial guess close to the exact solution
|r_0|=220.89982926376743
|b|=2.22888144955814e10
nIters=253, |r|=2.1058125468955393e-8, |x-xsol|/|xsol|=1.0299092193761744e-10
nIters=1, |r|=0.2111130008760191, |x-xsol|/|xsol|=4.956472028094513e-9

As you can see, when the initial guess is random, both choices take a similar number of iterations. However, when the initial guess is very close to the exact solution, one would expect the method to need fewer iterations to converge, yet the first choice takes about as many as before. This is because the stopping tolerance is now much smaller.

In my opinion, it would be more robust to set |Δx| < max(reltol * |x|, abstol) or similar, as it would allow more direct control over the error in the solution.

haampie commented 3 years ago

I think the example shows that your use case can be supported already, right?

You might want to use:

cg!(xv, A, b, abstol=RelTol*norm(b), reltol=0.0, log=true, verbose=false)

to make it more readable.

We just define reltol to be the relative reduction in the residual norm, and anything else can simply use abstol.

If you need even more control, you can use CG as an iterator and implement your own stopping condition.

|Δx| < max(reltol * |x|, abstol)

This would return early when the method nearly stagnates, and it is also more expensive to compute (Δx is available, but its norm requires an additional dot product), so it is unlikely to become a default.
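
For completeness, here is a minimal sketch of such a custom stopping condition built on the CG iterator. It assumes the constructor IterativeSolvers.cg_iterator! and the iterable fields x and u; both are version-dependent details of IterativeSolvers.jl, so check your installed version:

using LinearAlgebra, SparseArrays
using IterativeSolvers

# Sketch only: drive CG manually and stop on a step-based criterion.
# Note that the iterator also applies its own built-in residual tolerance.
function cg_with_step_criterion!(x, A, b; reltol = 1e-8, abstol = 0.0, maxiter = size(A, 2))
    it = IterativeSolvers.cg_iterator!(x, A, b; maxiter = maxiter)
    for res in it  # each iteration updates it.x in place; res is the residual norm
        # it.u holds the latest search direction; the actual step is a scalar
        # multiple of it, so |u| serves here as a cheap proxy for |Δx|.
        if norm(it.u) < max(reltol * norm(it.x), abstol)
            break
        end
    end
    return x
end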

mancolric commented 3 years ago

I think the example shows that your use case can be supported already, right?

Yes.

You might want to use:

...

to make it more readable.

Yes.

you can use CG as an iterator

I will have a look. Is Δx the field u of the iterator?

haampie commented 3 years ago

I will have a look. Is Δx the field u of the iterator?

Yes, the common pattern in these methods is that there is an update to x like this:

x_{k+1} = x_k + u

And then the residual is updated like this:

r_{k+1} = b - A x_{k+1} = b - A x_k - A u = r_k - A u

where A u is already available from the previous steps.
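
As a quick sanity check of that identity (illustrative only, with random data):

using LinearAlgebra

A = rand(5, 5); x = rand(5); u = rand(5); b = rand(5)
r_k    = b - A * x          # residual before the update
r_next = b - A * (x + u)    # residual after x_{k+1} = x_k + u
@assert r_next ≈ r_k - A * u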

mancolric commented 3 years ago

Ok. Thank you very much for your help @haampie