LLNL / sundials

Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
https://computing.llnl.gov/projects/sundials
BSD 3-Clause "New" or "Revised" License
484 stars 118 forks source link

Question about KINSOL behavior #500

Open bangerth opened 1 month ago

bangerth commented 1 month ago

Background: We are trying to convert a number of places in the ASPECT project (https://github.com/geodynamics/aspect/) to use KINSOL instead of hand-rolled nonlinear solvers. ASPECT is a code for large-scale simulation of processes in the Earth based on deal.II.

In a test case for a really simple case extracted by my colleague @bobmyhill (reduced from https://github.com/geodynamics/aspect/pull/5698), we are trying to solve for a root of a one-argument function f(x) for which we get the following sequence of callbacks when using KIN_NONE (=Newton):

     Computing residual at x=-69.77069997, f(x)=-176.46700446
     Recomputing J at x=-69.77069997
     Solving for dx with rhs=176.46700446
     Computing residual at x=-69.77070101, f(x)=-176.4670055

     Computing residual at x=106.69630449, f(x)=203.06842599
     Recomputing J at x=106.69630449
     Solving for dx with rhs=-203.06842599
     Computing residual at x=106.69630608, f(x)=203.06843156

     Computing residual at x=48.676754204, f(x)=1.4210854715e-14

I've got a number of questions about this sort of log:

If it is of any help, I've annotated every one of the residual calls above with the last KINSOL functions that invoked the call to the user-supplied residual:

     Computing residual at x=-69.77069997, f(x)=-176.46700446
        from KINSolInit ()
        from KINSol ()

     Recomputing J at x=-69.77069997
     Solving for dx with rhs=176.46700446
     Computing residual at x=-69.77070101, f(x)=-176.4670055
        from kinLsDQJtimes ()
        from kinLsATimes ()
        from kinLsSolve ()
        from KINPicardFcnEval ()
        from KINPicardAA ()
        from KINSol ()

     Computing residual at x=106.69630449, f(x)=203.06842599
        from KINPicardAA ()
        from KINSol ()     
     Recomputing J at x=106.69630449
     Solving for dx with rhs=-203.06842599
     Computing residual at x=106.69630608, f(x)=203.06843156
        from kinLsDQJtimes ()
        from kinLsATimes ()
        from kinLsSolve ()
        from KINPicardFcnEval ()
        from KINPicardAA ()
        from KINSol ()

     Computing residual at x=48.676754204, f(x)=1.4210854715e-14
        from KINPicardAA ()
        from KINSol ()

I assume (but don't know) that the functions named "Picard" are also doing the Newton method?

balos1 commented 1 month ago

I assume (but don't know) that the functions named "Picard" are also doing the Newton method?

You should not be seeing KINPicardAA and KINPicardFcnEval in the trace when you are using Newton's method (KIN_NONE). Those functions are specific to the Picard iteration option (KIN_PICARD). Can you provide a trace for KIN_NONE too?

balos1 commented 1 month ago

This almost looks like it's doing a finite difference approximation, but I know that's not it because the calls happen before/after solving for dx.

The second call you are seeing is in fact for a finite difference approximation. kinLsDQJtimes performs a difference quotient for J*v.

What is the purpose for the call after solving the Newton system at a slightly perturbed point?

Your log is not quite right (logically) w.r.t. the actual control flow. Here is a better representation that might make things seem more sensible:

First Iteration:
     Computing residual at x=-69.77069997, f(x)=-176.46700446
        from KINSolInit ()
        from KINSol ()
     Recomputing J at x=-69.77069997
     Solving for dx with rhs=176.46700446
     Computing residual at x=-69.77070101, f(x)=-176.4670055
        from kinLsDQJtimes ()
        from kinLsATimes ()
        from kinLsSolve ()
        from KINPicardFcnEval ()
        from KINPicardAA ()
        from KINSol ()
     Computing residual at x=106.69630449, f(x)=203.06842599
        from KINPicardAA ()
        from KINSol ()     

Second Iteration:
     Recomputing J at x=106.69630449
     Solving for dx with rhs=-203.06842599
     Computing residual at x=106.69630608, f(x)=203.06843156
        from kinLsDQJtimes ()
        from kinLsATimes ()
        from kinLsSolve ()
        from KINPicardFcnEval ()
        from KINPicardAA ()
        from KINSol ()
     Computing residual at x=48.676754204, f(x)=1.4210854715e-14
        from KINPicardAA ()
        from KINSol ()

First, see this section in the KINSOL docs for an outline of the KINSOL Picard scheme.

For Iteration 1, the first call from KINSolInit computes F(x) to check and see if the initial guess satisfies the system. If it does not, then we proceed to solve the system and the following calls are the same as in Iteration 2.

For Iteration 2, the first call to F(x) is done for the finite difference approximation of J*v. This is needed during the solve of the Picard system g(x) = x - L^{-1}F(x) (we solve Ly = -F(x)). Finally, the third call computes F(xnew). This is used in the next iteration when we solve the linear system to compute g(x) again.

bangerth commented 1 month ago

Hi @balos1 -- many thanks for helping me through this. I have to admit that I am quite confused by what it is that I saw on Sunday because I can no longer reproduce the backtraces; I didn't change any of my codes, just recompiled everything on my system and something in there led to a change that ensured that I really do get the Newton scheme now. Anyway, that's my problem :-(

That said, here's now the backtrace:

Computing residual at x=-69.77069997, f(x)=-176.46700446
     from KINSolInit
     from KINSol
Recomputing J at x=-69.77069997
Solving for dx with residual=176.46700446
Computing residual at x=-69.77070101, f(x)=-176.4670055
     from KINLinSolDrv
     from KINSol
Computing residual at x=106.69630449, f(x)=203.06842599
     from KINFullNewton
     from KINSol

Recomputing J at x=106.69630449
Solving for dx with residual=-203.06842599
Computing residual at x=106.69630608, f(x)=203.06843156
     from KINLinSolDrv
     from KINSol
Computing residual at x=48.676754204, f(x)=1.4210854715e-14
     from KINFullNewton
     from KINSol

This backtrace looks more like a Newton method, but two questions remain for me:

cswoodward commented 1 month ago

A quick answer to your first question - if you are using Picard (and not regular fixed point) and have set things up to use the Jacobian for L then the Picard iteration is exactly Newton's method.

balos1 commented 1 month ago

I suspect the second residual computation is still coming from a finite difference approximation. KINLinSolDrv does not directly call your residual function, so there seems to be some backtrace missing.

Also, it would be helpful to know where your print statements are occurring in relation to KINSOL. I assume:

Recomputing J at x=106.69630449. --> Inside the Jacobian callback function provided to KINSOL
Solving for dx with residual=-203.06842599 --> ??? 
Computing residual at x=106.69630608, f(x)=203.06843156 --> Inside the residual callback function provided to KINSOL
bangerth commented 1 month ago

On 6/4/24 15:32, cswoodward wrote:

A quick answer to your first question - if you are using Picard (and not regular fixed point) and have set things up to use the Jacobian for L then the Picard iteration is exactly Newton's method.

Excellent, thank you -- yes, we are providing the Jacobian (https://github.com/geodynamics/aspect/pull/5698#issuecomment-2144141262), so that makes sense.

bangerth commented 1 month ago

As for the other issue: I have those print statements in the user-provided callbacks. When providing backtraces, I put a breakpoint on the line that prints the output and drop all frames not pertinent. This is the full backtrace for the first call (at x=-69.770699970381315):

#0  operator() (__closure=0x7fffffffd930, current_log_stress_ii=Vector<double>(1){
  values = AlignedVector<double>(1) = {-69.770699970381315}
}, residual=Vector<double>(1){
  values = AlignedVector<double>(1) = {-176.46700445816407}
}) at /home/bangerth/p/deal.II/1/install/examples/step-77/step-77.cc:118
#1  0x0000555555579a0d in std::__invoke_impl<void, main()::<lambda(const dealii::Vector<double>&, dealii::Vector<double>&)>&, const dealii::Vector<double>&, dealii::Vector<double>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#2  0x00005555555794d0 in std::__invoke_r<void, main()::<lambda(const dealii::Vector<double>&, dealii::Vector<double>&)>&, const dealii::Vector<double>&, dealii::Vector<double>&>(struct {...} &) (__fn=...) at /usr/include/c++/11/bits/invoke.h:111
#3  0x0000555555578fca in std::_Function_handler<void(const dealii::Vector<double>&, dealii::Vector<double>&), main()::<lambda(const dealii::Vector<double>&, dealii::Vector<double>&)> >::_M_invoke(const std::_Any_data &, const dealii::Vector<double> &, dealii::Vector<double> &) (__functor=..., __args#0=Vector<double>(1){
  values = AlignedVector<double>(1) = {-69.770699970381315}
}, __args#1=Vector<double>(1){
  values = AlignedVector<double>(1) = {-176.46700445816407}
}) at /usr/include/c++/11/bits/std_function.h:290
#4  0x00007fffedfaa5d1 in std::function<void (dealii::Vector<double> const&, dealii::Vector<double>&)>::operator()(dealii::Vector<double> const&, dealii::Vector<double>&) const (this=0x7fffffffd930, __args#0=Vector<double>(1){
  values = AlignedVector<double>(1) = {-69.770699970381315}
}, __args#1=Vector<double>(1){
  values = AlignedVector<double>(1) = {-176.46700445816407}
}) at /usr/include/c++/11/bits/std_function.h:590
#5  0x00007ffff0ddbba0 in dealii::SUNDIALS::Utilities::call_and_possibly_capture_exception<std::function<void (dealii::Vector<double> const&, dealii::Vector<double>&)>, dealii::Vector<double> const&, dealii::Vector<double>&>(std::function<void (dealii::Vector<double> const&, dealii::Vector<double>&)> const&, std::__exception_ptr::exception_ptr&, dealii::Vector<double> const&, dealii::Vector<double>&) (f=..., eptr=...) at /home/bangerth/p/deal.II/1/dealii/include/deal.II/sundials/utilities.h:77
#6  0x00007ffff0dd405b in dealii::SUNDIALS::KINSOL<dealii::Vector<double> >::solve(dealii::Vector<double>&)::{lambda(_generic_N_Vector*, _generic_N_Vector*, void*)#3}::operator()(_generic_N_Vector*, _generic_N_Vector*, void*) const (__closure=0x0, yy=0x5555556e8e10, FF=0x5555556e93c0, user_data=0x7fffffffd910) at /home/bangerth/p/deal.II/1/dealii/source/sundials/kinsol.cc:357
#7  0x00007ffff0dd4105 in dealii::SUNDIALS::KINSOL<dealii::Vector<double> >::solve(dealii::Vector<double>&)::{lambda(_generic_N_Vector*, _generic_N_Vector*, void*)#3}::_FUN(_generic_N_Vector*, _generic_N_Vector*, void*) () at /home/bangerth/p/deal.II/1/dealii/source/sundials/kinsol.cc:348
#8  0x00007fffd03caafc in KINSolInit (kin_mem=0x5555556e8600) at /home/bangerth/dealii-candi/tmp/unpack/sundials-5.7.0/src/kinsol/kinsol.c:1266
#9  0x00007fffd03c7f6e in KINSol (kinmem=0x5555556e8600, u=0x5555556e8e10, strategy_in=0, u_scale=0x5555556e88a0, f_scale=0x5555556e8b90) at /home/bangerth/dealii-candi/tmp/unpack/sundials-5.7.0/src/kinsol/kinsol.c:475
#10 0x00007ffff0dbb5ba in dealii::SUNDIALS::KINSOL<dealii::Vector<double> >::solve (this=0x7fffffffd910, initial_guess_and_solution=Vector<double>(1){
  values = AlignedVector<double>(1) = {-69.770699970381315}
}) at /home/bangerth/p/deal.II/1/dealii/source/sundials/kinsol.cc:536
#11 0x0000555555575232 in main () at /home/bangerth/p/deal.II/1/install/examples/step-77/step-77.cc:162

Frame 9 is the KINSol routine, frame 7 is the immediate deal.II-provided callback which through a sequence of other functions ultimately ends up in the user-provided callback in frame 0. So only frames 8 and 9 are relevant to you.

The second call to the residual function (at the slightly perturbed point x=-69.770701010045755) has this backtrace:

#0  operator() (__closure=0x7fffffffd930, current_log_stress_ii=Vector<double>(1){
  values = AlignedVector<double>(1) = {-69.770701010045755}
}, residual=Vector<double>(1){
  values = AlignedVector<double>(1) = {-176.4670054978285}
}) at /home/bangerth/p/deal.II/1/install/examples/step-77/step-77.cc:118
#1  0x0000555555579a0d in std::__invoke_impl<void, main()::<lambda(const dealii::Vector<double>&, dealii::Vector<double>&)>&, const dealii::Vector<double>&, dealii::Vector<double>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#2  0x00005555555794d0 in std::__invoke_r<void, main()::<lambda(const dealii::Vector<double>&, dealii::Vector<double>&)>&, const dealii::Vector<double>&, dealii::Vector<double>&>(struct {...} &) (__fn=...) at /usr/include/c++/11/bits/invoke.h:111
#3  0x0000555555578fca in std::_Function_handler<void(const dealii::Vector<double>&, dealii::Vector<double>&), main()::<lambda(const dealii::Vector<double>&, dealii::Vector<double>&)> >::_M_invoke(const std::_Any_data &, const dealii::Vector<double> &, dealii::Vector<double> &) (__functor=..., __args#0=Vector<double>(1){
  values = AlignedVector<double>(1) = {-69.770701010045755}
}, __args#1=Vector<double>(1){
  values = AlignedVector<double>(1) = {-176.4670054978285}
}) at /usr/include/c++/11/bits/std_function.h:290
#4  0x00007fffedfaa5d1 in std::function<void (dealii::Vector<double> const&, dealii::Vector<double>&)>::operator()(dealii::Vector<double> const&, dealii::Vector<double>&) const (this=0x7fffffffd930, __args#0=Vector<double>(1){
  values = AlignedVector<double>(1) = {-69.770701010045755}
}, __args#1=Vector<double>(1){
  values = AlignedVector<double>(1) = {-176.4670054978285}
}) at /usr/include/c++/11/bits/std_function.h:590
#5  0x00007ffff0ddbba0 in dealii::SUNDIALS::Utilities::call_and_possibly_capture_exception<std::function<void (dealii::Vector<double> const&, dealii::Vector<double>&)>, dealii::Vector<double> const&, dealii::Vector<double>&>(std::function<void (dealii::Vector<double> const&, dealii::Vector<double>&)> const&, std::__exception_ptr::exception_ptr&, dealii::Vector<double> const&, dealii::Vector<double>&) (f=..., eptr=...) at /home/bangerth/p/deal.II/1/dealii/include/deal.II/sundials/utilities.h:77
#6  0x00007ffff0dd405b in dealii::SUNDIALS::KINSOL<dealii::Vector<double> >::solve(dealii::Vector<double>&)::{lambda(_generic_N_Vector*, _generic_N_Vector*, void*)#3}::operator()(_generic_N_Vector*, _generic_N_Vector*, void*) const (__closure=0x0, yy=0x5555556e9310, FF=0x5555556e9e60, user_data=0x7fffffffd910) at /home/bangerth/p/deal.II/1/dealii/source/sundials/kinsol.cc:357
#7  0x00007ffff0dd4105 in dealii::SUNDIALS::KINSOL<dealii::Vector<double> >::solve(dealii::Vector<double>&)::{lambda(_generic_N_Vector*, _generic_N_Vector*, void*)#3}::_FUN(_generic_N_Vector*, _generic_N_Vector*, void*) () at /home/bangerth/p/deal.II/1/dealii/source/sundials/kinsol.cc:348
#8  0x00007fffd03d4680 in kinLsDQJtimes (v=0x5555556e9720, Jv=0x5555556e8fe0, u=0x5555556e8e10, new_u=0x5555556ea4b0, kinmem=0x5555556e8600) at /home/bangerth/dealii-candi/tmp/unpack/sundials-5.7.0/src/kinsol/kinsol_ls.c:959
#9  0x00007fffd03d38da in kinLsATimes (kinmem=0x5555556e8600, v=0x5555556e9720, z=0x5555556e8fe0) at /home/bangerth/dealii-candi/tmp/unpack/sundials-5.7.0/src/kinsol/kinsol_ls.c:611
#10 0x00007fffd03d50ed in kinLsSolve (kin_mem=0x5555556e8600, xx=0x5555556e9720, bb=0x5555556e8fe0, sJpnorm=0x5555556e8828, sFdotJp=0x5555556e8820) at /home/bangerth/dealii-candi/tmp/unpack/sundials-5.7.0/src/kinsol/kinsol_ls.c:1265
#11 0x00007fffd03caf0f in KINLinSolDrv (kin_mem=0x5555556e8600) at /home/bangerth/dealii-candi/tmp/unpack/sundials-5.7.0/src/kinsol/kinsol.c:1360
#12 0x00007fffd03c8211 in KINSol (kinmem=0x5555556e8600, u=0x5555556e8e10, strategy_in=0, u_scale=0x5555556e88a0, f_scale=0x5555556e8b90) at /home/bangerth/dealii-candi/tmp/unpack/sundials-5.7.0/src/kinsol/kinsol.c:546
#13 0x00007ffff0dbb5ba in dealii::SUNDIALS::KINSOL<dealii::Vector<double> >::solve (this=0x7fffffffd910, initial_guess_and_solution=Vector<double>(1){
  values = AlignedVector<double>(1) = {-69.770699970381315}
}) at /home/bangerth/p/deal.II/1/dealii/source/sundials/kinsol.cc:536
#14 0x0000555555575232 in main () at /home/bangerth/p/deal.II/1/install/examples/step-77/step-77.cc:162

Here, frame 12 is KINSol, which calls KINLinSolDrv (frame 11), which calls kinLsSolve (frame 10), which calls kinLsATimes (frame 9), which calls kinLsDQJtimes (frame 8), which calls back into deal.II space (frames 7...1), which calls the user callback (frame 0).

In other words, the call that surprises me happens from kinLsDQJtimes.

balos1 commented 1 month ago

In other words, the call that surprises me happens from kinLsDQJtimes.

kinLsDQJtimes must call the residual for to perform the difference quotient approximation to Jv, but if you are providing the Jacobian it is surprising that kinLsDQJtimes is being called.

I am suspicious that there is a bug in the interfacing to KINSOL or perhaps in KINSOL itself. Specifically, I see that you're providing a custom SUNLinearSolver in dealii with the type SUNLINEARSOLVER_MATRIX_ITERATIVE. KINSOL's KINSetLinearSolver routine does not appear to handle that linear solver type correctly leading to kinLsDQJtimes being set and thus called.

@gardner48 What do you think?

bangerth commented 1 month ago

Yea, good question. I don't think we ever really understood what to pass there. We store, apply, access, and solve with solvers ourselves. All this happens in what users provide in the solve_with_jacobian() callback we attach to ops->solve. On large problems, this is usually an iterative solver, but at the end of the day the details should not matter to KINSOL since it has no part in the process of solving these linear systems -- the only reason why we set anything at all here is because we want to be informed about the target precision we need to solve with.

Let me know if you'd like me to try anything else and I'd be happy to report back!

balos1 commented 1 month ago

@bangerth Can you try setting the matvec operation on the empty matrix object created here? It can just be a no-op.

bangerth commented 1 month ago

@balos1 I assume you mean something like this?

diff --git a/source/sundials/kinsol.cc b/source/sundials/kinsol.cc
index 21ffc7d92d..95533b632a 100644
--- a/source/sundials/kinsol.cc
+++ b/source/sundials/kinsol.cc
@@ -463,6 +463,11 @@ namespace SUNDIALS
           return SUNMATRIX_CUSTOM;
         };

+        J->ops->matvec = [](SUNMatrix, N_Vector, N_Vector) {
+          DEAL_II_ASSERT_UNREACHABLE();
+          return 0;
+        };
+
         J->ops->destroy = [](SUNMatrix A) {
           if (A->content)
             {

This works in the sense that it doesn't trigger the UNREACHABLE position (so the callback is never invoked) but it also does not change the output.

I should say, in case it matters, that I'm using SUNDIALS 5.7. If you think that would help, I would gladly update to a newer version.

balos1 commented 3 weeks ago

@bangerth I can't say I am sure it would help, but if you are able to try out the latest 6.x or 7.0.0 that would be great.

bangerth commented 3 weeks ago

@balos1 I will give this a try next week.

Let me ask this differently: I take it from your comments that you don't expect the second and fourth call to the residual, and in fact that you didn't expect KinLsDGJTimes to be called at all, right? Digging into the KINSOL code and our interfaces a bit, I think that that is because KINSOL calls KinLsDGJTimes if the jtimes signal is not set, and in that case computes the product with the Jacobian via finite differences.

In our interfaces, we (perhaps an oversight?) don't make the jtimes operation available. We only provide a function that solves with J. Should we be providing a function to do this operation? I guess that would be done via KINSetJacTimesVecFn.

balos1 commented 3 weeks ago

@bangerth Yes, I did not expect KinLSDQJTimes to be called. It appears its being called here to compute auxiliary variables for the linesearch. If you do set a jtimes operation with KINSetJacTimesVecFn, then it should use that instead.

What I am not sure about now (and what I will be talking about with the team) is if this really is desired behavior.

bangerth commented 3 weeks ago

I see. Thanks for inquiring -- I look forward to hearing back what you learn from your team then!

bangerth commented 2 weeks ago

I played some more with this and added the following code block to our KINSOL interfaces:

    if (multiply_with_jacobian)
      {
        KINSetJacTimesVecFn(
          kinsol_mem,
          [](N_Vector v,
             N_Vector Jv,
             N_Vector /*u*/,
             booleantype * /*new u*/,
             void *user_data) {

            DEAL_II_ASSERT_UNREACHABLE();

            // Receive the object that describes the linear solver and
            // unpack the pointer to the KINSOL object from which we can then
            // get the 'multiply_with_jacobian' function.
            const KINSOL<VectorType> &solver =
              *static_cast<const KINSOL<VectorType> *>(user_data);

            auto my_v  = internal::unwrap_nvector_const<VectorType>(v);
            auto my_Jv = internal::unwrap_nvector<VectorType>(Jv);

            // Call the user-provided setup function with these arguments:
            return Utilities::call_and_possibly_capture_exception(
              solver.multiply_with_jacobian,
              solver.pending_exception,
              *my_v,
              *my_Jv);
          });
      }

The intent is to set a function that would perform the Jtimes operation. Interestingly, however, this function is never actually called -- i.e., the DEAL_II_ASSERT_UNREACHABLE never triggers. I must be doing something wrong :-(