Use LAPACK for the QR factorization

ivan-pi commented 2 years ago

Currently the Levenberg-Marquardt routine in MINPACK calls a "home-brewed" QR factorization derived from LINPACK:

https://github.com/jacobwilliams/minpack/blob/b4e671d1692b6f9eb048ba920e97f8a4eda195e3/src/minpack.f90#L2578

This is understandable since MINPACK (released in 1980) precedes LAPACK (released in 1992) by more than a decade. To bring minpack into the 21st century it would be nice to support the LAPACK factorization routines, and only use the original ones as a fallback.

I've looked into this before, but unfortunately I lack the domain knowledge needed to do the modernization. I also had the feeling, minpack does a row by row factorization to somehow reduce storage costs. This might also have to do with rank-defective Jacobians. I see LAPACK comes in two flavors exactly for this purpose:

QR Factorization
QR Factorization with Column Pivoting (useful when a matrix is not full rank)

An overview of the orthogonal factorization routines in LAPACK is also provided in the Intel oneAPI MKL documentation

Orthogonal Factorizations: LAPACK Computational Routines

Given the reliability and trustworthiness of minpack routines I think a plan is needed how to perform the refactoring without introducing bugs and guaranteeing the accuracy of the results is maintained (or even improved).

jacobwilliams commented 2 years ago

Note: see also https://github.com/jacobwilliams/nlesolver-fortran, a basic solver that does use Lapack.

xecej4 commented 2 years ago

Replacing the QR routines in Minpack is not as easy as it may first seem, because the factorization and solution are not performed as indivisible tasks. In fact, you may notice the presence of routines RWUPDT, R1MPYQ and R1UPDT, which are used to perform rank-1 updates in an economical way. If the candidate replacement routine(s) from, say, Lapack, do not allow such rank-1 updates, and full updates are used instead, efficiency could suffer.

ivan-pi commented 2 years ago

Upon inspection rwupdt is only called in lmstr, which is the minimal storage driver for Levenberg-Marquadt. Given how abundant memory is today, I wonder how many users of this routine remain.

The routines r1mpyq and r1updt are only called in two places, once in hybrd:

                    ! compute the qr factorization of the updated jacobian.

                    call r1updt(n, n, r, Lr, Wa1, Wa2, Wa3, sing)
                    call r1mpyq(n, n, Fjac, Ldfjac, Wa2, Wa3)
                    call r1mpyq(1, n, Qtf, 1, Wa2, Wa3)

and again in hybrj:

                    ! compute the qr factorization of the updated jacobian.

                    call r1updt(n, n, r, Lr, Wa1, Wa2, Wa3, sing)
                    call r1mpyq(n, n, Fjac, Ldfjac, Wa2, Wa3)
                    call r1mpyq(1, n, Qtf, 1, Wa2, Wa3)

awvwgk commented 2 years ago

Sorry, accidental close by merged the two minpack repos.

cbouilla commented 2 years ago

Dear all, I have been investigating the same issues (but I am a C programmer). In any case, you should probably take a look at https://github.com/cbouilla/minpack-1.1

Short versions: using LAPACK yields a x4-10 speedup on large problems. But be careful: it changes the output, so you have to decide whether the result is precise enough or not.

ivan-pi commented 2 years ago

Thanks for the information. As long as the results have the same or better error tolerances, I don't think this should bother us. Is it okay (in terms of licenses) if we take a peek at your C routines, and port the changes directly into the Fortran version maintained here?

It's unfortunate that the factorization format of the MINPACK QR routine is slightly different to LAPACK so the replacement is not as simple as a cut and paste. Instead you actually need to reverse-engineer the operations which are performed. I think the refactoring will be easier for the lm*** routines, the hybr* are tougher because they reference the rank-1 update routines. It's good to know that it can be done and yields good results.

There are a few more QR factorization routines, which I'd be interested to see supported in the future:

qr_mumps, parallel-multithreaded QR factorization
Intel MKL Sparse QR Solver, Intel's sparse QR routines
SPQR, QR from SuiteSparse
BAND_QR, Banded QR routines (see #1)

This will however need bigger changes to also support different storage modes of the Jacobian matrices (dense, banded, sparse), not to mention interfacing with third-party libraries.

cbouilla commented 2 years ago

Is it okay (in terms of licenses) if we take a peek at your C routines, and port the changes directly into the Fortran version maintained here?

Sure, go ahead!

The issue you point out with the format of QR factorization is not that bad. The main difference is that the old qrfac routine stores the diagonal of R in a separate array rdiag (and stores the scalar factors of householder transformations in fjac while LAPACK does the opposite.

In lmdif/lmder, what happens is:

The QR factorization is computed
(Q transpose)*fvec is computed
R is read

The trick is that LAPACK also has a routine for the second step. The third step is actually simpler, because we don't have to watch out for the diagonal.

In hybrd/j, Q is explicitly constructed. But LAPACK also has a routine for this (see my C code).

fortran-lang / minpack

Use LAPACK for the QR factorization #2