linalg: least squares - Githubissues

perazz commented 5 months ago

Least squares solution to to system $Ax=b$, i.e. such that the 2-norm $||b-Ax||$ is minimized., based on LAPACK *GELSD functions.

[x] base implementation
[x] tests
[x] exclude unsupported xdp
[x] documentation
[x] submodule
[x] examples
[x] subroutine interface

Prior art:

NumPy: lstsq(a, b, rcond='warn')
Scipy: lstsq(a, b, cond=None, overwrite_a=False, overwrite_b=False, check_finite=True, lapack_driver=None)
IMSL: Result = IMSL_QRSOL(B, [A] [, AUXQR] [, BASIS] [, /DOUBLE] [, QR] [, PIVOT] [, RESIDUAL] [, TOLERANCE])

Proposed implementation: generic interface for either 1 RHS or many RHS's

x = lstsq(a,b) -> simplest call
x = lstsq(a,b,cond,overwrite_a,rank,err) -> expert call:
- real(kind(a)), intent(in), optional :: cond = optional cutoff for rank evaluation: singular values s(i)<=cond*maxval(s) are considered 0. Default: 0.0
- logical(lk), optional, intent(in) :: overwrite_a = option to avoid internal allocation, but destroy input matrix a. Default: .false.
- integer(ilp), optional, intent(out) :: rank = return rank of A, if requested
- type(linalg_state_type), optional, intent(out) :: err = return state variable

cc: @jvdp1 @jalvesz @fortran-lang/stdlib I believe this is ready for consideration.

perazz commented 5 months ago

Thank you all @jvdp1 @gnikit @jalvesz for the comments! I believe it's polished enough now. Regarding the allocation question @jalvesz, please let me know. Because there are 3/4 allocations per case, the subroutine interface would look similar to gelsd: something like call lstsq_solve(A,b,x,rwork,iwork,cwork,rank,err) so if one wants to handle the solution fully controlling it, maybe (thinking aloud) they could just use the gelsd LAPACK interface that we also provide? Otherwise, I'd say let's open a separate issue for this, so we can discuss further and find a suitable common ground

jalvesz commented 5 months ago

Regarding the allocation question @jalvesz, please let me know. Because there are 3/4 allocations per case, the subroutine interface would look similar to gelsd: something like call lstsq_solve(A,b,x,rwork,iwork,cwork,rank,err) so if one wants to handle the solution fully controlling it, maybe (thinking aloud) they could just use the gelsd LAPACK interface that we also provide?

Indeed, full control on that regard can be done using the gelsd interface directly.

I'd say let's open a separate issue for this, so we can discuss further and find a suitable common ground

Yes, let's keep this idea in mind for latter on.

perazz commented 5 months ago

@jvdp1 @jalvesz @gnikit I added the subroutine interface. This interface is now far more complicated, so I would like to ask your help deciding if the expert user interface has now too many control knobs? We now also have to export an interface to let the user compute what's the minimum working array space for all cases.

Here is the new interfaces I've prepared:

https://github.com/fortran-lang/stdlib/blob/2e370f45438a637c90b608609f6edc0ddbdc6ef6/src/stdlib_linalg.fypp#L344-L351

https://github.com/fortran-lang/stdlib/blob/2e370f45438a637c90b608609f6edc0ddbdc6ef6/src/stdlib_linalg.fypp#L295-L321

jalvesz commented 5 months ago

I'll try to take a close look at this soon @perazz, great work!! This makes me think about the issues when one wants to solve sparse linear systems with different preconditioners, one also needs to create a work space for the preconditioner. So I think it is worth defining this notion properly now.

jvdp1 commented 4 months ago

Thank you @perazz . It sounds ready to be merged IMO. Do you agree? @jalvesz @gnikit ?

jalvesz commented 4 months ago

Yes, I think this PR is ready. The subroutine & DT discussion can be continued down the road in a new PR, which are not show stoppers IMO ;)

gnikit commented 4 months ago

LGTM, however some CI MacOS tests are failing. Do we know what is causing that?

jalvesz commented 4 months ago

Those are fails from before this merged PR https://github.com/fortran-lang/stdlib/pull/807 by @perazz, I think running the tests again would be successful.

jvdp1 commented 4 months ago

Thank you @perazz /all. I will merge this PR now, and I suggest to wait on #806 to create a new release.

perazz commented 4 months ago

Great! Thank you @jvdp1 @jalvesz @gnikit. For the new release, how about waiting until most of the linear algebra functions are in? But, your proposal is also ok to me.

jvdp1 commented 4 months ago

For the new release, how about waiting until most of the linear algebra functions are in? But, your proposal is also ok to me.

Which other linear algebra functions do you plan to add? Could they be grouped based on some theme?

perazz commented 4 months ago

If we look at NumPy.linalg for example, good LAPACK-based functions would be:

eigenvalues (eig, eigh, eigvals, eigvalsh)
condition number (cond)
matrix decompositions (svd, cholesky, qr)
pseudo-inverse (based on svd)

jvdp1 commented 4 months ago

We could follow the same subsections ("Matrix and vector products", "decompositions", "matrix eigenvalues", "norms", "solving equations") as in Numpy. Each section could be related with a new release. However, this would need some orders of implementation and impact your plans, which I don't want. So, as you prefer regarding releases.

gnikit commented 4 months ago

For the new release, how about waiting until most of the linear algebra functions are in? But, your proposal is also ok to me.

For me the answer depends on how long it will take for all the linear algebra functions to be completed and the final release to be issued. My original thought was to aim for quick feedback loops from the users/community to inform future linalg API PRs. The way I see it, incremental releases for high quality, self-contained work, come at a small cost to development. This is how I would classify these last few PRs.

However, the main objective is to deliver the linear algebra API not to follow release best-practices, so I leave this entirely up to you @perazz.

perazz commented 4 months ago

Makes a lot of sense @gnikit. If you'd like to prepare a new release already @jvdp1, I'm in for that, do let me know if/what inputs do you need from my side. For example, could be 0.5.1 so the community can respond to the API on Discourse. Thanks!

fortran-lang / stdlib

linalg: least squares #801