fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License
1.09k stars 168 forks source link

Polynomial fitting #601

Open ivan-pi opened 2 years ago

ivan-pi commented 2 years ago

Motivation

Functions for low-order polynomial fitting using (weighted) least squares regression are common in many scientific libraries.

Here's an interface for the non-weighted case, inspired by the numpy version cited below:

  function polyfit(x, y, deg, rcond, rank, singular_values) result(p)
    real, intent(in) :: x(:), y(:)
    integer, intent(in) :: deg
    real, intent(in), optional :: rcond
    integer, intent(out), optional :: rank
    real, intent(out), allocatable, optional :: singular_values(:)

Prior Art

Prior art in popular scripting languages:

Prior art in Fortran (in no particular order):

Prior art in other programming languages

Additional Information

This will probably require LAPACK for the factorization of the Vandermonde matrix either using a rank-revealing QR or SVD factorization.

A function to evaluate the polynomial (and it's derivatives) using Horner's scheme (or alternatively, in some orthogonal form), should be added in parallel.

An advanced interface could consider using derived types for polynomials, similar to the numpy Polynomial sub-package.

ivan-pi commented 2 years ago

Given how many times this has been re-implemented both in Fortran and other languages, I suspect that finding an interface which meets everyone's needs won't be easy. Hopefully we can reach an 80/20 compromise.

Here are some numbered items to help guide the discussion:

  1. subroutine vs function
  2. automatic vs assumed-shape arrays for the x and y arguments
  3. weighted vs unweighted regression
  4. manual vs automatic centering and scaling for better conditioning of the numerical problem
  5. Taylor (1, x, x**2...) vs orthogonal polynomial (Chebyshev) form
  6. statistics summary, goodness of fit (standard deviation, R^2, variance-covariance matrix...)
  7. control over the factorization method (and inspection of the factorized matrix, condition number, singular values, rank)
  8. recovery from input errors or an ill-conditioned (singular) matrix, etc.
  9. a method to fix the intercept y = p(0).
Beliavsky commented 2 years ago

Regarding the interface, a function with intent(out) arguments should be a subroutine IMO.

Often the user will want to fit successively higher polynomial orders and choose the best order using an information criterion such as AIC. Ideally such fits would be done efficiently (there are special algorithms for successively adding predictors).

Least-squares regression by successively adding general basis functions (here they are powers) is sometimes done.

ivan-pi commented 1 year ago

How about this as a simple interface?

program polyfit_example
implicit none
interface
    subroutine polyfit(x,y,n,p) bind(c,name="c_arma_polyfit")
        use iso_c_binding, only: c_double, c_int
        real(c_double), intent(in), contiguous :: x(:)
        real(c_double), intent(in), contiguous :: y(:)
        integer(c_int), intent(in), value :: n
        real(c_double), intent(out) :: p(:)
    end subroutine
end interface

integer, parameter :: dp = kind(1.0d0)

real(dp) :: x(10), y(10)
real(dp) :: p(3)
integer :: i

x = [((i-1)*0.2_dp,i=1,10)]
y = x**2 - 2.0_dp*x + 3.0_dp

call polyfit(x,y,2,p)

print *, "coefficients = ", p
print *, all(abs(p - [1.0_dp,-2.0_dp,3.0_dp]) < 100*epsilon(1.0_dp))

end program

For the implementation I just wrote a little wrapper of the polyfit function in the Armadillo C++ library:

Click for details ```cpp // c_arma_polyfit.cpp #include #include #include template C as_(CFI_cdesc_t *x); template<> auto as_>(CFI_cdesc_t *x) -> arma::Col { double* ptr = static_cast(x->base_addr); auto nx = static_cast(x->dim[0].extent); arma::Col x_{ ptr, nx, false, /* copy_aux_mem */ true /* strict */ }; return x_; } extern "C" void c_arma_polyfit(CFI_cdesc_t *fx, CFI_cdesc_t *fy, int n, CFI_cdesc_t *fp) { const auto x = as_>(fx); const auto y = as_>(fy); assert(x.n_elem == y.n_elem); auto p = as_>(fp); assert(p.n_elem == arma::uword(n+1)); [[maybe_unused]] bool success = arma::polyfit(p,x,y,n); } ``` ```txt # Makefile FC=gfortran-12 CXX=g++-12 FCFLAGS=-Wall CXXFLAGS=-Wall ARMA_INC=-I/usr/local/Cellar/armadillo/12.0.1/include ARMA_LDFLAGS=-L/usr/local/Cellar/armadillo/12.0.1/lib ARMA_LDLIBS=-larmadillo -lstdc++ polyfit_example: polyfit_example.f90 c_arma_polyfit.o $(FC) $(FCFLAGS) $(ARMA_LDFLAGS) -o $@ $^ $(ARMA_LDLIBS) c_arma_polyfit.o: c_arma_polyfit.cpp $(CXX) $(CXXFLAGS) $(ARMA_INC) -c $< .PHONY: clean clean: rm -f *.o rm -f polyfit_example ```

Here's an alternative implementation using Eigen:

Click for details ```cpp // c_eigen_polyfit.cpp // #include #include #include template Eigen::Map as_(CFI_cdesc_t *x); template<> auto as_(CFI_cdesc_t *x) -> Eigen::Map { double* ptr = static_cast(x->base_addr); auto nx = static_cast(x->dim[0].extent); Eigen::Map x_{ ptr, nx, 1}; return x_; } // Implementation based upon // https://gist.github.com/ksjgh/4d5050d0e9afc5fdb5908734335138d0 // Eigen::VectorXd polyfit(const Eigen::VectorXd& x, const Eigen::VectorXd& y, int n) { assert(x.size() == y.size()); assert(1 <= n && n <= x.size() - 1); Eigen::MatrixXd A(x.size(), n+1); // Calculate Vandermonde matrix of shape m-by-(n+1) // // x1^n x1^n-1 ... 1 // x2^n x2^n-1 ... 1 // : : // xm^n xm^n-1 ... 1 // A.col(n).setOnes(); for (int i = n; i > 0; i--) { A.col(i-1) = A.col(i).array() * x.array(); } auto QR = A.householderQr(); auto p = QR.solve(y); return p; } extern "C" void c_eigen_polyfit(CFI_cdesc_t *fx, CFI_cdesc_t *fy, int n, CFI_cdesc_t *fp) { const auto x = as_(fx); const auto y = as_(fy); assert(x.size() == y.size()); auto p = as_(fp); assert(p.size() == Eigen::Index(n+1)); p = polyfit(x,y,n); } ``` To compile use: `g++ -I/usr/local/include/eigen3 -c c_eigen_polyfit.cpp`.
ivan-pi commented 1 year ago

Often the user will want to fit successively higher polynomial orders and choose the best order using an information criterion such as AIC. Ideally such fits would be done efficiently (there are special algorithms for successively adding predictors).

Least-squares regression by successively adding general basis functions (here they are powers) is sometimes done.

This sounds like something the qrupdate library could be used for. I believe this library is used in the Octave qrupdate routine. It was written by Jaroslav Hajek, a computing expert from the Aeronautical Research and Test Institute (VZLU) in Prague. A GitLab mirror for qrupdate can be found here.

ivan-pi commented 1 year ago

A prototype Fortran implementation I wrote 3 years ago can be found here: https://gist.github.com/ivan-pi/0fd517048c415ceca441eac626bf24c9

ivan-pi commented 1 year ago

cc @arjenmarkus, @nshaffer, @jvdp1

We need a few more voices to move this issue forward. Given how many polynomial fitting codes are out there, it looks like polynomial regression is quite popular among practitioners.

Do you prefer the simple interface or the complex one?

    ! simple
    subroutine polyfit(x,y,order,p)
        real(wp), intent(in) :: x(:), y(:)
        integer(ip), intent(in) :: order
        real(wp), intent(out) :: p(:)
    end subroutine

    ! complex
   subroutine polyfit(x, y, order, p, rcond, rank, singular_values)
      real(wp), intent(in) :: x(:), y(:)
      integer(ip), intent(in) :: order
      real(wp), intent(out) :: p(:)
      real(wp), intent(in), optional :: rcond
      integer(wp), intent(out), optional :: rank
      real(wp), intent(out), allocatable, optional :: singular_values(:)
   end subroutine
arjenmarkus commented 1 year ago

Let me read the mail thread (again) and I will get back to your question tomorrow.

Op wo 1 mrt 2023 om 14:44 schreef Ivan Pribec @.***>:

cc @arjenmarkus https://github.com/arjenmarkus, @nshaffer https://github.com/nshaffer, @jvdp1 https://github.com/jvdp1

We need a few more voices to move this issue forward. Given how many polynomial fitting codes are out there, it looks like polynomial regression is quite popular among practitioners.

Do you prefer the simple interface or the complex one?

! simple
subroutine polyfit(x,y,n,p)
    real(wp), intent(in) :: x(:)
    real(wp), intent(in) :: y(:)
    integer(ip), intent(in) :: n
    real(wp), intent(out) :: p(:)
end subroutine

! complex

subroutine polyfit(x, y, deg, p, rcond, rank, singular_values) real(wp), intent(in) :: x(:), y(:) integer(ip), intent(in) :: deg real(wp), intent(out) :: p(:) real(wp), intent(in), optional :: rcond integer(wp), intent(out), optional :: rank real(wp), intent(out), allocatable, optional :: singular_values(:) end subroutine

— Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/601#issuecomment-1450175284, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YRZGBXTZDOB2D7LKORLWZ5HENANCNFSM5KDTWO2A . You are receiving this because you were mentioned.Message ID: @.***>

arjenmarkus commented 1 year ago

I would opt for the extended interface: it provides just that more output that might otherwise be forgotten about. For instance, the possibility of getting back the condition number means that people are aware that such a thing as the condition number is of importance in fitting problems, even if they will not use it. By the way, I did not quite understand the argument "singular_values" - does this have to do with the SVD?

Op wo 1 mrt 2023 om 16:17 schreef Arjen Markus @.***>:

Let me read the mail thread (again) and I will get back to your question tomorrow.

Op wo 1 mrt 2023 om 14:44 schreef Ivan Pribec @.***>:

cc @arjenmarkus https://github.com/arjenmarkus, @nshaffer https://github.com/nshaffer, @jvdp1 https://github.com/jvdp1

We need a few more voices to move this issue forward. Given how many polynomial fitting codes are out there, it looks like polynomial regression is quite popular among practitioners.

Do you prefer the simple interface or the complex one?

! simple
subroutine polyfit(x,y,n,p)
    real(wp), intent(in) :: x(:)
    real(wp), intent(in) :: y(:)
    integer(ip), intent(in) :: n
    real(wp), intent(out) :: p(:)
end subroutine

! complex

subroutine polyfit(x, y, deg, p, rcond, rank, singular_values) real(wp), intent(in) :: x(:), y(:) integer(ip), intent(in) :: deg real(wp), intent(out) :: p(:) real(wp), intent(in), optional :: rcond integer(wp), intent(out), optional :: rank real(wp), intent(out), allocatable, optional :: singular_values(:) end subroutine

— Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/601#issuecomment-1450175284, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YRZGBXTZDOB2D7LKORLWZ5HENANCNFSM5KDTWO2A . You are receiving this because you were mentioned.Message ID: @.***>