fortran-lang / minpack

Modernized Minpack: for solving nonlinear equations and nonlinear least squares problems
https://fortran-lang.github.io/minpack/
Other
96 stars 20 forks source link

Allow passing of user data for callback procedure #30

Open awvwgk opened 2 years ago

awvwgk commented 2 years ago

Proposal to allow passing user data through minpack as unlimited polymorphic object. This approach has the advantage of mostly keeping the API unchanged, however users have to adjust their objective functions and add another argument.

This or an alternative solution is required to create a proper export of the C bindings. Opening this for discussion.

Previously discussed in

jacobwilliams commented 2 years ago

I'm not wild about making the user declare the udata argument in all their functions, even if they aren't going to use it. It's similar to how old codes had rwork, iwork array arguments that you also had to declare even if you weren't going to use them.

I haven't look at the c callback issues. What are the alternatives?

awvwgk commented 2 years ago

At least for the C API it makes sense to have a data pointer, since there is no straight-forward alternative available. The issue is that we can't do the trick with via an internal procedure to avoid global variables there, however I think I found a solution to archive the required behavior in C without affecting the Fortran side following your proposal in #4.

certik commented 2 years ago

The cleanest is still probably the internal procedure. Why cannot we use that? The lowest level code just uses a callback. Then in the upper level API we simply create an internal subroutine, that accesses the "udata", as passed through in the parent procedure. Then in the C level API we simply expose udata as a pointer. That should work, without polluting the low level API with udata.

awvwgk commented 2 years ago

The cleanest is still probably the internal procedure. Why cannot we use that?

The only practical issue might be compiler support, see https://github.com/fortran-lang/minpack/issues/14#issuecomment-1030768295. But maybe we shouldn't worry about this now and just use internal procedures.

Nicholaswogan commented 2 years ago

I really support this pull request because it helps scientific programers write better code. The nested function approach works just fine, but it isn't the obvious solution that scientific programers will make. The obvious solution is to pass data via a global module variable, or even a common block. These solutions are obvious to scientific programers because legacy code is littered with these sub-optimal solutions, and scientific programers learn fortran by struggling with their advisors legacy code (speaking from experience here!). Of course, passing data via globals like this is not ideal for a bunch of reasons.

Alternatively, if fortran-lang/minpack has root-solving functions that look like the following, as @awvwgk suggests,

subroutine hybrd1(fcn, n, x, Fvec, Tol, Info, Wa, Lwa, udata)
! ...
        class(*), intent(inout), optional :: udata !! user data
! ...
end subroutine hybrd1

then it's extremely clear to the novice scientific programer that udata is how they should pass data to their objective function. I would almost prefer the variable was named user_data instead of udata to make it even more obvious. The novice might not know how to use an unlimited polymorphic object, but this is quickly remedied when they google it and land on fortran-lang documentation.

This is the strength of Fortran over many other languages. The right way to do something is often obvious. We should play to this strength!

jacobwilliams commented 2 years ago

I think one of the goals here is to show scientific programmers how they should be doing things, not let them keep using the old bad patterns. :)

Rarely will it be convenient anyway to have to stuff everything in some unlimited polymorphic entity (which would require the dreaded and annoying select type construct). Using contained routines or object oriented classes (see #31) are better, and don't have the effect of cluttering up the low level api.

C is different...since C is a low-level language. I think it's OK to have something like that in the C API, but there should be a way to avoid having it clutter up the fortran code so much?

ivan-pi commented 2 years ago

Rarely will it be convenient anyway to have to stuff everything in some unlimited polymorphic entity (which would require the dreaded and annoying select type construct). Using contained routines or object oriented classes (see #31) are better, and don't have the effect of cluttering up the low level api.

This might be just a matter of taste, but having to write an internal procedure to match an interface, or having to wrap the callback for each solver class feels just as tedious to me. The clutter has simply shifted from the data dummy variable to the specialization needed for the solver in question.

I'll take the SLSQP optimizer as an example, the callback needed is:

    subroutine rosenbrock_func(me,x,f,c)
        class(slsqp_solver),intent(inout) :: me
        real(wp),dimension(:),intent(in)  :: x      !! optimization variable vector
        real(wp),intent(out)              :: f      !! value of the objective function
        real(wp),dimension(:),intent(out) :: c      !! the constraint vector `dimension(m)`,
                                                    !! equality constraints (if any) first.

        f = 100.0_wp*(x(2) - x(1)**2)**2 + (1.0_wp - x(1))**2  !objective function
        c(1) = 1.0_wp - x(1)**2 - x(2)**2  !equality constraint (>=0)
    end subroutine rosenbrock_func

I cannot reuse this callback with other optimization libraries unless 1) the solver instance is part of an abstract solver hierarchy, or 2) I put the core of the callback into a second low-level procedure (or god forbid, include it using the preprocessor). Moreover, if I may borrow some words

I'm not wild about making the user declare the me argument in all their functions, even if they aren't going to use it.

However, if the callback was changed to:

abstract interface
  subroutine calcfc(x,f,c,data)
    real(wp), intent(in) :: x(:)   ! optimization variables
    real(wp), intent(out) :: f     ! value of the objective function
    real(wp), intent(out) :: c(:)  ! the constraint vector
    class(*), intent(in), optional :: data
  end subroutine
end interface

both the callback function, and any necessary data can be reused throughout any solver library which matches the abstract interface. Caveat, we need a select type if we actually want to access the data.

In both cases the callback takes four arguments. In the first case, the procedure is constrained to a single solver. In the second case it is not. Arguably, it's easier to remember how to use a class(*) variable, than it is to write a callback for each solver class.

certik commented 2 years ago

And if the callback is just:

abstract interface
  subroutine calcfc(x,f,c,data)
    real(wp), intent(in) :: x(:)   ! optimization variables
    real(wp), intent(out) :: f     ! value of the objective function
    real(wp), intent(out) :: c(:)  ! the constraint vector
  end subroutine
end interface

Then it seems it's even easier to reuse...

SciPy's minimize has an args argument:

args : tuple, optional

Extra arguments passed to the objective function and its derivatives (Jacobian, Hessian).

But I think it's easier to handle in Python than in Fortran from the user's perspective.

awvwgk commented 2 years ago

SciPy's minimize has an args argument:

args : tuple, optional Extra arguments passed to the objective function and its derivatives (Jacobian, Hessian).

But I think it's easier to handle in Python than in Fortran from the user's perspective.

This is probably true, unless we want to wrap directly from Fortran to Python (via f2py or similar). Preferably we can define our Python intercompatibility via a C API, which will be beneficial for more than one language in the long run.

Fortunately, I don't think the design of the Fortran callback is actually relevant for building the C API. For any C intercompatibility we have to involve at least two layers of callback unless the callback interface is C intercompatible, so we have to have some glue in between, which will account for any design choice of the Fortran callback. Therefore, I don't see this as a blocker for defining our C API.

certik commented 2 years ago

For the C API, it seems a void* is the most natural to pass as an argument for the user defined data?

awvwgk commented 2 years ago

For the C API, it seems a void* is the most natural to pass as an argument for the user defined data?

Yes, the callback for the C API proposal in #29 defines it as

https://github.com/fortran-lang/minpack/blob/cd7bb38c24a840fd78757a1d9fe3b2a750258134/include/minpack.h#L15-L20

Depends a bit on taste whether the void* should be the first or last argument, I followed NLopt's convention to put it last. It should provide all languages interfacing via C to minpack with at least one way to pass custom data.

certik commented 2 years ago

Depends a bit on taste whether the void* should be the first or last argument

I think last is better from a performance perspective: the first arguments get passed in registers, and the user data can be empty, or not always used, so it would be a waste to pass the nullptr in a register. On the other hand, if the user data is used (say it contains a pointer to some kind of an array), then I currently don't know which way is faster (possibly it depends on the problem).

ivan-pi commented 2 years ago

I've seen some C codes use an attribute like

void*           cookie __attribute__ ((unused))

to mark an unused data argument in the callback.

codecov[bot] commented 2 years ago

Codecov Report

Merging #30 (18a7f67) into main (281e9b9) will decrease coverage by 3.97%. The diff coverage is 60.31%.

@@            Coverage Diff             @@
##             main      #30      +/-   ##
==========================================
- Coverage   88.77%   84.80%   -3.98%     
==========================================
  Files           2        3       +1     
  Lines        1221     1336     +115     
  Branches      456      468      +12     
==========================================
+ Hits         1084     1133      +49     
- Misses         40       94      +54     
- Partials       97      109      +12     
Impacted Files Coverage Δ
src/minpack_legacy.f90 46.42% <46.42%> (ø)
src/minpack_capi.f90 76.34% <66.66%> (-22.05%) :arrow_down:
src/minpack.f90 88.26% <77.77%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 281e9b9...18a7f67. Read the comment docs.

awvwgk commented 2 years ago

Updated this branch to show how we can retain the old interface, add support for user data and remove the nested function usage from the C export. It also raises the bar regarding Fortran features a compiler has to support, which can be a disadvantage.

certik commented 2 years ago

It also raises the bar regarding Fortran features a compiler has to support

It looks like it "only" requires a support for class(*) procedure arguments.

awvwgk commented 2 years ago

One notable required feature which breaks compiler support for Intel LLVM is the usage of procedure pointers. Looks like procedure pointers are now implemented with ifx 2022.0.0, so we are good.