Open awvwgk opened 2 years ago
I'm not wild about making the user declare the udata
argument in all their functions, even if they aren't going to use it. It's similar to how old codes had rwork
, iwork
array arguments that you also had to declare even if you weren't going to use them.
I haven't look at the c callback issues. What are the alternatives?
At least for the C API it makes sense to have a data pointer, since there is no straight-forward alternative available. The issue is that we can't do the trick with via an internal procedure to avoid global variables there, however I think I found a solution to archive the required behavior in C without affecting the Fortran side following your proposal in #4.
The cleanest is still probably the internal procedure. Why cannot we use that? The lowest level code just uses a callback. Then in the upper level API we simply create an internal subroutine, that accesses the "udata", as passed through in the parent procedure. Then in the C level API we simply expose udata as a pointer. That should work, without polluting the low level API with udata
.
The cleanest is still probably the internal procedure. Why cannot we use that?
The only practical issue might be compiler support, see https://github.com/fortran-lang/minpack/issues/14#issuecomment-1030768295. But maybe we shouldn't worry about this now and just use internal procedures.
I really support this pull request because it helps scientific programers write better code. The nested function approach works just fine, but it isn't the obvious solution that scientific programers will make. The obvious solution is to pass data via a global module variable, or even a common block. These solutions are obvious to scientific programers because legacy code is littered with these sub-optimal solutions, and scientific programers learn fortran by struggling with their advisors legacy code (speaking from experience here!). Of course, passing data via globals like this is not ideal for a bunch of reasons.
Alternatively, if fortran-lang/minpack
has root-solving functions that look like the following, as @awvwgk suggests,
subroutine hybrd1(fcn, n, x, Fvec, Tol, Info, Wa, Lwa, udata)
! ...
class(*), intent(inout), optional :: udata !! user data
! ...
end subroutine hybrd1
then it's extremely clear to the novice scientific programer that udata
is how they should pass data to their objective function. I would almost prefer the variable was named user_data
instead of udata
to make it even more obvious. The novice might not know how to use an unlimited polymorphic object, but this is quickly remedied when they google it and land on fortran-lang documentation.
This is the strength of Fortran over many other languages. The right way to do something is often obvious. We should play to this strength!
I think one of the goals here is to show scientific programmers how they should be doing things, not let them keep using the old bad patterns. :)
Rarely will it be convenient anyway to have to stuff everything in some unlimited polymorphic entity (which would require the dreaded and annoying select type
construct). Using contained routines or object oriented classes (see #31) are better, and don't have the effect of cluttering up the low level api.
C is different...since C is a low-level language. I think it's OK to have something like that in the C API, but there should be a way to avoid having it clutter up the fortran code so much?
Rarely will it be convenient anyway to have to stuff everything in some unlimited polymorphic entity (which would require the dreaded and annoying
select type
construct). Using contained routines or object oriented classes (see #31) are better, and don't have the effect of cluttering up the low level api.
This might be just a matter of taste, but having to write an internal procedure to match an interface, or having to wrap the callback for each solver class feels just as tedious to me. The clutter has simply shifted from the data dummy variable to the specialization needed for the solver in question.
I'll take the SLSQP optimizer as an example, the callback needed is:
subroutine rosenbrock_func(me,x,f,c)
class(slsqp_solver),intent(inout) :: me
real(wp),dimension(:),intent(in) :: x !! optimization variable vector
real(wp),intent(out) :: f !! value of the objective function
real(wp),dimension(:),intent(out) :: c !! the constraint vector `dimension(m)`,
!! equality constraints (if any) first.
f = 100.0_wp*(x(2) - x(1)**2)**2 + (1.0_wp - x(1))**2 !objective function
c(1) = 1.0_wp - x(1)**2 - x(2)**2 !equality constraint (>=0)
end subroutine rosenbrock_func
I cannot reuse this callback with other optimization libraries unless 1) the solver instance is part of an abstract solver hierarchy, or 2) I put the core of the callback into a second low-level procedure (or god forbid, include it using the preprocessor). Moreover, if I may borrow some words
I'm not wild about making the user declare the
me
argument in all their functions, even if they aren't going to use it.
However, if the callback was changed to:
abstract interface
subroutine calcfc(x,f,c,data)
real(wp), intent(in) :: x(:) ! optimization variables
real(wp), intent(out) :: f ! value of the objective function
real(wp), intent(out) :: c(:) ! the constraint vector
class(*), intent(in), optional :: data
end subroutine
end interface
both the callback function, and any necessary data can be reused throughout any solver library which matches the abstract interface. Caveat, we need a select type
if we actually want to access the data.
In both cases the callback takes four arguments. In the first case, the procedure is constrained to a single solver. In the second case it is not. Arguably, it's easier to remember how to use a class(*)
variable, than it is to write a callback for each solver class.
And if the callback is just:
abstract interface
subroutine calcfc(x,f,c,data)
real(wp), intent(in) :: x(:) ! optimization variables
real(wp), intent(out) :: f ! value of the objective function
real(wp), intent(out) :: c(:) ! the constraint vector
end subroutine
end interface
Then it seems it's even easier to reuse...
SciPy's minimize has an args
argument:
args : tuple, optional
Extra arguments passed to the objective function and its derivatives (Jacobian, Hessian).
But I think it's easier to handle in Python than in Fortran from the user's perspective.
SciPy's minimize has an
args
argument:args : tuple, optional Extra arguments passed to the objective function and its derivatives (Jacobian, Hessian).
But I think it's easier to handle in Python than in Fortran from the user's perspective.
This is probably true, unless we want to wrap directly from Fortran to Python (via f2py or similar). Preferably we can define our Python intercompatibility via a C API, which will be beneficial for more than one language in the long run.
Fortunately, I don't think the design of the Fortran callback is actually relevant for building the C API. For any C intercompatibility we have to involve at least two layers of callback unless the callback interface is C intercompatible, so we have to have some glue in between, which will account for any design choice of the Fortran callback. Therefore, I don't see this as a blocker for defining our C API.
For the C API, it seems a void*
is the most natural to pass as an argument for the user defined data?
For the C API, it seems a
void*
is the most natural to pass as an argument for the user defined data?
Yes, the callback for the C API proposal in #29 defines it as
Depends a bit on taste whether the void*
should be the first or last argument, I followed NLopt's convention to put it last. It should provide all languages interfacing via C to minpack with at least one way to pass custom data.
Depends a bit on taste whether the
void*
should be the first or last argument
I think last is better from a performance perspective: the first arguments get passed in registers, and the user data can be empty, or not always used, so it would be a waste to pass the nullptr
in a register. On the other hand, if the user data is used (say it contains a pointer to some kind of an array), then I currently don't know which way is faster (possibly it depends on the problem).
I've seen some C codes use an attribute like
void* cookie __attribute__ ((unused))
to mark an unused data argument in the callback.
Merging #30 (18a7f67) into main (281e9b9) will decrease coverage by
3.97%
. The diff coverage is60.31%
.
@@ Coverage Diff @@
## main #30 +/- ##
==========================================
- Coverage 88.77% 84.80% -3.98%
==========================================
Files 2 3 +1
Lines 1221 1336 +115
Branches 456 468 +12
==========================================
+ Hits 1084 1133 +49
- Misses 40 94 +54
- Partials 97 109 +12
Impacted Files | Coverage Δ | |
---|---|---|
src/minpack_legacy.f90 | 46.42% <46.42%> (ø) |
|
src/minpack_capi.f90 | 76.34% <66.66%> (-22.05%) |
:arrow_down: |
src/minpack.f90 | 88.26% <77.77%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 281e9b9...18a7f67. Read the comment docs.
Updated this branch to show how we can retain the old interface, add support for user data and remove the nested function usage from the C export. It also raises the bar regarding Fortran features a compiler has to support, which can be a disadvantage.
It also raises the bar regarding Fortran features a compiler has to support
It looks like it "only" requires a support for class(*)
procedure arguments.
One notable required feature which breaks compiler support for Intel LLVM is the usage of procedure pointers. Looks like procedure pointers are now implemented with ifx 2022.0.0, so we are good.
Proposal to allow passing user data through minpack as unlimited polymorphic object. This approach has the advantage of mostly keeping the API unchanged, however users have to adjust their objective functions and add another argument.
This or an alternative solution is required to create a proper export of the C bindings.Opening this for discussion.Previously discussed in