fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License
1.08k stars 166 forks source link

Proposal for clamp #319

Closed milancurcic closed 3 years ago

milancurcic commented 3 years ago

Intro

clamp(x, xmin, xmax) is a function that clamps (or, clips) an input scalar or array x so that the result is within given bounds xmin and xmax. For example:

res = clamp(0.9. 0.2, 0.8) ! res is 0.8
res = clamp(0.7. 0.2, 0.8) ! res is 0.7
res = clamp(0.1, 0.2, 0.8) ! res is 0.2

This function is common in standard libraries. I've used it occasionally in Fortran, and I use it often in Python. It provides an easy way to ensure that a variable with a finite (non-NaN, non-Inf) value stays within bounds.

Name

Python has numpy.clip, while C++, Julia, and Rust all have clamp. So clamp is more common. Both clamp and clip are equally meaningful to me. I have slight preference for clip because it's shorter and has a "sharper" sound to it.

What inspired me to open this proposal is reading that clamp was stabilized in Rust 1.50.0.

API

elemental real function clamp(x, xmin, xmax) result(res)
  real, intent(in) :: x ! input value to clip
  real, intent(in) :: xmin ! lower bound
  real, intent(in) :: xmax ! higher bound

Alternative names for the bounds could be low and high.

Implementation

There are a few options that I know of, each of which produce different assembly instructions, and performance depending on optimization levels, array sizes, and whether the clamping is in a loop or over a whole array in a single call.

A min/max clamp

This is the simplest implementation I could think of:

elemental real function clamp(x, xmin, xmax) result(res)
  real, intent(in) :: x, xmin, xmax
  res = min(max(x, xmin), xmax)
end function clamp

There are two intrinsic function calls here, but maybe a good compiler will optimize one away. I don't know.

An if/then/else clamp

elemental real function clamp(x, xmin, xmax) result(res)
  real, intent(in) :: x, xmin, xmax
  if (x < xmin) then
    res = xmin
  else if (x > xmax) then
    res = xmax
  else
    res = x
  end if
end function clamp

A branchless clamp

This avoids branching but introduces a floating-point error (~epsilon(x)) for some elements, even if x is not clipped.

elemental real function clamp(x, xmin, xmax) result(res)
  ! A branchless clamp.
  ! Credit: Mark Ransom (https://stackoverflow.com/a/7424900/827297)
  real, intent(in) :: x, xmin, xmax
  res = (x + xmin + abs(x - xmin)) / 2
  res = (res + xmax - abs(res - xmax)) / 2
end function clamp

What module would this go to?

I don't think we have an appropriate existing module yet, but stdlib_numeric or stdlib_math sound meaningful to me.

ivan-pi commented 3 years ago

Do you think there will be much use for passing arrays to the xmin or xmax arguments?

If not, I would propose a fourth implementation option:

A where clamp

The procedures are auto-generated for each rank and type (real, double, ...)

interface clamp
  module procedure clamp_rank0
  module procudure clamp_rank1
  module procedure clamp_rank2
  ...
end interface
function clamp_rank0(x,xmin,xmax)
! ... use min max or ifthen/else version
end function

function clamp_rank1(x,xmin,xmax) result(res)
  real, intent(in) :: x(:)
  real, intent(in) :: xmin, xmax
  real :: res(size(x))
  where (x < xmin)
    res = xmin
  elsewhere (x > xmax)
    res = xmax
  elsewhere
   res = x
  end where
end function

The same logic applies to where/elsewhere logic applies to higher ranks.

ivan-pi commented 3 years ago

Concerning name I also have a personal preference for clip. On the other hand consistency with C++, Julia, and Rust does have some weight. I would also vote in favor of stdlib_numeric vs stdlib_math.

milancurcic commented 3 years ago

Is the usefulness of a where clamp to avoid it being elemental, like in https://github.com/fortran-lang/stdlib/pull/310#issuecomment-776978642?

If yes, is it so that clamp could be passed as a procedure pointer? If yes, it makes me think--how do you do it with a generic? I thought that generic can't be used as a procedure pointer because you need to specify the procedure interface in the list of dummy arguments. Or are there other benefits to non-elemental procedures?

milancurcic commented 3 years ago

Do you think there will be much use for passing arrays to the xmin or xmax arguments?

I don't think much but maybe some. I used it a few times in Python for 2-d arrays to clamp only within a radius and to do a masked clamp. There are alternative ways to do that though.

But this could also be provided through specific functions that define xmin and xmax as arrays. So if we chose to do a generic implementation instead of elemental, then allowing xmin and xmax to be arrays would only double the number of specific functions.

ivan-pi commented 3 years ago

Is the usefulness of a where clamp to avoid it being elemental, like in #310 (comment)?

No, in this context I was only playing with the idea of a generic but non-elemental interface to forbid usage such as:

res = clamp(0.1,[0.2,0.3,0.4],0.5)

which if the implementation were min(max(x, xmin), xmax) would return the 3-by-1 vector [0.2,0.3,0.4].

I haven't used the clamp function before to be able to say is it useful or not. Just throwing the concern out for discussion.

milancurcic commented 3 years ago

I see, that didn't occur to me. I can't think of an use for it. I agree that we should somehow forbid it or at least discourage it. I see a few solutions:

ivan-pi commented 3 years ago

I would prefer to avoid run-time checking. This would make the procedures impure so I don't see any benefit. Having the routine return without warning also make no sense.

This leaves us with the first and third bullet points, meaning either pure procedures for different ranks with a generic interface, or the elemental version (either min/max or the if-then-else versions). Actually it seems like a great experiment to learn about potential performance differences. :+1:

Am I right that according to the specified behavior positive Inf values are mapped to xmax, while -Inf is mapped to xmin? NaN values however are preserved?

milancurcic commented 3 years ago

I would prefer to avoid run-time checking. This would make the procedures impure so I don't see any benefit. Having the routine return without warning also make no sense.

Yes, I would too. We can error stop from a pure a procedure (F2018) but it'd still need a check which would hinder performance.

milancurcic commented 3 years ago

My heart prefers the elemental approach (it's one of my favorite features of Fortran) but my head prefers the generic where clamp approach (compile-time enforcing of argument shape). Let's discuss it more broadly on the upcoming call.

ivan-pi commented 3 years ago

My heart prefers the elemental approach (it's one of my favorite features of Fortran) but my head prefers the generic where clamp approach (compile-time enforcing of argument shape). Let's discuss it more broadly on the upcoming call.

I wonder if this is something the standard should address - being able to explicitly enforce that elemental affects only certain variables (perhaps through an attribute such as nonelemental or scalar).

art-rasa commented 3 years ago

Hello all, I was just thinking if there would be any need for a procedure that would scale the value of an input parameter into an output value defined by min/max parameters. This could be used, for example, scaling an input scalar (or array) into a scaled value describing some physical quantity, such as temperature between defined limits (say -50 ℃ and 100 ℃).

MarDiehl commented 3 years ago

My current implementation with optional upper and lower bounds and 'hint' to the user if lower>upper (inspired by numpy). Probably error stop would be better than NaN, but I'm not clear about performance implications.

real(pReal) pure elemental function math_clip(a, left, right)

  real(pReal), intent(in) :: a
  real(pReal), intent(in), optional :: left, right

  math_clip = a
  if (present(left))  math_clip = max(left,math_clip)
  if (present(right)) math_clip = min(right,math_clip)
  if (present(left) .and. present(right)) &
    math_clip = merge (IEEE_value(1.0_pReal,IEEE_quiet_NaN),math_clip, left>right

end function math_clip
milancurcic commented 3 years ago

@art-rasa Yes, I think so, I've done this many times in the past (home cooked) and is very common in machine learning. Can you propose it in a separate issue? Ideally with an API, name suggestion, and example implementation.

milancurcic commented 3 years ago

After discussing the elemental vs. where-clamp on the monthly call, I'm now more convinced that the elemental approach is just fine. The thing is, a where-clamp is a not so pretty hack to constrain how the user can invoke the function, and we're assuming that no user will want to use it exactly in the way that we would be preventing. So I'd rather have a more liberal and elegant solution, and point this out to the user in the docs.

arjenmarkus commented 3 years ago

@Milan Curcic caomaco@gmail.com I concur. It is difficult to foresee use patterns and most people will simply use it with scalar bounds, I imagine. Actually,before you brought it up, I did not even think of using an array for the bounds, though now that I write this, it is simply an extension on min(a,b) and max(a,b) with both arguments an array and that is certainly something I use from time to time!

Op wo 3 mrt. 2021 om 22:29 schreef Milan Curcic notifications@github.com:

After discussion the elemental vs. where-clamp on the monthly call, I'm now more convinced that the elemental approach is just fine. The thing is, a where-clamp is a not so pretty hack to constrain how the user can invoke the function, and we're assuming that some user may not want to use it exactly in the way that we would be preventing. So I'd rather have a more liberal and elegant solution, and point this out to the user in the docs.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/319#issuecomment-790067673, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR344ONAOTQPFQVQXG3TB2S35ANCNFSM4XRNNGQQ .

aman-godara commented 3 years ago

What if user doesn't know for sure that the 2nd argument (xmin) is <= 3rd argument (xmax) but knows that this is the clip range in which the output should be? for example: x = 9 xmin= 11 xmax = 10 answer expected by user = 10 I don't have specific example of an application where user might need to do it, but it occurred to me as a possibility.

MarDiehl commented 3 years ago

What if user doesn't know for sure that the 2nd argument (xmin) is <= 3rd argument (xmax) but knows that this is the clip range in which he/she might the output to be? for example: x = 9 xmin= 11 xmax = 10 answer expected by user = 10 I don't have specific example of an application where user might need to do it, but it occurred to me as a possibility.

I would say that this is invalid use of the function, the user would simply need to ensure that this does not happen. It's not an uncommon situation that things are forbidden (square root of a negative real number, division by zero, exceeding the range of a certain datatype, ...). To me, the more interesting question is how to handle this. Numpy does not check, probably for performance reasons. My implementation exits with error stop because I think that the time I spend on debugging is much more valuable then the time my computer spends on doing calculations.

aman-godara commented 3 years ago

OK, let's leave it on user to provide smaller number as xmin and larger number as xmax.

I liked the suggestions @MarDiehl has given above:

My current implementation with optional upper and lower bounds and 'hint' to the user if lower>upper

I think this provides user more enhanced feature in one function i.e. a user can give one of the two optional argument as None indicating that he/she doesn't want to use one of the two clips (xmin and xmax). If None is passed in both then it doesn't make any sense (we can raise a warning or return the input as the output)

@MarDiehl can you please explain what this line does?

if (present(left) .and. present(right)) & math_clip = merge (IEEE_value(1.0_pReal,IEEE_quiet_NaN),math_clip, left>right

To me, the more interesting question is how to handle this.

I thought of doing it by making a comparator function (func1) which gives -1, 0 or 1 as the output. If we multiply func1(x, xmin) with func1(x, xmax) and the answer is 1, then we know that the output will NOT be x. Otherwise output will be x. Please also let me know if you are thinking of any other implementation than this. OR we can just set xmin as min(xmin, xmax) and xmax as max(xmin, xmax)

milancurcic commented 3 years ago

To me, the more interesting question is how to handle this. Numpy does not check, probably for performance reasons. My implementation exits with error stop because I think that the time I spend on debugging is much more valuable then the time my computer spends on doing calculations.

I think it's important to have both--allow it to error stop in debug mode, and don't check in release mode. Tiny numerical functions like this should be HPC-useful. We can do this with a preprocessor.

aman-godara commented 3 years ago

Hey!, how can I contribute now to get this issue closed?

milancurcic commented 3 years ago

Okay, here are my suggestions:

  1. Name the function clip.
  2. This will go to a new module, so decide on a module name. I think it either stdlib_math or stdlib_numeric are appropriate, and I have slight preference for stdlib_math.
  3. clip should work on integers and reals of all kinds (int8, int16, int32, int64, real32, real64, real128), so there will be 7 specific functions, and one generic name. Specific functions will be generated by the fypp preprocessor. For example, take a look at stdlib_stats.fypp. Similarly, the source file for this module will be stdlib_math.fypp (instead of .f90).
  4. Pick an implementation, I prefer the min/max one.
  5. For now, just go for the simplest, smallest PR that implements this correctly. We can look at run-time checking in debug mode like @MarDiehl does in his implementation later.
  6. Have tests for both valid and invalid inputs.
  7. Implement the specification doc (docstrings in code), for one function it should be straightforward.
  8. Make sure the new module builds and tests out using both CMake and Makefile builds.
aman-godara commented 3 years ago

I would like to work on two issues at a time: this one and 17 (linspace and logspace). Is it possible to work on both at a time under Fortran's policy for GSoC students?

awvwgk commented 3 years ago

@Aman-Godara Feel free to submit a patch for this issue.