fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License
1.09k stars 168 forks source link

Reference implementation of gamma function intrinsic for stdlib? #569

Open bjbraams opened 3 years ago

bjbraams commented 3 years ago

I don't know if the stdlib project would concern itself with standard intrinsic functions. Working with gfortran (10.3.1 on a Fedora Unix system) I found surprising behavior of the gamma intrinsic; in many cases it does not return the exact integer-valued result for small integer-valued argument, and in some cases even anint(gamma(z+1)) returns a wrong result although z! is exactly representable in the real kind used in my code. I provide a demonstration and testing code below. A much broader view of accuracy of implementations of intrinsic functions is provided in this working paper: [Vincenzo Innocente and Paul Zimmermann: Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision. Web link: https://members.loria.fr/PZimmermann/papers/accuracy.pdf].

Test and demonstration code:

program TestFactorial
  use iso_fortran_env, only : wp => real32
  implicit none
  ! integer, parameter :: wp=10 ! Try this too
  logical :: b0
  integer :: m0, m1
  write (*,'(a)') 'Testing the factorial function ...'
  call Factorial_Limits (m0, m1)
  ! m0 and m1 are the largest values such that using real (kind=wp):
  ! For 0<=k<m0 Factorial(k) can be represented exactly.
  ! For 0<=k<m1 Factorial(k) is below the overflow limit.
  write (*,'(a,2i5)') 'm0, m1:', m0, m1
  b0 = .false.
  ! b0 has the value (we have found a discrepancy between two ways of
  ! computing the factorial).
  block
    integer :: i
    real (kind=wp) :: t0, t1
    do i = 0, m0-1
       if (i==0) then
          t0 = 1
       else
          t0 = i*t0
       end if
       t1 = gamma(i+1.0_wp)
       ! t0 = i! computed using the recursion.
       ! t1 = i! computed via the gamma intrinsic function.
       if (t1/=t0) then
          b0 = .true.
          write (*,'(a,1x,i3,2(1x,1pg10.2))') 'error:', i, &
               spacing(t0), (t1-t0)/spacing(t0)
       end if
    end do
  end block
  if (.not.b0) then
     write (*,'(a)') '... PASS'
  end if
  stop
contains
  subroutine Factorial_Limits (m0, m1)
    integer, intent (out) :: m0, m1
    ! Determine the limits for computation of the Factorial function
    ! using real (kind=wp).
    ! m0: for 0<=k<m0, Factorial(k) can be represented exactly.
    ! m1: for 0<=k<m1, Factorial(k) is below the overflow limit.
    integer :: m0loc, i, j
    real (kind=wp) :: t0, t1
    logical :: b0
    i = 0 ; t0 = 1 ; t1 = 1 ; b0 = .true.
    m0loc = huge(0)
    ! Invariant: 0 <= i.
    ! t0 = Factorial(i) (subject to roundoff for large i).
    ! t1 = Factorial(i)/2^k where 2^k is the largest power of two that
    ! leaves an (odd) integer result.
    ! b0 = ((i+1)*t0 will not overflow).
    ! m0loc is the smallest nonnegative integer value i for which we
    ! have deterimined that Factorial(i) cannot be represented exactly.
    do while (b0)
       i = i+1
       t0 = i*t0
       j = i
       do while (modulo(j,2).eq.0)
          j = j/2
       end do
       t1 = j*t1
       if (1<spacing(t1)) then
          ! Factorial(i) cannot be represented exactly.
          m0loc = min(i,m0loc)
       end if
       b0 = log(t0)+log(real(i+1,kind=wp)) < log(huge(t0))
    end do
    m0 = min(i+1,m0loc)
    m1 = i+1
    return
  end subroutine Factorial_Limits
end program

(The version posted here is for 32-bit reals; edit the definition of wp at the top to obtain another real format.)

The output for real kind real32 on my gfortran 10.3.1 system:

bash-5.0$ gfortran TestFactorial.f90 
bash-5.0$ ./a.out
Testing the factorial function ...
m0, m1:   14   35
error:   2   2.38E-07    1.0    
error:   3   4.77E-07    1.0    
error:   5   7.63E-06    1.0    
error:   9   3.12E-02    1.0    
error:  11    4.0        1.0    
error:  12    32.        1.0    
error:  13   5.12E+02    2.0    

In the output, m0 and m1 are the largest values such that using real (kind=wp), for 0<=k<m0 Factorial(k) can be represented exactly and for 0<=k<m1 Factorial(k) is below the overflow limit. The following lines show values of k in the range 0<=k<m0 for which the gamma function intrinsic computation of k! is not exact; the second numerical value in the line shows the local number spacing at Factorial(k) and the third numerical value shows the error in units of that spacing. In particular I draw attention to the results for 11!, 12! and 13!, all of which are wrong by an integer offset although the values are exactly representable in the real32 kind. The output for real kinds real64 and real128 shows the same issue, as does the output for "kind=10" that gives 80-bit reals on my system.

arjenmarkus commented 3 years ago

I would say that it would not hurt for the not-entirely elementary functions. I know that for functions like sine and exponential the libraries used by the compiler writers are very intricate and it would be a lot of effort (and possibly a waste of time) to try and compete with them, but for functions like the gamma functions this may be different. It could also be useful to have a set of testing programs to test the quality like you have done.

Op wo 17 nov. 2021 om 08:57 schreef Bastiaan Braams < @.***>:

I don't know if the stdlib project would concern itself with standard intrinsic functions. Working with gfortran (10.3.1 on a Fedora Unix system) I found surprising behavior of the gamma intrinsic; in many cases it does not return the exact integer-valued result for small integer-valued argument, and in some cases even anint(gamma(z+1)) returns a wrong result although z! is exactly representable in the real kind used in my code. I provide a demonstration and testing code below. A much broader view of accuracy of implementations of intrinsic functions is provided in this working paper: [Vincenzo Innocente and Paul Zimmermann: Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision. Web link: https://members.loria.fr/PZimmermann/papers/accuracy.pdf].

Test and demonstration code:

program TestFactorial use iso_fortran_env, only : wp => real32 implicit none ! integer, parameter :: wp=10 ! Try this too logical :: b0 integer :: m0, m1 write (,'(a)') 'Testing the factorial function ...' call Factorial_Limits (m0, m1) ! m0 and m1 are the largest values such that using real (kind=wp): ! For 0<=k<m0 Factorial(k) can be represented exactly. ! For 0<=k<m1 Factorial(k) is below the overflow limit. write (,'(a,2i5)') 'm0, m1:', m0, m1 b0 = .false. ! b0 has the value (we have found a discrepancy between two ways of ! computing the factorial). block integer :: i real (kind=wp) :: t0, t1 do i = 0, m0-1 if (i==0) then t0 = 1 else t0 = it0 end if t1 = gamma(i+1.0_wp) ! t0 = i! computed using the recursion. ! t1 = i! computed via the gamma intrinsic function. if (t1/=t0) then b0 = .true. write (,'(a,1x,i3,2(1x,1pg10.2))') 'error:', i, & spacing(t0), (t1-t0)/spacing(t0) end if end do end block if (.not.b0) then write (,'(a)') '... PASS' end if stop contains subroutine Factorial_Limits (m0, m1) integer, intent (out) :: m0, m1 ! Determine the limits for computation of the Factorial function ! using real (kind=wp). ! m0: for 0<=k<m0, Factorial(k) can be represented exactly. ! m1: for 0<=k<m1, Factorial(k) is below the overflow limit. integer :: m0loc, i, j real (kind=wp) :: t0, t1 logical :: b0 i = 0 ; t0 = 1 ; t1 = 1 ; b0 = .true. m0loc = huge(0) ! Invariant: 0 <= i. ! t0 = Factorial(i) (subject to roundoff for large i). ! t1 = Factorial(i)/2^k where 2^k is the largest power of two that ! leaves an (odd) integer result. ! b0 = ((i+1)t0 will not overflow). ! m0loc is the smallest nonnegative integer value i for which we ! have deterimined that Factorial(i) cannot be represented exactly. do while (b0) i = i+1 t0 = it0 j = i do while (modulo(j,2).eq.0) j = j/2 end do t1 = jt1 if (1<spacing(t1)) then ! Factorial(i) cannot be represented exactly. m0loc = min(i,m0loc) end if b0 = log(t0)+log(real(i+1,kind=wp)) < log(huge(t0)) end do m0 = min(i+1,m0loc) m1 = i+1 return end subroutine Factorial_Limits end program

(The version posted here is for 32-bit reals; edit the definition of wp at the top to obtain another real format.)

The output for real kind real32 on my gfortran 10.3.1 system:

bash-5.0$ gfortran TestFactorial.f90 bash-5.0$ ./a.out Testing the factorial function ... m0, m1: 14 35 error: 2 2.38E-07 1.0 error: 3 4.77E-07 1.0 error: 5 7.63E-06 1.0 error: 9 3.12E-02 1.0 error: 11 4.0 1.0 error: 12 32. 1.0 error: 13 5.12E+02 2.0

In the output, m0 and m1 are the largest values such that using real (kind=wp), for 0<=k<m0 Factorial(k) can be represented exactly and for 0<=k<m1 Factorial(k) is below the overflow limit. The following lines show values of k in the range 0<=k<m0 for which the gamma function intrinsic computation of k! is not exact; the second numerical value in the line shows the local number spacing at Factorial(k) and the third numerical value shows the error in units of that spacing. In particular I draw attention to the results for 11!, 12! and 13!, all of which are wrong by an integer offset although the values are exactly representable in the real32 kind. The output for real kinds real64 and real128 shows the same issue, as does the output for "kind=10" that gives 80-bit reals on my system.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/569, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YRZYXH7OBXUKESG4AI3UMNN5LANCNFSM5IGH63GA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

arjenmarkus commented 3 years ago

I was also thinking of the article by Anton Shterenlikht on the implementation of complex intrinsic function wrt branch cuts. The gamma function does not suffer from them IIRC, but still, it may be a point of concern.

Op wo 17 nov. 2021 om 10:27 schreef Arjen Markus @.***

:

I would say that it would not hurt for the not-entirely elementary functions. I know that for functions like sine and exponential the libraries used by the compiler writers are very intricate and it would be a lot of effort (and possibly a waste of time) to try and compete with them, but for functions like the gamma functions this may be different. It could also be useful to have a set of testing programs to test the quality like you have done.

Op wo 17 nov. 2021 om 08:57 schreef Bastiaan Braams < @.***>:

I don't know if the stdlib project would concern itself with standard intrinsic functions. Working with gfortran (10.3.1 on a Fedora Unix system) I found surprising behavior of the gamma intrinsic; in many cases it does not return the exact integer-valued result for small integer-valued argument, and in some cases even anint(gamma(z+1)) returns a wrong result although z! is exactly representable in the real kind used in my code. I provide a demonstration and testing code below. A much broader view of accuracy of implementations of intrinsic functions is provided in this working paper: [Vincenzo Innocente and Paul Zimmermann: Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision. Web link: https://members.loria.fr/PZimmermann/papers/accuracy.pdf].

Test and demonstration code:

program TestFactorial use iso_fortran_env, only : wp => real32 implicit none ! integer, parameter :: wp=10 ! Try this too logical :: b0 integer :: m0, m1 write (,'(a)') 'Testing the factorial function ...' call Factorial_Limits (m0, m1) ! m0 and m1 are the largest values such that using real (kind=wp): ! For 0<=k<m0 Factorial(k) can be represented exactly. ! For 0<=k<m1 Factorial(k) is below the overflow limit. write (,'(a,2i5)') 'm0, m1:', m0, m1 b0 = .false. ! b0 has the value (we have found a discrepancy between two ways of ! computing the factorial). block integer :: i real (kind=wp) :: t0, t1 do i = 0, m0-1 if (i==0) then t0 = 1 else t0 = it0 end if t1 = gamma(i+1.0_wp) ! t0 = i! computed using the recursion. ! t1 = i! computed via the gamma intrinsic function. if (t1/=t0) then b0 = .true. write (,'(a,1x,i3,2(1x,1pg10.2))') 'error:', i, & spacing(t0), (t1-t0)/spacing(t0) end if end do end block if (.not.b0) then write (,'(a)') '... PASS' end if stop contains subroutine Factorial_Limits (m0, m1) integer, intent (out) :: m0, m1 ! Determine the limits for computation of the Factorial function ! using real (kind=wp). ! m0: for 0<=k<m0, Factorial(k) can be represented exactly. ! m1: for 0<=k<m1, Factorial(k) is below the overflow limit. integer :: m0loc, i, j real (kind=wp) :: t0, t1 logical :: b0 i = 0 ; t0 = 1 ; t1 = 1 ; b0 = .true. m0loc = huge(0) ! Invariant: 0 <= i. ! t0 = Factorial(i) (subject to roundoff for large i). ! t1 = Factorial(i)/2^k where 2^k is the largest power of two that ! leaves an (odd) integer result. ! b0 = ((i+1)t0 will not overflow). ! m0loc is the smallest nonnegative integer value i for which we ! have deterimined that Factorial(i) cannot be represented exactly. do while (b0) i = i+1 t0 = it0 j = i do while (modulo(j,2).eq.0) j = j/2 end do t1 = jt1 if (1<spacing(t1)) then ! Factorial(i) cannot be represented exactly. m0loc = min(i,m0loc) end if b0 = log(t0)+log(real(i+1,kind=wp)) < log(huge(t0)) end do m0 = min(i+1,m0loc) m1 = i+1 return end subroutine Factorial_Limits end program

(The version posted here is for 32-bit reals; edit the definition of wp at the top to obtain another real format.)

The output for real kind real32 on my gfortran 10.3.1 system:

bash-5.0$ gfortran TestFactorial.f90 bash-5.0$ ./a.out Testing the factorial function ... m0, m1: 14 35 error: 2 2.38E-07 1.0 error: 3 4.77E-07 1.0 error: 5 7.63E-06 1.0 error: 9 3.12E-02 1.0 error: 11 4.0 1.0 error: 12 32. 1.0 error: 13 5.12E+02 2.0

In the output, m0 and m1 are the largest values such that using real (kind=wp), for 0<=k<m0 Factorial(k) can be represented exactly and for 0<=k<m1 Factorial(k) is below the overflow limit. The following lines show values of k in the range 0<=k<m0 for which the gamma function intrinsic computation of k! is not exact; the second numerical value in the line shows the local number spacing at Factorial(k) and the third numerical value shows the error in units of that spacing. In particular I draw attention to the results for 11!, 12! and 13!, all of which are wrong by an integer offset although the values are exactly representable in the real32 kind. The output for real kinds real64 and real128 shows the same issue, as does the output for "kind=10" that gives 80-bit reals on my system.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/569, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YRZYXH7OBXUKESG4AI3UMNN5LANCNFSM5IGH63GA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

bjbraams commented 3 years ago

The report by Innocente and Zimmermann that I cited shows excellent performance of the Intel Math Library (IML) for lgamma and tgamma for single, double and extended precision, but having some problem with quadruple precision. So a first approach could be to see if the IML code is available for stdlib.

kargl commented 3 years ago

This isn't a gfortran problem (except possibly for REAL(16) which uses libquadmath). Your math library is broken, which is trivial to prove.

real(4): gfcx -c c.f90 && nm c.o | grep gamma
             U tgammaf
         gfcx -o z -O c.f90 && ./z
         Testing the factorial function ...
         m0, m1:   14   35
         ... PASS

real(8): gfcx -c c.f90 && nm c.o | grep gamma
             U tgamma
         gfcx -o z -O c.f90 && ./z
         Testing the factorial function ...
         m0, m1:   23  171
         ... PASS

tgammal is known to be broken on FreeBSD.

real(10): gfcx -c c.f90 && nm c.o | grep gamma
             U tgammal
         gfcx -o z -O c.f90 && ./z
         /usr/local/bin/ld: /tmp/ccTrrwjD.o: in function `MAIN__':
         c.f90:(.text+0x3a2): warning: tgammal has lower than advertised precision
         Testing the factorial function ...
         m0, m1:   26 1755
         error:  23   2.05E+03   7.68E+02
         error:  24   6.55E+04  -4.48E+02
         error:  25   1.05E+06    68.   

real(16): gfcx -c c.f90 && nm c.o | grep gamma
             U tgammaq
         gfcx -o z -O c.f90 && ./z
         Testing the factorial function ...
         m0, m1:   38 1755
         error:  21   7.11E-15   -1.0    
         error:  22   1.14E-13   -1.0    
         error:  24   1.16E-10   -1.0    
         error:  26   5.96E-08    1.0    
         error:  28   3.05E-05   -1.0    
         error:  30   3.12E-02   -1.0    
         error:  35   1.05E+06   -1.0    

Complain to your OS vendor.

bjbraams commented 3 years ago

Not a gfortran issue then, but could still be of interest to stdlib. I get the same bad results from gamma on a Fedora system at work and on Ubuntu 20.04 at home. Haven't customized anything relevant, as far as I know, although to use the gfortran command at home I needed to do 'sudo apt get install gfortran' with nothing further. Both on the Fedora and on the Ubuntu system the nm command (as used by kargl) shows that the gamma function is found as tgammaf, tgamma, tgammal or tgammaq depending on the precision. The following output shows the list of libraries that are linked for the real32 case. I don't know which one of those supplies the offending tgamma function.

bash-5.0$ gfortran -Wl,--trace TestFact.f90 
/usr/lib/gcc/x86_64-redhat-linux/10/../../../../lib64/crt1.o
/usr/lib/gcc/x86_64-redhat-linux/10/../../../../lib64/crti.o
/usr/lib/gcc/x86_64-redhat-linux/10/crtbegin.o
/tmp/ccFOsj4V.o
/usr/lib/gcc/x86_64-redhat-linux/10/libgfortran.so
/usr/lib/gcc/x86_64-redhat-linux/10/../../../../lib64/libm.so
/lib64/libm.so.6
/lib64/libmvec.so.1
/lib64/libmvec.so.1
/usr/lib/gcc/x86_64-redhat-linux/10/libgcc_s.so
/lib64/libgcc_s.so.1
/usr/lib/gcc/x86_64-redhat-linux/10/libgcc.a
/usr/lib/gcc/x86_64-redhat-linux/10/libgcc.a
/usr/lib/gcc/x86_64-redhat-linux/10/libquadmath.so
/usr/lib64/libquadmath.so.0.0.0
/usr/lib/gcc/x86_64-redhat-linux/10/../../../../lib64/libm.so
/lib64/libm.so.6
/lib64/libmvec.so.1
/usr/lib/gcc/x86_64-redhat-linux/10/libgcc_s.so
/lib64/libgcc_s.so.1
/usr/lib/gcc/x86_64-redhat-linux/10/libgcc.a
/usr/lib/gcc/x86_64-redhat-linux/10/libgcc.a
/usr/lib/gcc/x86_64-redhat-linux/10/../../../../lib64/libc.so
/lib64/libc.so.6
/usr/lib64/libc_nonshared.a
/lib64/ld-linux-x86-64.so.2
/usr/lib64/libc_nonshared.a
/lib64/ld-linux-x86-64.so.2
/usr/lib/gcc/x86_64-redhat-linux/10/libgcc_s.so
/lib64/libgcc_s.so.1
/usr/lib/gcc/x86_64-redhat-linux/10/libgcc.a
/usr/lib/gcc/x86_64-redhat-linux/10/libgcc.a
/usr/lib/gcc/x86_64-redhat-linux/10/crtend.o
/usr/lib/gcc/x86_64-redhat-linux/10/../../../../lib64/crtn.o

Addendum: Inspection of those system files using 'nm -D' suggests to me that the gamma functions other than tgammaq come from /lib64/libm.so.6 and tgammaq comes from /usr/lib64/libquadmath.so.0.0.0. The /lib64/libm.so.6 redirects to /usr/lib64/libm-2.32.so.

kargl commented 3 years ago

I already told you that tgammaf, tgamma, and tgammal came from libm and tgammaq came from libquadpack. You need to complain to your operating system vendor.

I think this suggestion (and the one about log2) are misguided. Why re-invent the wheel if the operating system provides an implementation?

The C standard, which is what libm strives to adhere, has the following requirements:

F.9.5.4 The tgamma functions

* tgamma(+-0) returns +-inf and raises the _divide-by-zero_
  floating-point exception.

* tgamma(x) returns a NaN and raises the _invalid_ floating-point
  exception for x a  negative integer.

* tgamma(-inf) returns a NaN and raises the _invalid_
  floating-point exception.

* tgamma(+inf) returns +inf.

There is no requirement on any positive integer x. Why? Because tgamma returns a floating point value and in general one wants a fast implementation that returns a value with maximum ULP less than 1 over some interval (where ideally the max ULP should approach 0.5 over that interval). For double precision, tgamma(x) overflows for x > 171.63-ish, and with FreeBSD there is a single approximation over the interval [1.e-17, 171.63]. A simple test on FreeBSD shows

   % tlibm -x 1 -X 171 -n 171 -d tgamma
   Interval tested for tgamma: [1,171]
   count: 171
   xm =  1.3700000000000000e+02, /* 0x40612000, 0x00000000 */
   libm =  3.6590428819525483e+232, /* 0x70379185, 0x413b0854 */
   mpfr =  3.6590428819525489e+232, /* 0x70379185, 0x413b0855 */
   ULP = 0.62555

Having spent part of last week reverse engineering tgamma(x) to address the issue of tgammal(x) on FreeBSD, I can assure that there is no special casing of if ((int)x == x) in the code.