fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License
1.06k stars 164 forks source link

Compensated summation #402

Open Beliavsky opened 3 years ago

Beliavsky commented 3 years ago

Rosetta Code has a section on Kahan summation, which "is a method of summing a series of numbers represented in a limited precision in a way that minimises the loss of precision in the result." I have extracted and slightly modified the sumc function from the code there. Here is an example of how it can produce a more accurate result than the intrinsic SUM. I suggest that compensated summation be added to stdlib.

module m
implicit none
private
public :: sumc,wp
integer, parameter :: wp = kind(1.0d0)
contains
function sumc(a) result(asum)   !add elements of the array, using limited precision.
real(kind=wp)   , intent(in) :: a(:)
real(kind=wp)                :: asum
real(kind=wp)                :: s,c,y,t ! assistants.
integer                  :: i ! stepper
integer                      :: n
n = size(a)
if (n < 1) then
   asum = 0.0_wp
   return
end if
s = a(1)    ! start with the first element.
c = 0.0_wp  ! no previous omissions to carry forward.
do i = 2,n  ! step through the remainder of the array.
   y = a(i) - c     ! combine the next value with the compensation.
   t = s + y        ! augment the sum, temporarily in t.
   c = (t - s) - y  ! catch what part of y didn't get added to t.
   s = t                ! place the sum.
end do              ! on to the next element.
asum = s            ! c will no longer be zero.
end function sumc
end module m
program main
! 04/26/2021 11:39 AM driver for sumc
use m, only: sumc,wp
implicit none
integer, parameter :: n = 100000000
real(kind=wp)      :: x(n)
real(kind=wp), parameter :: xadd = 10.0_wp**15
call random_seed()
call random_number(x)
x = x + xadd
print*,([sumc(x),sum(x)] - n*xadd)/n
end program main

output: 0.50331647999999996 2207252.1444556802

If you use wp = kind(1.0) (single precision) in the code above, you can see the difference even with xadd = 0.0. The output of

module m
implicit none
private
public :: sumc,wp
integer, parameter :: wp = kind(1.0)
contains
function sumc(a) result(asum)   !add elements of the array, using limited precision.
! adapted from Rosetta Code https://rosettacode.org/wiki/Kahan_summation
real(kind=wp)   , intent(in) :: a(:)
real(kind=wp)                :: asum
real(kind=wp)                :: s,c,y,t ! assistants.
integer                  :: i ! stepper
integer                      :: n
n = size(a)
if (n < 1) then
   asum = 0.0_wp
   return
end if
s = a(1)    ! start with the first element.
c = 0.0_wp  ! no previous omissions to carry forward.
do i = 2,n  ! step through the remainder of the array.
   y = a(i) - c     ! combine the next value with the compensation.
   t = s + y        ! augment the sum, temporarily in t.
   c = (t - s) - y  ! catch what part of y didn't get added to t.
   s = t                ! place the sum.
end do              ! on to the next element.
asum = s            ! c will no longer be zero.
end function sumc
end module m
!
program main
! 04/26/2021 11:39 AM driver for sumc
use m, only: sumc,wp
implicit none
integer, parameter :: n = 100000000
real(kind=wp)      :: x(n)
call random_seed()
call random_number(x)
print*,[sumc(x),sum(x)]/n
end program main

is 0.499940693 0.167772159

ivan-pi commented 3 years ago

cc @certik I'm interested in your opinion on this one.

arjenmarkus commented 3 years ago

Isn't a problem with this algorithm that the compiler has to respect the parentheses? That would probably not be the case with any optimisation option and I do not know if there is any "in-source" method (even compiler-dependent) to enforce this. I guess the right compile option might do this, but that is a rather brittle way of achieving this goal.

Op ma 26 apr. 2021 om 18:54 schreef Ivan Pribec @.***>:

cc @certik https://github.com/certik I'm interested in your opinion on this one.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/402#issuecomment-826994852, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR4REX435LYB32AGBPTTKWLDZANCNFSM43TGYCEA .

leonfoks commented 3 years ago

I seem to remember the "volatile" declaration statement. Can that be used to enforce no optimization on the term with parentheses?

arjenmarkus commented 3 years ago

That might be a possibility, but I do not know - the problem is that the parentheses are superfluous from a mathematical point of view, so that seems orthogonal to a volatile variable.

Op ma 26 apr. 2021 om 20:49 schreef Leon Foks @.***>:

I seem to remember the "volatile" declaration statement. Can that be used to enforce no optimization on the term with parentheses?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/402#issuecomment-827066130, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR2KI2TUWXQJ4HESZO3TKWYSZANCNFSM43TGYCEA .

Beliavsky commented 3 years ago

Isn't a problem with this algorithm that the compiler has to respect the parentheses? That would probably not be the case with any optimisation option and I do not know if there is any "in-source" method (even compiler-dependent) to enforce this. I guess the right compile option might do this, but that is a rather brittle way of achieving this goal. Op ma 26 apr. 2021 om 18:54 schreef Ivan Pribec @.***>:

For both programs I presented, gfortran compiled with no option, -O1, -O2, -O3, or -ffast-math all give the same output.

arjenmarkus commented 3 years ago

Well, that is encouraging, but unless I am mistaken, Intel Fortran does not adhere to parentheses in the same way, at least not by default.

Op ma 26 apr. 2021 om 22:11 schreef Beliavsky @.***>:

Isn't a problem with this algorithm that the compiler has to respect the parentheses? That would probably not be the case with any optimisation option and I do not know if there is any "in-source" method (even compiler-dependent) to enforce this. I guess the right compile option might do this, but that is a rather brittle way of achieving this goal. Op ma 26 apr. 2021 om 18:54 schreef Ivan Pribec @.***>: … <#m5849492995325008915> cc @certik https://github.com/certik https://github.com/certik I'm interested in your opinion on this one. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#402 (comment) https://github.com/fortran-lang/stdlib/issues/402#issuecomment-826994852>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR4REX435LYB32AGBPTTKWLDZANCNFSM43TGYCEA .

For both programs I presented, gfortran compiled with no option, -O1, -O2, -O3, or -ffast-math all give the same output.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/402#issuecomment-827112486, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR4XGE5O7HCB4ETIWKTTKXCFXANCNFSM43TGYCEA .

Beliavsky commented 3 years ago

Here is a Windows CMD batch file to compile and run a program with various Intel Fortran optimization options.

:: compile and run single-file Fortran code with various Intel Fortran optimization options
:: usage: ifortopt.bat foo to compile foo.f90
setlocal
set cmpl=ifort -nologo
%cmpl% %1.f90
%1.exe
%cmpl% -O1 %1.f90
%1.exe
%cmpl% -O2 %1.f90
%1.exe
%cmpl% -O3 %1.f90
%1.exe

Using this to compile the single precision source file earlier presented, called kahan_sp.f90, the results are

c:\fortran\test>ifortopt.bat kahan

c:\fortran\test>setlocal

c:\fortran\test>set cmpl=ifort -nologo 

c:\fortran\test>ifort -nologo kahan.f90 

c:\fortran\test>kahan.exe
  0.503316480000000       -50059.1860121600     

c:\fortran\test>ifort -nologo -O1 kahan.f90 

c:\fortran\test>kahan.exe
  0.503316480000000        2207252.14445568     

c:\fortran\test>ifort -nologo -O2 kahan.f90 

c:\fortran\test>kahan.exe
  0.503316480000000       -50058.8504678400     

c:\fortran\test>ifort -nologo -O3 kahan.f90 

c:\fortran\test>kahan.exe
  0.503316480000000       -50059.6893286400     

c:\fortran\test>ifortopt.bat kahan_sp

c:\fortran\test>setlocal

c:\fortran\test>set cmpl=ifort -nologo 

c:\fortran\test>ifort -nologo kahan_sp.f90 

c:\fortran\test>kahan_sp.exe
  0.4999992      0.5000012    

c:\fortran\test>ifort -nologo -O1 kahan_sp.f90 

c:\fortran\test>kahan_sp.exe
  0.5000105      0.1677722    

c:\fortran\test>ifort -nologo -O2 kahan_sp.f90 

c:\fortran\test>kahan_sp.exe
  0.4999662      0.4999661    

c:\fortran\test>ifort -nologo -O3 kahan_sp.f90 

c:\fortran\test>kahan_sp.exe
  0.4999255      0.4999254    

The two numbers are the averages of random uniform numbers using sumc (compensated summation) and intrinsic SUM. So it is intrinsic SUM that has results depending on the optimization level.

For the double precision code, stored as kahan.f90, sumc gives correct results for all optimization levels, and instrinsic SUM is always wrong:

c:\fortran\test>ifortopt.bat kahan

c:\fortran\test>setlocal

c:\fortran\test>set cmpl=ifort -nologo 

c:\fortran\test>ifort -nologo kahan.f90 

c:\fortran\test>kahan.exe
  0.503316480000000       -50059.8571008000     

c:\fortran\test>ifort -nologo -O1 kahan.f90 

c:\fortran\test>kahan.exe
  0.503316480000000        2207252.14445568     

c:\fortran\test>ifort -nologo -O2 kahan.f90 

c:\fortran\test>kahan.exe
  0.503316480000000       -50059.1860121600     

c:\fortran\test>ifort -nologo -O3 kahan.f90 

c:\fortran\test>kahan.exe
  0.503316480000000       -50059.5215564800     

Also for gfortran, optimization levels do not affect the accuracy of compensated sumation:

c:\fortran\test>gfortran kahan_sp.f90 

c:\fortran\test>a.exe
  0.500024736      0.167772159    

c:\fortran\test>gfortran -O1 kahan_sp.f90 

c:\fortran\test>a.exe
  0.500004590      0.167772159    

c:\fortran\test>gfortran -O2 kahan_sp.f90 

c:\fortran\test>a.exe
  0.499956489      0.167772159    

c:\fortran\test>gfortran -O3 kahan_sp.f90 

c:\fortran\test>a.exe
  0.500010133      0.167772159    

c:\fortran\test>gfortran -ffast-math kahan_sp.f90 

c:\fortran\test>a.exe
  0.500054419      0.167772159    

c:\fortran\test>gfortopt.bat kahan

c:\fortran\test>setlocal

c:\fortran\test>set cmpl=gfortran 

c:\fortran\test>gfortran kahan.f90 

c:\fortran\test>a.exe
  0.50331647999999996        2207251.9766835198     

c:\fortran\test>gfortran -O1 kahan.f90 

c:\fortran\test>a.exe
  0.50331647999999996        2207252.3122278401     

c:\fortran\test>gfortran -O2 kahan.f90 

c:\fortran\test>a.exe
  0.50331647999999996        2207251.9766835198     

c:\fortran\test>gfortran -O3 kahan.f90 

c:\fortran\test>a.exe
  0.50331647999999996        2207252.1444556802     

c:\fortran\test>gfortran -ffast-math kahan.f90 

c:\fortran\test>a.exe
  0.50331647999999996        2207252.1444556802     
arjenmarkus commented 3 years ago

I stand corrected 😊. I remembered reading something like that in the compiler's help output:

/Qprotect-parens[-] enable/disable(DEFAULT) a reassociation optimization for REAL and COMPLEX expression evaluations by not honoring parenthesis

but I must have misread that!

Op ma 26 apr. 2021 om 23:11 schreef Beliavsky @.***>:

Here is a Windows CMD batch file to compile and run a program with various Intel Fortran optimization options.

:: compile and run single-file Fortran code with various Intel Fortran optimization options :: usage: ifortopt.bat foo to compile foo.f90 setlocal set cmpl=ifort -nologo %cmpl% %1.f90 %1.exe %cmpl% -O1 %1.f90 %1.exe %cmpl% -O2 %1.f90 %1.exe %cmpl% -O3 %1.f90 %1.exe

Using to compile the single precision source file earlier presented, called kahan_sp.f90, the results are

c:\fortran\test>ifortopt.bat kahan

c:\fortran\test>setlocal

c:\fortran\test>set cmpl=ifort -nologo

c:\fortran\test>ifort -nologo kahan.f90

c:\fortran\test>kahan.exe 0.503316480000000 -50059.1860121600

c:\fortran\test>ifort -nologo -O1 kahan.f90

c:\fortran\test>kahan.exe 0.503316480000000 2207252.14445568

c:\fortran\test>ifort -nologo -O2 kahan.f90

c:\fortran\test>kahan.exe 0.503316480000000 -50058.8504678400

c:\fortran\test>ifort -nologo -O3 kahan.f90

c:\fortran\test>kahan.exe 0.503316480000000 -50059.6893286400

c:\fortran\test>ifortopt.bat kahan_sp

c:\fortran\test>setlocal

c:\fortran\test>set cmpl=ifort -nologo

c:\fortran\test>ifort -nologo kahan_sp.f90

c:\fortran\test>kahan_sp.exe 0.4999992 0.5000012

c:\fortran\test>ifort -nologo -O1 kahan_sp.f90

c:\fortran\test>kahan_sp.exe 0.5000105 0.1677722

c:\fortran\test>ifort -nologo -O2 kahan_sp.f90

c:\fortran\test>kahan_sp.exe 0.4999662 0.4999661

c:\fortran\test>ifort -nologo -O3 kahan_sp.f90

c:\fortran\test>kahan_sp.exe 0.4999255 0.4999254

The two numbers are the averages of random uniform numbers using sumc (compensated summation) and intrinsic SUM. So it is intrinsic SUM that has results depending on the optimization level.

For the double precision code, stored as kahan.f90, sumc gives correct results for all optimization levels, and instrinsic SUM is always wrong:

c:\fortran\test>ifortopt.bat kahan

c:\fortran\test>setlocal

c:\fortran\test>set cmpl=ifort -nologo

c:\fortran\test>ifort -nologo kahan.f90

c:\fortran\test>kahan.exe 0.503316480000000 -50059.8571008000

c:\fortran\test>ifort -nologo -O1 kahan.f90

c:\fortran\test>kahan.exe 0.503316480000000 2207252.14445568

c:\fortran\test>ifort -nologo -O2 kahan.f90

c:\fortran\test>kahan.exe 0.503316480000000 -50059.1860121600

c:\fortran\test>ifort -nologo -O3 kahan.f90

c:\fortran\test>kahan.exe 0.503316480000000 -50059.5215564800

Also for gfortran, optimization levels do not affect the accuracy of compensated sumation:

c:\fortran\test>gfortran kahan_sp.f90

c:\fortran\test>a.exe 0.500024736 0.167772159

c:\fortran\test>gfortran -O1 kahan_sp.f90

c:\fortran\test>a.exe 0.500004590 0.167772159

c:\fortran\test>gfortran -O2 kahan_sp.f90

c:\fortran\test>a.exe 0.499956489 0.167772159

c:\fortran\test>gfortran -O3 kahan_sp.f90

c:\fortran\test>a.exe 0.500010133 0.167772159

c:\fortran\test>gfortran -ffast-math kahan_sp.f90

c:\fortran\test>a.exe 0.500054419 0.167772159

c:\fortran\test>gfortopt.bat kahan

c:\fortran\test>setlocal

c:\fortran\test>set cmpl=gfortran

c:\fortran\test>gfortran kahan.f90

c:\fortran\test>a.exe 0.50331647999999996 2207251.9766835198

c:\fortran\test>gfortran -O1 kahan.f90

c:\fortran\test>a.exe 0.50331647999999996 2207252.3122278401

c:\fortran\test>gfortran -O2 kahan.f90

c:\fortran\test>a.exe 0.50331647999999996 2207251.9766835198

c:\fortran\test>gfortran -O3 kahan.f90

c:\fortran\test>a.exe 0.50331647999999996 2207252.1444556802

c:\fortran\test>gfortran -ffast-math kahan.f90

c:\fortran\test>a.exe 0.50331647999999996 2207252.1444556802

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/402#issuecomment-827149808, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR4LJ3WUEYHMFEVISFTTKXJJHANCNFSM43TGYCEA .

Beliavsky commented 3 years ago

The Wikipedia articles Kahan summation algorithm and Algorithms for calculating variance have pseudocode and Python code, which "beg" for Fortran implementations. Neumaier and Klein have presented enhancements to Kahan's algorithm. Maybe the mean, variance, covariance, and correlation functions in stdlib should have options for more accurate calculations than those using the obvious formulas. I wonder how much longer the more accurate algorithms take.

The Kahan article discusses the issue of @arjenmarkus in the section "Possible invalidation by compiler optimization". The "Support by Libraries" section says that Python, Julia, and C# have sum functions that go beyond simple recursive summation, which suggests that this functionality may also be appropriate for stdlib:

The standard library of the Python computer language specifies an fsum function for exactly rounded summation, using the Shewchuk algorithm[9] to track multiple partial sums.

In the Julia language, the default implementation of the sum function does pairwise summation for high accuracy with good performance,[23] but an external library provides an implementation of Neumaier's variant named sum_kbn for the cases when higher accuracy is needed.[24]

In the C# language, HPCsharp nuget package implements the Neumaier variant and pairwise summation: both as scalar, data-parallel using SIMD processor instructions, and parallel multi-core.[25]

Here is a relevant paper. The author says he used the the Intel Visual Fortran 9.1 compiler. SIAM J. SCI. COMPUT. c 2009 Society for Industrial and Applied Mathematics Vol. 31, No. 5, pp. 3466–3502 Ultimately Fast Accurate Summation SIEGFRIED M. RUMP† Abstract. We present two new algorithms FastAccSum and FastPrecSum, one to compute a faithful rounding of the sum of floating-point numbers and the other for a result “as if” computed in K-fold precision. Faithful rounding means the computed result either is one of the immediate floating-point neighbors of the exact result or is equal to the exact sum if this is a floating-point number. The algorithms are based on our previous algorithms AccSum and PrecSum and improve them by up to 25%. The first algorithm adapts to the condition number of the sum; i.e., the computing time is proportional to the difficulty of the problem. The second algorithm does not need extra memory, and the computing time depends only on the number of summands and K. Both algorithms are the fastest known in terms of flops. They allow good instruction-level parallelism so that they are also fast in terms of measured computing time. The algorithms require only standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision. Key words. maximally accurate summation, faithful rounding

Beliavsky commented 3 years ago

There was a 2012 discussion in the gfortran mailing list accuracy of intrinsic SUM function that suggests using the pairwise summation method. The linked Wikipedia article has pseudocode ripe for Fortran implementation and says

Pairwise summation is the default summation algorithm in NumPy[8] and the Julia technical-computing language,[9] where in both cases it was found to have comparable speed to naive summation (thanks to the use of a large base case).

arjenmarkus commented 3 years ago

Intriguing :). It should not be very difficult to implement these algorithms and measure the performance. SUch an endeavour could be reported in blog form on the Fortran-lang site.

What I like about the sum intrinsic is that you can also pass a mask or have the summation occurring over a single dimension instead of all. Implementing all those variations is of course a bit beyond the current discussion, but it does not hurt to muse about it.

Op di 27 apr. 2021 om 06:57 schreef Beliavsky @.***>:

There was a 2012 discussion in the gfortran mailing list accuracy of intrinsic SUM function http://gcc.1065356.n8.nabble.com/accuracy-of-intrinsic-SUM-function-td863496.html that suggests using the pairwise summation https://en.wikipedia.org/wiki/Pairwise_summation method. The linked Wikipedia article has pseudocode ripe for Fortran implementation and says

Pairwise summation is the default summation algorithm in NumPy[8] and the Julia technical-computing language,[9] where in both cases it was found to have comparable speed to naive summation (thanks to the use of a large base case).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/402#issuecomment-827312311, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR7GR46Q4GIXDSDOBBLTKY73TANCNFSM43TGYCEA .

Beliavsky commented 3 years ago

A paper A Class of Fast and Accurate Summation Algorithms discussed here has an associated Matlab package FABsum that could be translated to Fortran.

arjenmarkus commented 3 years ago

Thanks - I will have a close look at this!

Op di 27 apr. 2021 om 15:42 schreef Beliavsky @.***>:

A paper A Class of Fast and Accurate Summation Algorithms http://eprints.maths.manchester.ac.uk/2748/ discussed here https://nla-group.org/2019/04/ has an associated Matlab package FABsum https://gitlab.com/nla-grp/FABsum that could be translated to Fortran.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/402#issuecomment-827615929, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR7ZF76A2OHTPIEVHQ3TK25KTANCNFSM43TGYCEA .

Beliavsky commented 3 years ago

Relevant 2016 thread from comp.lang.fortran: Shouldn't intrinsic SUM avoid overflow?. Below is a code from that thread:

! from comp.lang.fortran https://groups.google.com/g/comp.lang.fortran/c/zmqXee93SN8 Shouldn't intrinsic SUM avoid overflow?
! program to test KahanSum
!
real*4 function KahanSum (values, n)
! https://en.wikipedia.org/wiki/Kahan_summation_algorithm
integer*4 n
real*4 values(n)
!
integer*4 i
real*4 sum, c, y, t
!
c = 0
sum = 0
do i = 1,n
y = values(i) - c ! So far, so good : c is zero
t = sum + y ! Alas, sum is big, Y small, so low-order digits of y are lost
c = (t - sum) - y ! (t-sum) cancels the high-order part of y; subtracting y recovers negative (low part of y)
sum = t ! Algebraically, c should always be zero. Beware overly-aggressive optimizing compilers !
end do ! Next time around, the low part will be added to y in a fresh attempt.
!
KahanSum = sum
end function KahanSum

real*4 function Sum8 (values, n)
integer*4 n
real*4 values(n)
!
integer*4 i
real*8 sum, y
!
sum = 0
do i = 1,n
y = values(i)
sum = sum + y
end do
!
Sum8 = sum
end function Sum8

real*4 function Sum4 (values, n)
integer*4 n
real*4 values(n)
!
integer*4 i
real*4 sum, y
!
sum = 0
do i = 1,n
y = values(i)
sum = sum + y
end do
!
Sum4 = sum
end function Sum4

Program Kahan_test
!
integer*4 n, k
real*4, allocatable :: values(:)
real*4 sum4, sum8, KahanSum, s8, k4, s4, si
!
call report_options
!
do k = 2,30
n = 2**k
allocate ( values(n) )
call random_number ( values )
s8 = sum8 ( values, n )
k4 = KahanSum ( values, n )
s4 = sum4 ( values, n )
si = sum ( values )
write (*,fmt='(i10,7es11.3)') n, si, s4, k4, s8, abs(si-s8) , abs(s4-s8), abs(k4-s8)
deallocate (values)
end do
!
end

subroutine report_options
use ISO_FORTRAN_ENV
write (*,*) 'Compiler: ',compiler_version ()
write (*,*) 'Options : ',compiler_options ()
end subroutine report_options
arjenmarkus commented 3 years ago

I wrote a small program to compare the results of four algorithms both in terms of accuracy and timings (algorithms: intrinsic sum, naïve do-loop, Kahan compensated summation and pairwise summation). I ran it with two compilers, gfortran 10.2 and Intel Fortran oneAPI, without and with optimisation. Instead of random numbers I used identical, alternating and increasing numbers, so that I could easily calculate the exact sum.

The results are comparable and not comparable. For the timers I need to do a bit more work, because especially with the optimised versions the elapsed times are essentially zero.

One (preliminary) conclusion though: with optimisation (-O2) the Kahan algorithm seems no more expensive than any of the others and is quite accurate. For Intel Fortran oneAPI, however, I did have to use an option to protect the parentheses, other the Kahan algorithm produced nonsense.

I have attached said program. (I only hope I did not make a mistake 😊)

Op wo 28 apr. 2021 om 02:26 schreef Beliavsky @.***>:

Relevant 2016 thread from comp.lang.fortran: Shouldn't intrinsic SUM avoid overflow? https://groups.google.com/g/comp.lang.fortran/c/zmqXee93SN8. Below is a code from that thread:

! from comp.lang.fortran https://groups.google.com/g/comp.lang.fortran/c/zmqXee93SN8 Shouldn't intrinsic SUM avoid overflow? ! program to test KahanSum ! real4 function KahanSum (values, n) ! https://en.wikipedia.org/wiki/Kahan_summation_algorithm integer4 https://en.wikipedia.org/wiki/Kahan_summation_algorithminteger*4 n real4 values(n) ! integer4 i real*4 sum, c, y, t ! c = 0 sum = 0 do i = 1,n y = values(i) - c ! So far, so good : c is zero t = sum + y ! Alas, sum is big, Y small, so low-order digits of y are lost c = (t - sum) - y ! (t-sum) cancels the high-order part of y; subtracting y recovers negative (low part of y) sum = t ! Algebraically, c should always be zero. Beware overly-aggressive optimizing compilers ! end do ! Next time around, the low part will be added to y in a fresh attempt. ! KahanSum = sum end function KahanSum

real4 function Sum8 (values, n) integer4 n real4 values(n) ! integer4 i real*8 sum, y ! sum = 0 do i = 1,n y = values(i) sum = sum + y end do ! Sum8 = sum end function Sum8

real4 function Sum4 (values, n) integer4 n real4 values(n) ! integer4 i real*4 sum, y ! sum = 0 do i = 1,n y = values(i) sum = sum + y end do ! Sum4 = sum end function Sum4

Program Kahan_test ! integer4 n, k real4, allocatable :: values(:) real*4 sum4, sum8, KahanSum, s8, k4, s4, si ! call report_options ! do k = 2,30 n = 2*k allocate ( values(n) ) call random_number ( values ) s8 = sum8 ( values, n ) k4 = KahanSum ( values, n ) s4 = sum4 ( values, n ) si = sum ( values ) write (,fmt='(i10,7es11.3)') n, si, s4, k4, s8, abs(si-s8) , abs(s4-s8), abs(k4-s8) deallocate (values) end do ! end

subroutine report_options use ISO_FORTRAN_ENV write (,) 'Compiler: ',compiler_version () write (,) 'Options : ',compiler_options () end subroutine report_options

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/402#issuecomment-828049277, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YRZQ4SOME2YQMD3EE4LTK5I3TANCNFSM43TGYCEA .

! sum_accuracy.f90 -- ! Examine the accuracy of various algorithms to calculate the sum ! of an array of real values ! ! Rather than using random numbers, this program uses identical, ! linearly increasing and alternating numbers, so that the ! exact sum is known. ! ! TODO: ! - Examine the influence of the "cut-off" parameter in sum_pair ! - Implement the FABsum algorithm ! ! Note: ! Intel Fortran oneAPI requires -Qprotect-parens to get useful results ! with "Kahan" ! program sum_accuracy implicit none

integer, parameter                :: wp = kind(1.0)
integer, parameter                :: dp = kind(1.0d0)
integer, parameter                :: long = selected_int_kind(12)

integer, parameter                :: nvalues = 10000000
real(kind=wp), dimension(nvalues) :: x
real(kind=wp), dimension(12)      :: dx = [  0.1_wp,  0.2_wp,  0.3_wp,  -0.1_wp,  -0.2_wp,  -0.3_wp, &
                                            0.01_wp, 0.02_wp, 0.03_wp, -0.01_wp, -0.02_wp, -0.03_wp]
integer, dimension(5)             :: array_size = [1000, 10000, 100000, 1000000, nvalues]
integer                           :: i, j, sz
real(kind=wp)                     :: exact, result, sgn
integer(kind=long)                :: count_start, count_end, rate
real(kind=kind(1.0d0))            :: elapsed_time

!
! Sum identical values
!
write(*,'(a)') 'Identical values:'
do i = 1,size(dx)
    x = dx(i)

    write(*,'(a,g15.6)') '    Value:', dx(i)
    !
    ! Exact sum: nvalues * value
    !
    exact = nvalues * dx(i)

    call sum_intrinsic( x, result ); call report_result( result, exact, 'Sum intrinsic function' )
    call sum_naive    ( x, result ); call report_result( result, exact, 'Sum via do-loop' )
    call sum_kahan    ( x, result ); call report_result( result, exact, 'Sum via Kahan compensated algorithm' )
    call sum_pairwise ( x, result ); call report_result( result, exact, 'Sum via pairwise summation' )
enddo

!
! Sum alternating values
!
write(*,'(a)') 'Alternating values:'
do i = 1,size(dx)
    sgn = 1.0
    do j = 1,size(x)
        x(j) =  sgn * dx(i)
        sgn  = -sgn
    enddo

    write(*,'(a,g15.6)') '    Value:', dx(i)
    !
    ! Exact sum: nvalues * value
    !
    exact = 0.0_wp

    call sum_intrinsic( x, result ); call report_result( result, exact, 'Sum intrinsic function' )
    call sum_naive    ( x, result ); call report_result( result, exact, 'Sum via do-loop' )
    call sum_kahan    ( x, result ); call report_result( result, exact, 'Sum via Kahan compensated algorithm' )
    call sum_pairwise ( x, result ); call report_result( result, exact, 'Sum via pairwise summation' )
enddo

!
! Sum increasing values
!
write(*,'(a)') 'Increasing values:'
do i = 1,size(dx)
    do j = 1,size(x)
        x(j) =  j * dx(i) / 1000.0
    enddo

    write(*,'(a,g15.6)') '    Value:', dx(i) / 1000.0
    !
    ! Exact sum: nvalues * value
    !
    exact = 0.5 * size(x) * real(size(x)+1,wp) * dx(i) / 1000.0

    call sum_intrinsic( x, result ); call report_result( result, exact, 'Sum intrinsic function' )
    call sum_naive    ( x, result ); call report_result( result, exact, 'Sum via do-loop' )
    call sum_kahan    ( x, result ); call report_result( result, exact, 'Sum via Kahan compensated algorithm' )
    call sum_pairwise ( x, result ); call report_result( result, exact, 'Sum via pairwise summation' )
enddo

!
! Timings
!
write(*,'(a)') 'Timings:'
x = dx(1)

call system_clock( count_rate = rate )
do i = 1,size(array_size)
    sz = array_size(i)

    write(*,'(a,i12)') '    Number of data:', array_size(i)
    !
    ! Exact sum: nvalues * value
    !
    exact = sz * dx(1)

    call system_clock( count_start )
    call sum_intrinsic( x(1:sz), result )
    call system_clock( count_end )
    elapsed_time = (count_end - count_start) / real(rate,dp)
    call report_time( elapsed_time, 'Sum intrinsic function' )

    call system_clock( count_start )
    call sum_naive    ( x(1:sz), result )
    call system_clock( count_end )
    elapsed_time = (count_end - count_start) / real(rate,dp)
    call report_time( elapsed_time, 'Sum via do-loop' )

    call system_clock( count_start )
    call sum_kahan    ( x(1:sz), result )
    call system_clock( count_end )
    elapsed_time = (count_end - count_start) / real(rate,dp)
    call report_time( elapsed_time, 'Sum via Kahan compensated algorithm' )

    call system_clock( count_start )
    call sum_pairwise ( x(1:sz), result )
    call system_clock( count_end )
    elapsed_time = (count_end - count_start) / real(rate,dp)
    call report_time( elapsed_time, 'Sum via pairwise summation' )
enddo

contains

! report_result -- ! Report the result and compare to the exact value ! ! Arguments: ! result Computed sum ! exact Exact value ! text Text to accompany the result ! subroutine report_result( result, exact, text ) real(kind=wp), intent(in) :: result real(kind=wp), intent(in) :: exact character(len=*), intent(in) :: text

write(*,'(4x,3g15.6,5x,a)') result, exact, result-exact, text

end subroutine report_result

! report_time -- ! Report the time it took to arrive at the result ! ! Arguments: ! time Time in seconds ! text Text to accompany the result ! subroutine report_time( time, text ) real(kind=dp), intent(in) :: time character(len=*), intent(in) :: text

write(*,'(4x,g15.6,5x,a)') time, text

end subroutine report_time

! sum_intrinsic -- ! Use the sum intrinsic function to sum the data ! ! Arguments: ! x Array of values ! result Calculated sum ! subroutine sum_intrinsic( x, result ) real(kind=wp), dimension(:), intent(in) :: x real(kind=wp), intent(out) :: result

result = sum(x)

end subroutine sum_intrinsic

! sum_naive -- ! Use an explicit do-loop to sum the data ! ! Arguments: ! x Array of values ! result Calculated sum ! subroutine sum_naive( x, result ) real(kind=wp), dimension(:), intent(in) :: x real(kind=wp), intent(out) :: result

integer                                 :: i

result = 0.0_wp
do i = 1,size(x)
    result = result + x(i)
enddo

end subroutine sum_naive

! sum_kahan -- ! Use an explicit do-loop with compensation according to Kahan to sum the data ! ! Arguments: ! x Array of values ! result Calculated sum ! subroutine sum_kahan( x, result ) real(kind=wp), dimension(:), intent(in) :: x real(kind=wp), intent(out) :: result

real(kind=wp)                           :: c, t, y
integer                                 :: i

result = x(1)
c      = 0.0_wp
do i = 2,size(x)
    y      = x(i)   - c
    t      = result + y
    c      = (t - result) - y
    result = t
enddo

end subroutine sum_kahan

! sum_pairwise -- ! Use pairwise summation to sum the data ! ! Arguments: ! x Array of values ! result Calculated sum ! subroutine sum_pairwise( x, result ) real(kind=wp), dimension(:), intent(in) :: x real(kind=wp), intent(out) :: result

real(kind=wp)                           :: c, t, y
integer                                 :: i

result = sum_pair( x )

end subroutine sum_pairwise

! sum_pair -- ! Function to sum the data via the pairwise algorithm ! ! Arguments: ! x Array of values ! recursive function sum_pair( x ) result(sump) real(kind=wp), dimension(:), intent(in) :: x real(kind=wp) :: sump

integer                                 :: n

if ( size(x) <= 16 ) then
    sump = sum(x)
else
    n    = size(x) / 2
    sump = sum_pair(x(1:n)) + sum_pair(x(n+1:))
endif

end function sum_pair

end program sum_accuracy

certik commented 3 years ago

Well, last time I played with that, -ffast-math makes the Kahan summation not to work. I am not sure what is going on above, but the Kahan summation should be "optimized" out, so I would expect this to be a bug in the compiler if it can't optimize it out. In this particular case, you actually do not want it to optimize it out, so we would have to enforce it somehow.

Finally, I've never needed to use Kahan in practice, as it has always been possible to rearrange a large sum so that one can sum in any order, which is especially important if you want to use do concurrent or co-arrays / MPI. Even the BLAS sum (dot_product) does not guarantee order of operations.

There are other summation algorithm, such as pairwise summation.

Julia has several of these implemented, so we should too in stdlib. It can be helpful for testing. But as I said, for production code I recommend to rework the algorithm to be order independent, then you can use -ffast-math and parallelize without worry. If anyone has any questions how to do that in some particular case, I am happy to help.