Open SebWouters opened 6 months ago
Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.
In the double precision code, RGAMSQ
would have to be updated with an accurate value. At the moment a single-precision value with 7 decimal digits is used.
Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.
Does Fortran have ldexp(base, exp)
which computes base · 2^exp
? One could replace expressions X * GAM**2
with ldexp(x, 24)
and X / GAM**2
with ldexp(x, -24)
. This would keep the code readable and possibly enable legitimate compiler optimizations.
edit ~Fortran77 has EXP(x)
.~ /edit
edit2 Fortran supports exponentiation in its syntax (try 2.0D0**(-24)
). /edit2
"Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ." => In the double precision code, RGAMSQ would have to be updated with an accurate value. At the moment a single-precision value with 7 decimal digits is used.
Yes, of course, you are correct. This is a good point. Thanks. The 5.9604645D-8 is not a good approximation! In base 10, RGAMSQ is 5.9604644775390625D-08, I think that this base 10 representation will be mapped exactly to RGAMSQ (which is 2^(-24)) in the computer in double precision arithmetic. I did not check.
xROTMG was introduced in 1979 in ACM Algorithm 539. The linked page contains a freely available tarball with the Fortran77 source code (open Original/539.txt
). In my opinion the code should either be kept in its current form to make its origins crystal clear or the code should be simplified as described in the next paragraph.
Fortran supports exponentiation in its syntax. My suggestion is to use this notation and to let the compiler take care of optimizing expressions like x / GAM
, x * GAM**2
or x <= GAM**(-2)
(i.e., GAMSQ
and RGAMSQ
are removed). I assume the variables GAMSQ
and RGAMSQ
were introduced to avoid deficiencies in compiler technologies 45 years ago. Moreover, a PARAMETER
statement should be used because a missing variable initialization will cause compiler errors:
REAL, PARAMETER :: GAM = 2.0E0**12
DOUBLE PRECISION, PARAMETER :: GAM = 2.0D0**12
edit fixed the right-hand sides in the Fortran code above. Thanks, Julien. /edit
The longer that I look at this function, the longer I think it should be left as is: the change would not significantly improve performance, there are no callers of this function in LAPACK, and in general nobody seems to be using it.
The constant 4096 exists to avoid under- and overflows and the same value is used for single and double precision. Picking a larger value would avoid unnecessary re-scaling operations. Another optimization opportunity is the possibly dead code branch starting in line 153 (there is a comment above the branch saying that the authors do not expect the branch to be taken) which could be removed.
Finally, there exist clones of this implementation, e.g., a C clone created by f2c in BLIS or in golang. The commit for the golang code add DROTMG is 13 years old and reads Drotmg added (not tested!)
.
See also https://github.com/Reference-LAPACK/lapack/issues/244, where a paper with a clearer implementation is referenced and a bug related to scaling is mentioned
I looked into this this morning. Here is a brain dump. I now see that @angsch added a comment. Thanks. I am copy-pasting my brain dump. I'll try to work more on this later in a few days. If anyone wants to make progress, please do.
DOUBLE PRECISION, PARAMETER :: GAM = 2.0D0**12
DOUBLE PRECISION, PARAMETER :: GAMSQ = 2.0D0**24
DOUBLE PRECISION, PARAMETER :: RGAMSQ = 2.0D0**(-24)
I'll have to read more the gonum thread.
Related. I was reading the code and the WHILE loop looked suspicious. I think you can get an infinite loop if you call the function with DD1 is +Infinity for example. See code below. I did not spend much time on it. Just looked suspicious and the code below seems to have an infinite loop.
program test
use iso_fortran_env, only: sp => real32, dp => real64
IMPLICIT NONE
DOUBLE PRECISION DPARAM(5)
DOUBLE PRECISION DD1, DD2, DX1, DY1
real(dp) :: pinf_dp = real(z'7ff0000000000000',dp)
DD1 = pinf_dp
DD2 = 1.0D+00
DX1 = 1.0D+00
DY1 = 1.0D+00
CALL DROTMG(DD1,DD2,DX1,DY1,DPARAM)
print *,DD1
end program
Hi Sebastian,
I am looking more into this. I only had time for a cursory review at this point.
I am reading in the comments of the routine that
But I think in "double," so in drotmg, (what you did,) it might be OK to use GAMSQ. (Because GAMSQ is exactly GAM**2.) Whereas in "single," so in srotmg, it might not be OK to use GAMSQ. In the "single" precision case, according to the comment, using GAM twice would be better. (But then why DD1*GAM**2 and why not (DD1*GAM)*GAM.)
Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.
I'll have to look more into this. I just wanted to point to this comment in the subroutine.
(Also there is a typo in the comment on line 153 of xrotmg that we might want to correct "This code path if [sic] here for safety." The "if" should be an "is".)
Julien