drotmg.f: Replace GAM**2 by existing GAMSQ

langou commented 6 months ago

Hi Sebastian,

I am looking more into this. I only had time for a cursory review at this point.

I am reading in the comments of the routine that

THE VALUES OF GAMSQ AND RGAMSQ SET IN THE DATA STATEMENT MAY BE INEXACT. THIS IS OK AS THEY ARE ONLY USED FOR TESTING THE SIZE OF DD1 AND DD2. ALL ACTUAL SCALING OF DATA IS DONE USING GAM.

But I think in "double," so in drotmg, (what you did,) it might be OK to use GAMSQ. (Because GAMSQ is exactly GAM**2.) Whereas in "single," so in srotmg, it might not be OK to use GAMSQ. In the "single" precision case, according to the comment, using GAM twice would be better. (But then why DD1*GAM**2 and why not (DD1*GAM)*GAM.)

Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.

I'll have to look more into this. I just wanted to point to this comment in the subroutine.

(Also there is a typo in the comment on line 153 of xrotmg that we might want to correct "This code path if [sic] here for safety." The "if" should be an "is".)

Julien

christoph-conrads commented 6 months ago

Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.

In the double precision code, RGAMSQ would have to be updated with an accurate value. At the moment a single-precision value with 7 decimal digits is used.

christoph-conrads commented 6 months ago

Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.

Does Fortran have ldexp(base, exp) which computes base · 2^exp? One could replace expressions X * GAM**2 with ldexp(x, 24) and X / GAM**2 with ldexp(x, -24). This would keep the code readable and possibly enable legitimate compiler optimizations.

edit ~Fortran77 has EXP(x).~ /edit

edit2 Fortran supports exponentiation in its syntax (try 2.0D0**(-24)). /edit2

langou commented 6 months ago

"Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ." => In the double precision code, RGAMSQ would have to be updated with an accurate value. At the moment a single-precision value with 7 decimal digits is used.

Yes, of course, you are correct. This is a good point. Thanks. The 5.9604645D-8 is not a good approximation! In base 10, RGAMSQ is 5.9604644775390625D-08, I think that this base 10 representation will be mapped exactly to RGAMSQ (which is 2^(-24)) in the computer in double precision arithmetic. I did not check.

christoph-conrads commented 6 months ago

xROTMG was introduced in 1979 in ACM Algorithm 539. The linked page contains a freely available tarball with the Fortran77 source code (open Original/539.txt). In my opinion the code should either be kept in its current form to make its origins crystal clear or the code should be simplified as described in the next paragraph.

Fortran supports exponentiation in its syntax. My suggestion is to use this notation and to let the compiler take care of optimizing expressions like x / GAM, x * GAM**2 or x <= GAM**(-2) (i.e., GAMSQ and RGAMSQ are removed). I assume the variables GAMSQ and RGAMSQ were introduced to avoid deficiencies in compiler technologies 45 years ago. Moreover, a PARAMETER statement should be used because a missing variable initialization will cause compiler errors:

REAL, PARAMETER :: GAM = 2.0E0**12
DOUBLE PRECISION, PARAMETER :: GAM = 2.0D0**12

edit fixed the right-hand sides in the Fortran code above. Thanks, Julien. /edit

christoph-conrads commented 6 months ago

The longer that I look at this function, the longer I think it should be left as is: the change would not significantly improve performance, there are no callers of this function in LAPACK, and in general nobody seems to be using it.

The constant 4096 exists to avoid under- and overflows and the same value is used for single and double precision. Picking a larger value would avoid unnecessary re-scaling operations. Another optimization opportunity is the possibly dead code branch starting in line 153 (there is a comment above the branch saying that the authors do not expect the branch to be taken) which could be removed.

Finally, there exist clones of this implementation, e.g., a C clone created by f2c in BLIS or in golang. The commit for the golang code add DROTMG is 13 years old and reads Drotmg added (not tested!).

angsch commented 6 months ago

See also https://github.com/Reference-LAPACK/lapack/issues/244, where a paper with a clearer implementation is referenced and a bug related to scaling is mentioned

langou commented 6 months ago

I looked into this this morning. Here is a brain dump. I now see that @angsch added a comment. Thanks. I am copy-pasting my brain dump. I'll try to work more on this later in a few days. If anyone wants to make progress, please do.

The fact that we initialize RGAMSQ which is 2**(-24) to the approximate value of 5.9604645D-8 in double precision bothers me.
This seems to be related to the fix in gonum pointed by @angsch. (Thanks!)

It could be better to change with

DOUBLE PRECISION, PARAMETER :: GAM = 2.0D0**12
DOUBLE PRECISION, PARAMETER :: GAMSQ = 2.0D0**24
DOUBLE PRECISION, PARAMETER :: RGAMSQ = 2.0D0**(-24)

I understand that RGAMSQ and GAMSQ are not used in computation, they are used as threshold but still. I understand that the goal was to have the same code base for SINGLE and DOUBLE, but why not write the exact value ( 5.9604644775390625E-008 ) in both codes and let the compiler figure it out. Oh well.
Yes, we might want to replace all the DATA statements by PARAMETER statements
Note that there are also some DATA statements in ROTM
Note that GAM, GAMSQ, RGAMSQ are declared on line 106 of the current code. Then the DATA statement appears.
I think it would be better to replace DD1 = DD1*GAM*2 by DD1 = (DD1*GAM)\GAM, but if GAMSQ
RGAMSQ is inexact (as the current code is done), but GAMSQ is a floating-point number (so is exact). At least this is true in SINGLE (32-bit) and DOUBLE (64-bit). This would even be true in 16-bit floating-point arithmetic, GAMSQ should be a floating-point number. Using GAMSQ with an 8-bit floating-point arithmetic code is not going to cut it though!
I think that the statement "the values of GAMSQ and RGAMSQ set in the data statement may be inexact" might refer to non-IEEE arithmetics that were around in the seventies.

I'll have to read more the gonum thread.

langou commented 6 months ago

Related. I was reading the code and the WHILE loop looked suspicious. I think you can get an infinite loop if you call the function with DD1 is +Infinity for example. See code below. I did not spend much time on it. Just looked suspicious and the code below seems to have an infinite loop.

      program test
      use iso_fortran_env, only: sp => real32, dp => real64
      IMPLICIT NONE
      DOUBLE PRECISION DPARAM(5)
      DOUBLE PRECISION DD1, DD2, DX1, DY1
      real(dp) :: pinf_dp = real(z'7ff0000000000000',dp)

      DD1 = pinf_dp
      DD2 = 1.0D+00
      DX1 = 1.0D+00
      DY1 = 1.0D+00
      CALL DROTMG(DD1,DD2,DX1,DY1,DPARAM)
      print *,DD1
      end program

Reference-LAPACK / lapack

drotmg.f: Replace GAM**2 by existing GAMSQ #1000