Reference-LAPACK / lapack

LAPACK development repository
Other
1.49k stars 436 forks source link

drotmg.f: Replace GAM**2 by existing GAMSQ #1000

Open SebWouters opened 6 months ago

langou commented 6 months ago

Hi Sebastian,

I am looking more into this. I only had time for a cursory review at this point.

I am reading in the comments of the routine that

THE VALUES OF GAMSQ AND RGAMSQ SET IN THE DATA STATEMENT MAY BE INEXACT. THIS IS OK AS THEY ARE ONLY USED FOR TESTING THE SIZE OF DD1 AND DD2. ALL ACTUAL SCALING OF DATA IS DONE USING GAM.

But I think in "double," so in drotmg, (what you did,) it might be OK to use GAMSQ. (Because GAMSQ is exactly GAM**2.) Whereas in "single," so in srotmg, it might not be OK to use GAMSQ. In the "single" precision case, according to the comment, using GAM twice would be better. (But then why DD1*GAM**2 and why not (DD1*GAM)*GAM.)

Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.

I'll have to look more into this. I just wanted to point to this comment in the subroutine.

(Also there is a typo in the comment on line 153 of xrotmg that we might want to correct "This code path if [sic] here for safety." The "if" should be an "is".)

Julien

christoph-conrads commented 6 months ago

Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.

In the double precision code, RGAMSQ would have to be updated with an accurate value. At the moment a single-precision value with 7 decimal digits is used.

christoph-conrads commented 6 months ago

Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ.

Does Fortran have ldexp(base, exp) which computes base · 2^exp? One could replace expressions X * GAM**2 with ldexp(x, 24) and X / GAM**2 with ldexp(x, -24). This would keep the code readable and possibly enable legitimate compiler optimizations.

edit ~Fortran77 has EXP(x).~ /edit

edit2 Fortran supports exponentiation in its syntax (try 2.0D0**(-24)). /edit2

langou commented 6 months ago

"Also, for the DD1 = DD1/GAM**2, instead of dividing by GAMSQ, maybe we should multiply by RGAMSQ." => In the double precision code, RGAMSQ would have to be updated with an accurate value. At the moment a single-precision value with 7 decimal digits is used.

Yes, of course, you are correct. This is a good point. Thanks. The 5.9604645D-8 is not a good approximation! In base 10, RGAMSQ is 5.9604644775390625D-08, I think that this base 10 representation will be mapped exactly to RGAMSQ (which is 2^(-24)) in the computer in double precision arithmetic. I did not check.

christoph-conrads commented 6 months ago

xROTMG was introduced in 1979 in ACM Algorithm 539. The linked page contains a freely available tarball with the Fortran77 source code (open Original/539.txt). In my opinion the code should either be kept in its current form to make its origins crystal clear or the code should be simplified as described in the next paragraph.

Fortran supports exponentiation in its syntax. My suggestion is to use this notation and to let the compiler take care of optimizing expressions like x / GAM, x * GAM**2 or x <= GAM**(-2) (i.e., GAMSQ and RGAMSQ are removed). I assume the variables GAMSQ and RGAMSQ were introduced to avoid deficiencies in compiler technologies 45 years ago. Moreover, a PARAMETER statement should be used because a missing variable initialization will cause compiler errors:

REAL, PARAMETER :: GAM = 2.0E0**12
DOUBLE PRECISION, PARAMETER :: GAM = 2.0D0**12

edit fixed the right-hand sides in the Fortran code above. Thanks, Julien. /edit

christoph-conrads commented 6 months ago

The longer that I look at this function, the longer I think it should be left as is: the change would not significantly improve performance, there are no callers of this function in LAPACK, and in general nobody seems to be using it.

The constant 4096 exists to avoid under- and overflows and the same value is used for single and double precision. Picking a larger value would avoid unnecessary re-scaling operations. Another optimization opportunity is the possibly dead code branch starting in line 153 (there is a comment above the branch saying that the authors do not expect the branch to be taken) which could be removed.

Finally, there exist clones of this implementation, e.g., a C clone created by f2c in BLIS or in golang. The commit for the golang code add DROTMG is 13 years old and reads Drotmg added (not tested!).

angsch commented 6 months ago

See also https://github.com/Reference-LAPACK/lapack/issues/244, where a paper with a clearer implementation is referenced and a bug related to scaling is mentioned

langou commented 6 months ago

I looked into this this morning. Here is a brain dump. I now see that @angsch added a comment. Thanks. I am copy-pasting my brain dump. I'll try to work more on this later in a few days. If anyone wants to make progress, please do.

I'll have to read more the gonum thread.

langou commented 6 months ago

Related. I was reading the code and the WHILE loop looked suspicious. I think you can get an infinite loop if you call the function with DD1 is +Infinity for example. See code below. I did not spend much time on it. Just looked suspicious and the code below seems to have an infinite loop.

      program test
      use iso_fortran_env, only: sp => real32, dp => real64
      IMPLICIT NONE
      DOUBLE PRECISION DPARAM(5)
      DOUBLE PRECISION DD1, DD2, DX1, DY1
      real(dp) :: pinf_dp = real(z'7ff0000000000000',dp)

      DD1 = pinf_dp
      DD2 = 1.0D+00
      DX1 = 1.0D+00
      DY1 = 1.0D+00
      CALL DROTMG(DD1,DD2,DX1,DY1,DPARAM)
      print *,DD1
      end program