Closed: hokb closed this issue 1 year ago.
The underflow itself is not the true problem. After the underflow, the algorithm switches to CABS1, which is less prone to underflow. The problem this creates is that TEMP will not be exactly unitary, leading to roundoff in Z.
A possible solution is to prescale using CABS1 and then correct using ABS (because of the first scaling, ABS should no longer underflow). (I don't get the underflow on my machine, so I can't test it for you.)
      IF (RTEMP .EQ. RZERO) THEN
         RTEMP = CABS1(TEMP)
         H( I, I-1 ) = RTEMP
         TEMP = TEMP / RTEMP
         RTEMP = ABS( TEMP )
         H( I, I-1 ) = H( I, I-1 )*RTEMP
         TEMP = TEMP / RTEMP
      END IF
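For illustration, the same two-step rescue can be sketched outside Fortran. Below is a Python version (Python floats are IEEE 64-bit doubles; `cabs1` and `naive_abs` are hypothetical stand-ins for CABS1 and an unscaled ABS, not LAPACK code):

```python
import math

def cabs1(z: complex) -> float:
    """LAPACK-style CABS1: |Re z| + |Im z|. It cannot underflow to 0 for z != 0."""
    return abs(z.real) + abs(z.imag)

def naive_abs(z: complex) -> float:
    """Unscaled sqrt(re^2 + im^2): the form that underflows in pure 64-bit math."""
    return math.sqrt(z.real * z.real + z.imag * z.imag)

temp = complex(1e-173, 1e-173)   # the magnitude from the failing ZLAHQR iteration
assert naive_abs(temp) == 0.0    # re*re underflows to zero in 64-bit arithmetic

h = rtemp = cabs1(temp)          # prescale with the 1-norm
temp /= rtemp                    # |temp| is now close to 1 ...
rtemp = naive_abs(temp)          # ... so the unscaled 2-norm is safe here
h *= rtemp                       # H(I,I-1) receives the true magnitude
temp /= rtemp                    # temp is now unit modulus up to roundoff
```

After the second division, `abs(temp)` is 1 up to a unit of roundoff, which is exactly the property the Z update needs.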
I think the tests are definitely designed to succeed for the set of popular FORTRAN compilers, because that is simply how they are run. Predicting under/overflow is incredibly hard. At least in my case, these subroutines are designed by simply testing them (using the popular compilers) thoroughly and fixing any over/underflow we find.
Thank you! This is very helpful. We had tried to recover from underflow using CABS1. But our attempt was not going far enough. Your suggestion seems to do much better. Using ...
*
*     Ensure that H(I,I-1) is real.
*
      TEMP = H( I, I-1 )
      IF( DIMAG( TEMP ).NE.RZERO ) THEN
         RTEMP = ABS( TEMP )
         IF (RTEMP .EQ. RZERO) THEN
            RTEMP = CABS1(TEMP)
            H( I, I-1 ) = RTEMP
            TEMP = TEMP / RTEMP
            RTEMP = ABS( TEMP )
            H( I, I-1 ) = H( I, I-1 )*RTEMP
         ELSE
            H( I, I-1 ) = RTEMP
         END IF
         TEMP = TEMP / RTEMP
         IF( I2.GT.I )
     $      CALL ZSCAL( I2-I, DCONJG( TEMP ), H( I, I+1 ), LDH )
         CALL ZSCAL( I-I1, TEMP, H( I1, I ), 1 )
         IF( WANTZ ) THEN
            CALL ZSCAL( NZ, TEMP, Z( ILOZ, I ), 1 )
         END IF
      END IF
*
  130 CONTINUE
... this iteration completes successfully (even when using SSE registers for ABS()).
I think the tests are definitely designed to succeed for the set of popular FORTRAN compilers, because that is simply how they are run. Predicting under/overflow is incredibly hard. At least in my case, these subroutines are designed by simply testing them (using the popular compilers) thoroughly and fixing any over/underflow we find.
The test suite is of tremendous help! My rough estimation is that far less than 1% of the tests are affected by this or similar overflow issues (when using our compiler). Making the tests even more robust against under-/overflow could help bring LAPACK to more platforms. Our (failed) attempt above is just one example, which clearly shows that we would hardly be able to come up with a fix on our side, though. Before opening multiple related issues, I would like to start a discussion on whether there is interest in such a journey and what a good approach would be.
Thanks for the improvement @hokb and @thijssteel! Should I write a PR with the modifications or are you willing to do that, @hokb?
Given my limited experience with the project, I would appreciate your effort and the chance to take your PR as a guideline for potential future PRs from us... (if that's ok?)
Hi @hokb,
I looked for some specification but couldn't find it yet. Any link would also be appreciated. Thanks in advance!
I am not sure anything is specified anywhere.
Our compiler targets the .NET CLR. Its JIT decides to use SSE registers for ABS(TEMP), which leads to the underflow in the intermediate calculation of the magnitude. Ifort (as another example) uses floating point registers in this situation, hence does not underflow (because of its larger length: 80 bits). I am trying to get a clear(er) picture of what to expect from LAPACK regarding which precision / number range it requires from the compiler / processor at runtime.
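The intermediate underflow is easy to model in strict 64-bit arithmetic. This Python sketch (floats are IEEE doubles, so no 80-bit temporaries come to the rescue) shows the naive magnitude collapsing to zero while a scaled one survives:

```python
import math

re_, im_ = 1e-173, 1e-173     # components of the H(I,I-1) entry above
# Strict 64-bit arithmetic: the square underflows, so the naive magnitude
# sqrt(re^2 + im^2) collapses to zero ...
assert re_ * re_ == 0.0
assert math.sqrt(re_ * re_ + im_ * im_) == 0.0
# ... while a scaled magnitude (math.hypot rescales internally) does not.
# An 80-bit x87 temporary would keep re^2 ~ 1e-346 representable, which is
# presumably why ifort happens not to underflow here.
assert math.hypot(re_, im_) > 0.0
```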
Bold statement: If all computations are done using IEEE 64-bit arithmetic, then LAPACK should work.
LAPACK does not expect 80-bit registers to come help its computation at any time. The algorithms are designed with 64-bit arithmetic in mind. Now, as mentioned by @thijssteel, LAPACK is tested with various compilers/architectures, and these compilers/architectures use 80-bit registers at times, so we might think our algorithms only need 64-bit arithmetic all along when, in effect, they do require 80-bit help in places.
We have not done anything systematic in our journey to go after these issues. In general, we are happy enough when the algorithms pass the test suite, and, if there is some help from 80-bit registers, so be it.
Are all tests for double precision designed to require 64-bit registers at least? Or are they designed in a way to succeed for the set of popular FORTRAN compilers available today? (In the first case, the above issue (and similar others) may require attention. Should I file an issue for them?)
My rough estimation is that far less than 1% of the tests are affected by this or similar overflow issues (when using our compiler).
Oh my. 1%? That is a scary large number.
The tests are testing a lot around underflow and overflow region so it could be expected that the tests are much more likely in term of triggering this issue than users' code but still.
Making the tests even more robust against under-/overflow could help to bring LAPACK to more platforms.
Portability to more platforms is one interest indeed. Another interest is extended precision with packages such as GMP where, as I understand it, the precision is fixed throughout the computation. (So, for example, you are 256-bit throughout, and there is not a 300-bit register to come help you.)
Our (failed) attempt above is just one example, which clearly shows that we would hardly be able to come up with a fix on our side, though. Before opening multiple related issues, I would like to start a discussion on whether there is interest in such a journey and what a good approach would be.
Yes. We are interested. We can only do so much though. And we have a lot on our plates. So maybe we take this one issue at a time, and we see how far we go.
In any case, posting issues on GitHub is always a good idea. It gives awareness to the problem, and it helps gather assistance and ideas to fix the issues.
I am happy we go down this path, but I would recommend to take it easy.
Maybe, for gfortran, we should compile with the flags -mfpmath=sse -msse2 for testing purposes. I think this will force all computation to be done with 64-bit arithmetic. I am not sure though.
Given my limited experience with the project, I would appreciate your effort and the chance to take your PR as a guideline for potential future PRs from us... (if that's ok?)
Sure! Please, see #577.
@weslleyspereira Awesome! I am still checking whether this applies to CLAHQR the same way. Will post my result ASAP (tomorrow).
Hello @langou !
Bold statement: If all computations are done using IEEE 64-bit arithmetic, then LAPACK should work.
Nice! I suppose by 'work' we mean: when fed with data 'in a certain range', it will not overflow due to a given register size?
LAPACK does not expect 80-bit registers to come help its computation at any time. The algorithms are designed with 64-bit arithmetic in mind. Now, as mentioned by @thijssteel, LAPACK is tested with various compilers/architectures, and these compilers/architectures use 80-bit registers at times, so we might think our algorithms only need 64-bit arithmetic all along when, in effect, they do require 80-bit help in places.
We have not done anything systematic in our journey to go after these issues. In general, we are happy enough when the algorithms pass the test suite, and, if there is some help from 80-bit registers, so be it.
Sounds very reasonable!
My rough estimation is that far less than 1% of the tests are affected by this or similar overflow issues (when using our compiler).
Oh my. 1%? That is a scary large number.
Well, likely it is 'far less' than that ;)
Portability to more platforms is one interest indeed. Another interest is extended precision with packages such as GMP where, as I understand it, the precision is fixed throughout the computation. (So, for example, you are 256-bit throughout, and there is not a 300-bit register to come help you.)
Sounds interesting, but I cannot comment on this, since I lack experience with such fixed-precision attempts.
Yes. We are interested. We can only do so much though. And we have a lot on our plates. So maybe we take this one issue at a time, and we see how far we go.
I am still unsure what a good general approach would be. Bear with me if my understanding is too naive. But isn't over-/underflow always dependent on both the input data and the algorithm? So instead of flooding the code with new conditionals testing for them and new code for recovering from them, we might instead decrease the 'allowed range' for the input data? I don't have the necessary insight into the effort required for either approach, though. So I cannot judge which would be more feasible.
In any case, posting issues on GitHub is always a good idea. It gives awareness to the problem, and it helps gather assistance and ideas to fix the issues.
Good. We will file issues as we go. I understand that it will be a challenge to come up with a fix without being able to reproduce an underflow. So, what information can we provide to make the issue more clear? Does the path down to the concrete underflow help? I.e.: providing iteration counts, current values of locals together with file names etc.?
I am happy we go down this path, but I would recommend to take it easy.
Same thing here! :)
One outcome of #577 is that LAPACK relies on the FORTRAN compiler to implement reasonably robust (under-/overflow-safe) complex division and ABS(). I wonder if we should start maintaining a document collecting these and similar requirements? They would be equally important and useful for anyone wanting to use LAPACK with other/new compilers, for compiler builders, and for transferring parts or all of the LAPACK algorithms to other languages.
Sure! It will be good to have this information well documented.
To begin with, I spent some time tracking (maybe) all divisions of the form
REAL / COMPLEX or COMPLEX / COMPLEX
in the files LAPACK/SRC/z*.f (COMPLEX*16 algorithms). I found a total of 53 files. See the attached file: complexDivisionFound.code-search
For that, I used the following regex in Visual Studio Code:
\n .*/[^0-9](?!DBLE)(?!REAL)(?!MIN)(?!MAX)[^0-9]
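For a rough cross-check of such searches, the same idea can be scripted. The pattern below is a simplified stand-in for the VS Code expression (same lookahead list, simpler anchoring), so treat it as a sketch rather than the exact search:

```python
import re

# Simplified stand-in for the VS Code search: flag a '/' unless it is
# followed by a digit literal or by one of the excluded names.
pattern = re.compile(r"/(?!DBLE)(?!REAL)(?!MIN)(?!MAX)[^0-9]")

assert pattern.search("TEMP = TEMP/RTEMP")     # complex / complex: flagged
assert not pattern.search("S = X/2")           # divide by a digit: skipped
assert not pattern.search("S = X/DBLE( N )")   # divide by DBLE(...): skipped
```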
Maybe, for gfortran, we should compile with the flags -mfpmath=sse -msse2 for testing purposes. I think this will force all computation to be done with 64-bit arithmetic. I am not sure though.
Yes, it should when using GCC, but this flag should also be set by default on x86-64. The documentation excerpt below is for GCC 11, but much older GCC versions should exhibit the same behavior. (Using the GNU Compiler Collection (GCC): 3.19.59 x86 Options)
sse
Use scalar floating-point instructions present in the SSE instruction set. This instruction set is supported by Pentium III and newer chips, and in the AMD line by Athlon-4, Athlon XP and Athlon MP chips. The earlier version of the SSE instruction set supports only single-precision arithmetic, thus the double and extended-precision arithmetic are still done using 387. A later version, present only in Pentium 4 and AMD x86-64 chips, supports double-precision arithmetic too.
For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default. The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80 bits.
This is the default choice for the x86-64 compiler, Darwin x86-32 targets, and the default choice for x86-32 targets with the SSE2 instruction set when -ffast-math is enabled.
Example: XEIGTSTZ < zec.in fails due to underflow in ZLAHQR. Steps to reproduce: ZGET37 -> knt == 31, ZHSEQR -> ZLAHQR -> at the end of the second QR step (ITS == 2), the following code causes underflow (on certain registers, see below):
      TEMP = H( I, I-1 )
      IF( DIMAG( TEMP ).NE.RZERO ) THEN
         RTEMP = ABS( TEMP )   ! <-- underflow on TEMP = (~1e-0173, ~1e-0173)
         IF (RTEMP .EQ. RZERO) RTEMP = CABS1(TEMP)
         H( I, I-1 ) = RTEMP
         TEMP = TEMP / RTEMP
         IF( I2.GT.I )
Our compiler targets the .NET CLR. Its JIT decides to use SSE registers for ABS(TEMP), which leads to the underflow in the intermediate calculation of the magnitude. Ifort (as another example) uses floating point registers in this situation, hence does not underflow (because of its larger length: 80 bits). I am trying to get a clear(er) picture of what to expect from LAPACK regarding which precision / number range it requires from the compiler / processor at runtime.
To recap:
Correct?
By default GCC generates only code for 64-bit float registers on x86-64 and on my machines usually all LAPACK tests pass except for one or two.
Does the Netlib LAPACK test suite pass when compiling with GCC?
edit: resolved https://github.com/Reference-LAPACK/lapack/pull/577#issuecomment-859496175
Maybe, for gfortran, we should compile with the flags -mfpmath=sse -msse2 for testing purposes. I think this will force all computation to be done with 64-bit arithmetic. I am not sure though.
I tried -mfpmath=sse -msse2 with GCC 11 on both macOS and Linux: https://github.com/weslleyspereira/lapack/actions/runs/966071530. There were no additional errors when compared to the workflow without those flags: https://github.com/Reference-LAPACK/lapack/actions/runs/945060330. See #591.
@hokb, could you reproduce the overflow issues you mentioned in https://github.com/Reference-LAPACK/lapack/issues/575#issuecomment-855880000 with GCC using SSE flags? Can you help me with that?
@weslleyspereira I haven't even tried GCC. All I have access to / a running setup for is ifort on Windows. It would take some days for me to get GCC up and running via cygwin to test (especially from my current holiday hotel room ... :| ). Let me know if you need me to take this challenge, though! At least, according to https://godbolt.org/z/YYv5oPxe9, using the flags does not affect the code generated by gfortran. But of course only a test run will tell for sure ...
I don't use windows, but I do have it here. I will start by testing LAPACK with ifort on my Ubuntu and see what happens. Enjoy the holiday!
I am coming back to this issue. Sorry for the delay.
@hokb, did you try to compile/test with ifort using the flag '-xSSE2' ?
Using the godbolt website, I noticed that:
I ran the LAPACK tests with ifort in my machine. All tests pass using:
thanks for the follow-up! It is on my TODO list. Will try & report ASAP (likely after Aug-21).
Some further investigation on the routines that use the intrinsic Fortran ABS operation for complex numbers:
I tracked ABS( a complex variable ) in the files LAPACK/SRC/z*.f (COMPLEX*16 algorithms) in the same way I did for the complex divisions. I found this kind of call in the following 60 files:
SRC/zgbbrd.f SRC/zgbsvx.f SRC/zgebal.f SRC/zgejsv.f SRC/zgelsy.f SRC/zgesc2.f SRC/zgesvdq.f SRC/zgesvj.f SRC/zgetc2.f
SRC/zgetf2.f SRC/zgetrf2.f SRC/zggbal.f SRC/zggsvp3.f SRC/zgsvj0.f SRC/zgsvj1.f SRC/zhbtrd.f SRC/zhetrd_hb2st.F
SRC/zhetri_3x.f SRC/zhetri_rook.f SRC/zhetri.f SRC/zhgeqz.f SRC/zhptri.f SRC/zlacn2.f SRC/zlacon.f SRC/zlaesy.f
SRC/zlaev2.f SRC/zlags2.f SRC/zlahqr.f SRC/zlaic1.f SRC/zlangb.f SRC/zlange.f SRC/zlangt.f SRC/zlanhb.f SRC/zlanhe.f
SRC/zlanhf.f SRC/zlanhp.f SRC/zlanhs.f SRC/zlanht.f SRC/zlansb.f SRC/zlansp.f SRC/zlansy.f SRC/zlantb.f SRC/zlantp.f
SRC/zlantr.f SRC/zlapll.f SRC/zlaqp2.f SRC/zlaqps.f SRC/zlaqz0.f SRC/zlaqz2.f SRC/zlaqz3.f SRC/zlar1v.f SRC/zlarfgp.f
SRC/zlarrv.f SRC/zlatdf.f SRC/zptcon.f SRC/zptrfs.f SRC/ztgex2.f SRC/ztgsen.f SRC/ztgsna.f SRC/ztrsna.f
For that, I used two regex expressions in Visual Studio Code:
\n .*ABS\( (?!DBLE)(?!REAL)(?!MIN)(?!MAX)(?!DIMAG) # a space before the parenthesis
\n .*ABS\((?! )(?!DBLE)(?!REAL)(?!MIN)(?!MAX)(?!DIMAG) # no space before the parenthesis
* Note that, potentially, we would have 60 more single-precision routines that use the intrinsic complex ABS.
thanks for the follow-up! It is on my TODO list. Will try & report ASAP (likely after Aug-21).
@weslleyspereira thanks for your patience. I tried to get back to this today. My attempt to test ifort with /xSSE2 flag was unsuccessful:
ifort: command line warning #10006: ignoring unknown option '/xSSE2'
Also, in the list of supported options I was not able to find any flag that forces ifort to always use SSE / AVX registers (?). Do you (or somebody reading) know which option I could try?
IMO the most promising option would be /Qax<code>. According to the docs, it (emphasis is mine) ...
_May_ generate Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions for Intel® processors.
It looks reasonable that such a flag could only produce alternative code paths, so that general compatibility would not be compromised?
Compiling the LAPACK lib and the tests with /QaxAVX yields no errors. All tests are passing, too - while running significantly slower than without /QaxAVX. I am still wondering how we could get value out of this? The goal (I guess) would be to ensure that the tests never require more robustness than what is provided by the implementation of complex division and complex magnitude (and potentially other operations) - regardless of whether their implementation is straightforward, using x87 registers, or uses more sophisticated algorithms on SSE registers. But I don't see a reliable way to make sure it actually prefers SSE over x87 - always... ?
Hi @hokb, I think I can reproduce some precision issues for complex data.
I tried ifort (IFORT) 2021.3.0 20210609 on my Ubuntu 18.04.5 LTS with the flags -mno-x87 -complex-limited-range.
After running ctest it says "100% tests passed, 0 tests failed out of 103"; however, I do see many failures using grep 'fail' * inside TESTING. All are related to complex data.
From the ifort documentation:
-m80387 Specify whether the compiler can use x87 instructions.
Use -mno-80387 to disable.
-mx87 Same as -m80387
-[no-]complex-limited-range
enable/disable(DEFAULT) the use of the basic algebraic expansions of
some complex arithmetic operations. This can allow for some
performance improvement in programs which use a lot of complex
arithmetic at the loss of some exponent range.
I think the flag -complex-limited-range is necessary to get the code compiled.
Can you try to reproduce the failing tests with those two flags?
So, by default, I think we should expect some compilers will use x87 instructions to improve precision.
-mno-80387 is not available on Windows.
With the flag /Qcomplex-limited-range, my Release/x64 build looks good for non-complex types, as expected:
tests 1... 53 pass:
* xblat[1|2|3][s|d|c|z] - all pass
* xlintst and xeigtst pass on [s|d] only.
...
1> 54/102 Test #54: LAPACK-xeigtstd_gsv_in ........... Passed 0.03 sec
1> Start 55: LAPACK-xeigtstd_csd_in
1> 55/102 Test #55: LAPACK-xeigtstd_csd_in ........... Passed 0.03 sec
1> Start 56: LAPACK-xeigtstd_lse_in
1> 56/102 Test #56: LAPACK-xeigtstd_lse_in ........... Passed 0.03 sec
1> Start 57: LAPACK-xlintstc_ctest_in
1> 57/102 Test #57: LAPACK-xlintstc_ctest_in ......... Passed 3.00 sec
1> Start 58: LAPACK-xlintstrfc_ctest_rfp_in
1> 58/102 Test #58: LAPACK-xlintstrfc_ctest_rfp_in ... Passed 0.33 sec
1> Start 59: LAPACK-xeigtstc_nep_in
In xeigtstc < nep it got stuck for more than 1h with full core activity. I have tried to build with debug info but this would make all tests fail here. So, unfortunately, I have no more details about the specific place of failure.
I think the flag -complex-limited-range is necessary to get the code compiled.
It looks like the default is -no-complex-limited-range: by default the compiler does insert robust implementations, at the price of some extra instructions. Since the LAPACK tests use more than a "limited range" of precision for testing, it might be a useful recommendation for LAPACK users not to use -complex-limited-range with LAPACK (and its tests)?
Naturally, it would be very hard (impossible within reasonable effort) to define an exact range of floating-point values / precision and to guarantee that all functions give reasonable results within that range. What we are dealing with here is trying to push the allowed range to the maximum possible, right? Still: without even knowing any exact number.
What the tests do is mark a lower limit of precision / value range within which the LAPACK functions do what we expect - given that the underlying technologies (compiler, processor(s)) don't deviate too much from the 'common setup'. It all remains pretty fuzzy. But my impression from what I have learned over the past few months is that this inaccuracy is inevitable for the time being. Mostly because of the lack of a specification, like IEEE 754, covering complex numbers.
Hi.
I think the flag -complex-limited-range is necessary to get the code compiled.
Just to clarify here: I meant that I could compile LAPACK with the flag -mno-x87, but, for that, I had to add -complex-limited-range.
Since the LAPACK tests use more than a "limited range" of precision for testing, it might be a useful recommendation for LAPACK users not to use -complex-limited-range with LAPACK (and its tests)?
I agree.
Naturally, it would be very hard (impossible within reasonable effort) to define an exact range of floating-point values / precision and to guarantee that all functions give reasonable results within that range. What we are dealing with here is trying to push the allowed range to the maximum possible, right? Still: without even knowing any exact number.
What the tests do is mark a lower limit of precision / value range within which the LAPACK functions do what we expect - given that the underlying technologies (compiler, processor(s)) don't deviate too much from the 'common setup'. It all remains pretty fuzzy. But my impression from what I have learned over the past few months is that this inaccuracy is inevitable for the time being. Mostly because of the lack of a specification, like IEEE 754, covering complex numbers.
I think we also want to test the precision and robustness of each routine separately. For example, test that a given routine does not overflow/underflow when the final output is in the floating-point representable range. We recently included safe-scaling, e.g., #527, #514 and #594, which improves precision and enlarges the range of allowed values for a correct output.
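To make the safe-scaling idea concrete, here is a sketch of Smith's classic scaled complex division in Python (an illustration only; LAPACK's ZLADIV/DLADIV use a further-refined variant of this scheme):

```python
import sys

def smith_div(a: complex, b: complex) -> complex:
    """Smith's (1962) scaled complex division: never forms br*br + bi*bi."""
    ar, ai, br, bi = a.real, a.imag, b.real, b.imag
    if abs(br) >= abs(bi):
        r = bi / br                     # |r| <= 1, so no overflow here
        d = br + bi * r
        return complex((ar + ai * r) / d, (ai - ar * r) / d)
    else:
        r = br / bi
        d = bi + br * r
        return complex((ar * r + ai) / d, (ai * r - ar) / d)

M = sys.float_info.max / 2              # huge but exactly representable
assert M * M == float("inf")            # the textbook denominator overflows...
assert smith_div(complex(M, M), complex(M, M)) == 1.0   # ...the scaled one does not
```

The point is the same as with CABS1 vs. ABS above: dividing by the larger component first keeps every intermediate within range.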
Since the issue we are facing is more related to the processor than to LAPACK itself, I prepared 2 programs to test the intrinsic Fortran complex division and ABS:
https://gist.github.com/weslleyspereira/ac5babcc0bc370cb238012e3048440e3
https://gist.github.com/weslleyspereira/8144808440c2523b871a51f15a7932f9
Maybe they are useful for you too. Can you test them on Windows, possibly adding some flags like -complex-limited-range?
Thanks.
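Roughly, those programs check identities like z/z = 1 and |z| against known answers at the extremes of the double range. A Python miniature of the division check (a sketch mirroring the idea, not the actual test programs):

```python
import sys

# Self-division should return exactly 1+0i whenever the input is representable.
for x in (5e-324, sys.float_info.min):   # smallest subnormal / smallest normal
    z = complex(x, x)
    assert z / z == 1.0, (x, z / z)

# Near the top of the range the naive formula overflows its denominator, so
# whether z / z survives depends on how the implementation scales; CPython
# uses a Smith-style scaled division.
z = complex(sys.float_info.max / 2, sys.float_info.max / 2)
print(z / z)
```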
Great! Very helpful tests, @weslleyspereira!
I just ran the zdiv and zabs tests on my Windows machine, using ifort 2021, Release mode, x64, with varying flags: /Od vs. /O3, with and without '/Qcomplex_limited_range':
ZABS
==================================================================================================
/Od (no optimizations):
DONE.
/O3 (maximize speed):
** ABS( 1.113-308+1.113-308*I ) = +1.573-308 differs from +1.573-308 #relative diff = +1.570E-16
** ABS( 1.000E+00+0.000E+00*I ) = +1.000E+00 differs from NaN
** ABS( 0.000E+00+1.000E+00*I ) = +1.000E+00 differs from NaN
** ABS( 1.000E+00+1.000E+00*I ) = +1.414E+00 differs from NaN
DONE.
/Qcomplex_limited_range /Od:
** ABS( 2.225-308+0.000E+00*I ) = +0.000E+00 differs from +2.225-308 #relative diff = +1.000E+00
** ABS( 4.941-324+0.000E+00*I ) = +0.000E+00 differs from +4.941-324 #relative diff = +1.000E+00
** ABS( 1.798+308+0.000E+00*I ) = +Infinity differs from +1.798+308 #relative diff = +Infinity
** ABS( 0.000E+00+2.225-308*I ) = +0.000E+00 differs from +2.225-308 #relative diff = +1.000E+00
** ABS( 0.000E+00+4.941-324*I ) = +0.000E+00 differs from +4.941-324 #relative diff = +1.000E+00
** ABS( 0.000E+00+1.798+308*I ) = +Infinity differs from +1.798+308 #relative diff = +Infinity
** ABS( 1.669-308+2.225-308*I ) = +0.000E+00 differs from +2.781-308 #relative diff = +1.000E+00
** ABS( 1.113-308+1.113-308*I ) = +0.000E+00 differs from +1.573-308 #relative diff = +1.000E+00
** ABS( 8.988+307+8.988+307*I ) = +Infinity differs from +1.271+308 #relative diff = +Infinity
DONE.
/Qcomplex_limited_range /O3:
** ABS( 1.119-154+1.492-154*I ) = +1.492-154 differs from +1.865-154 #relative diff = +2.000E-01
** ABS( 0.000E+00+2.225-308*I ) = +0.000E+00 differs from +2.781-308 #relative diff = +1.000E+00
** ABS( 7.458-155+7.458-155*I ) = +0.000E+00 differs from +1.055-154 #relative diff = +1.000E+00
** ABS( 8.988+307+8.988+307*I ) = +Infinity differs from +1.271+308 #relative diff = +Infinity
DONE.
ZDIV
==================================================================================================
/Od (no optimizations):
DONE.
/O3 (maximize speed):
** ( 4.941-324+0.000E+00*I ) / ( +4.941-324+0.000E+00*I ) = NaN NaN*I differs from +1.000E+00
** ( 0.000E+00+4.941-324*I ) / ( +0.000E+00+4.941-324*I ) = NaN NaN*I differs from +1.000E+00
** ( 0.000E+00+4.941-324*I ) / ( +4.941-324+0.000E+00*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
** ( 4.941-324+0.000E+00*I ) / ( +0.000E+00+4.941-324*I ) = NaN NaN*I differs from +0.000E+00-1.000E+00*I
DONE.
/Qcomplex_limited_range /Od:
** ( 2.225-308+0.000E+00*I ) / ( +2.225-308+0.000E+00*I ) = NaN NaN*I differs from +1.000E+00
** ( 4.941-324+0.000E+00*I ) / ( +4.941-324+0.000E+00*I ) = NaN NaN*I differs from +1.000E+00
** ( 1.798+308+0.000E+00*I ) / ( +1.798+308+0.000E+00*I ) = NaN+0.000E+00*I differs from +1.000E+00
** ( 0.000E+00+2.225-308*I ) / ( +0.000E+00+2.225-308*I ) = NaN NaN*I differs from +1.000E+00
** ( 0.000E+00+4.941-324*I ) / ( +0.000E+00+4.941-324*I ) = NaN NaN*I differs from +1.000E+00
** ( 0.000E+00+1.798+308*I ) / ( +0.000E+00+1.798+308*I ) = NaN+0.000E+00*I differs from +1.000E+00
** ( 2.225-308+2.225-308*I ) / ( +2.225-308+2.225-308*I ) = NaN NaN*I differs from +1.000E+00
** ( 4.941-324+4.941-324*I ) / ( +4.941-324+4.941-324*I ) = NaN NaN*I differs from +1.000E+00
** ( 1.798+308+1.798+308*I ) / ( +1.798+308+1.798+308*I ) = NaN NaN*I differs from +1.000E+00
** ( 0.000E+00+2.225-308*I ) / ( +2.225-308+0.000E+00*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
** ( 0.000E+00+4.941-324*I ) / ( +4.941-324+0.000E+00*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
** ( 0.000E+00+1.798+308*I ) / ( +1.798+308+0.000E+00*I ) = +0.000E+00 NaN*I differs from +0.000E+00+1.000E+00*I
** ( 2.225-308+0.000E+00*I ) / ( +0.000E+00+2.225-308*I ) = NaN NaN*I differs from +0.000E+00-1.000E+00*I
** ( 4.941-324+0.000E+00*I ) / ( +0.000E+00+4.941-324*I ) = NaN NaN*I differs from +0.000E+00-1.000E+00*I
** ( 1.798+308+0.000E+00*I ) / ( +0.000E+00+1.798+308*I ) = +0.000E+00 NaN*I differs from +0.000E+00-1.000E+00*I
** ( 2.225-308+2.225-308*I ) / ( +2.225-308-2.225-308*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
** ( 4.941-324+4.941-324*I ) / ( +4.941-324-4.941-324*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
** ( 1.798+308+1.798+308*I ) / ( +1.798+308-1.798+308*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
DONE.
/Qcomplex_limited_range /O3:
** ( 4.941-324+0.000E+00*I ) / ( +4.941-324+0.000E+00*I ) = NaN NaN*I differs from +1.000E+00
** ( 0.000E+00+4.941-324*I ) / ( +0.000E+00+4.941-324*I ) = NaN NaN*I differs from +1.000E+00
** ( 2.225-308+2.225-308*I ) / ( +2.225-308+2.225-308*I ) = NaN NaN*I differs from +1.000E+00
** ( 4.941-324+4.941-324*I ) / ( +4.941-324+4.941-324*I ) = NaN NaN*I differs from +1.000E+00
** ( 1.798+308+1.798+308*I ) / ( +1.798+308+1.798+308*I ) = NaN NaN*I differs from +1.000E+00
** ( 0.000E+00+4.941-324*I ) / ( +4.941-324+0.000E+00*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
** ( 4.941-324+0.000E+00*I ) / ( +0.000E+00+4.941-324*I ) = NaN NaN*I differs from +0.000E+00-1.000E+00*I
** ( 2.225-308+2.225-308*I ) / ( +2.225-308-2.225-308*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
** ( 4.941-324+4.941-324*I ) / ( +4.941-324-4.941-324*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
** ( 1.798+308+1.798+308*I ) / ( +1.798+308-1.798+308*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
DONE.
Without optimizations, either the default implementations are robust enough or x87 registers come to help again (likely). Both /O3 and /Qcomplex_limited_range cause robustness issues with your test data. Interestingly enough, some issues vanish again when both optimizations are applied. (But this is of course no general rule.)
This test nicely demonstrates the trade-off between precision and performance. It may also help locate the borderline where floating-point precision starts to cause trouble for common setups.
I improved the tests a bit. They now cover ranges of values instead of only the extremes.
You can try ifort with -fp-model precise too; the default is -fp-model fast.
Some tests on my Ubuntu 18.04.5 LTS
Test programs: https://gist.github.com/weslleyspereira/ac5babcc0bc370cb238012e3048440e3 https://gist.github.com/weslleyspereira/8144808440c2523b871a51f15a7932f9
Compilers:
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
Here I just changed the optimization flags -O0 and -O3.
$ gfortran test_zcomplexabs.f && ./a.out
# X := 4.9406564584124654E-324 2.2250738585072014E-308 1.7976931348623157E+308 8.9884656743115795E+307
# Blue min constant := 1.4916681462400413E-154
# Blue max constant := 1.9979190722022350E+146
$ gfortran -O0 test_zcomplexabs.f && ./a.out
# X := 4.9406564584124654E-324 2.2250738585072014E-308 1.7976931348623157E+308 8.9884656743115795E+307
# Blue min constant := 1.4916681462400413E-154
# Blue max constant := 1.9979190722022350E+146
$ gfortran -O3 test_zcomplexabs.f && ./a.out
# X := 4.9406564584124654E-324 2.2250738585072014E-308 1.7976931348623157E+308 8.9884656743115795E+307
# Blue min constant := 1.4916681462400413E-154
# Blue max constant := 1.9979190722022350E+146
$ gfortran test_zcomplexdiv.f && ./a.out
# X := 4.9406564584124654E-324 2.2250738585072014E-308 1.7976931348623157E+308 8.9884656743115795E+307
# Blue min constant := 1.4916681462400413E-154
# Blue max constant := 1.9979190722022350E+146
[ c3] ( 1.798+308+1.798+308*I ) / ( +1.798+308+1.798+308*I ) = NaN+0.000E+00*I differs from +1.000E+00
[ c4] ( 8.988+307+8.988+307*I ) / ( +8.988+307+8.988+307*I ) = NaN+0.000E+00*I differs from +1.000E+00
[ f3] ( 1.798+308+1.798+308*I ) / ( +1.798+308-1.798+308*I ) = +0.000E+00 NaN*I differs from +0.000E+00+1.000E+00*I
[ f4] ( 8.988+307+8.988+307*I ) / ( +8.988+307-8.988+307*I ) = +0.000E+00 NaN*I differs from +0.000E+00+1.000E+00*I
$ gfortran -O0 test_zcomplexdiv.f && ./a.out
# X := 4.9406564584124654E-324 2.2250738585072014E-308 1.7976931348623157E+308 8.9884656743115795E+307
# Blue min constant := 1.4916681462400413E-154
# Blue max constant := 1.9979190722022350E+146
[ c3] ( 1.798+308+1.798+308*I ) / ( +1.798+308+1.798+308*I ) = NaN+0.000E+00*I differs from +1.000E+00
[ c4] ( 8.988+307+8.988+307*I ) / ( +8.988+307+8.988+307*I ) = NaN+0.000E+00*I differs from +1.000E+00
[ f3] ( 1.798+308+1.798+308*I ) / ( +1.798+308-1.798+308*I ) = +0.000E+00 NaN*I differs from +0.000E+00+1.000E+00*I
[ f4] ( 8.988+307+8.988+307*I ) / ( +8.988+307-8.988+307*I ) = +0.000E+00 NaN*I differs from +0.000E+00+1.000E+00*I
$ gfortran -O3 test_zcomplexdiv.f && ./a.out
# X := 4.9406564584124654E-324 2.2250738585072014E-308 1.7976931348623157E+308 8.9884656743115795E+307
# Blue min constant := 1.4916681462400413E-154
# Blue max constant := 1.9979190722022350E+146
[ c3] ( 1.798+308+1.798+308*I ) / ( +1.798+308+1.798+308*I ) = NaN+0.000E+00*I differs from +1.000E+00
[ c4] ( 8.988+307+8.988+307*I ) / ( +8.988+307+8.988+307*I ) = NaN+0.000E+00*I differs from +1.000E+00
[ f3] ( 1.798+308+1.798+308*I ) / ( +1.798+308-1.798+308*I ) = +0.000E+00 NaN*I differs from +0.000E+00+1.000E+00*I
[ f4] ( 8.988+307+8.988+307*I ) / ( +8.988+307-8.988+307*I ) = +0.000E+00 NaN*I differs from +0.000E+00+1.000E+00*I
I played with these ifort options:
a) -fp-model precise (default: -fp-model fast)
b) -complex-limited-range -mno-x87 (defaults: -no-complex-limited-range -mx87)
$ ifort test_zcomplexabs.f && ./a.out
# X := 4.940656458412465E-324 2.225073858507201E-308 1.797693134862316E+308 8.988465674311580E+307
# Blue min constant := 1.491668146240041E-154
# Blue max constant := 1.997919072202235E+146
# [a] Subnormal numbers may be treated as 0
# [b] Subnormal numbers may be treated as 0
# [c] Subnormal numbers may be treated as 0
[c2] ABS( 0.000E+00+2.225-308*I ) = +2.225-308 differs from +2.781-308 #relative diff = +0.000E+00
# [d] Subnormal numbers may be treated as 0
# [d] Subnormal numbers may be treated as 0
$ ifort -fp-model precise test_zcomplexabs.f && ./a.out
# X := 4.940656458412465E-324 2.225073858507201E-308 1.797693134862316E+308 8.988465674311580E+307
# Blue min constant := 1.491668146240041E-154
# Blue max constant := 1.997919072202235E+146
$ ifort -complex-limited-range -mno-x87 test_zcomplexabs.f && ./a.out
# X := 4.940656458412465E-324 2.225073858507201E-308 1.797693134862316E+308 8.988465674311580E+307
# Blue min constant := 1.491668146240041E-154
# Blue max constant := 1.997919072202235E+146
# [a] Subnormal numbers may be treated as 0
# [b] Subnormal numbers may be treated as 0
[b2] ABS( 0.000E+00+2.225-308*I ) = +0.000E+00 differs from +2.225-308 #relative diff = +1.000E+00
(...)
[b2] ABS( 0.000E+00+7.458-155*I ) = +0.000E+00 differs from +7.458-155 #relative diff = +1.000E+00
[b3] ABS( 0.000E+00+1.798+308*I ) = +Infinity differs from +1.798+308 #relative diff = +Infinity
[b4] ABS( 0.000E+00+8.988+307*I ) = +Infinity differs from +8.988+307 #relative diff = +Infinity
(...)
[b4] ABS( 0.000E+00+1.341+154*I ) = +Infinity differs from +1.341+154 #relative diff = +Infinity
# [c] Subnormal numbers may be treated as 0
[c2] ABS( 0.000E+00+2.225-308*I ) = +0.000E+00 differs from +2.781-308 #relative diff = +0.000E+00
(...)
[c2] ABS( 1.119-154+1.492-154*I ) = +1.492-154 differs from +1.865-154 #relative diff = +0.000E+00
[c4] ABS( 6.741+307+8.988+307*I ) = +Infinity differs from +1.124+308 #relative diff = +0.000E+00
(...)
[c4] ABS( 1.006+154+1.341+154*I ) = +Infinity differs from +1.676+154 #relative diff = +0.000E+00
# [d] Subnormal numbers may be treated as 0
# [d] Subnormal numbers may be treated as 0
[d2] ABS( 2.225-308+2.225-308*I ) = +0.000E+00 differs from +3.147-308 #relative diff = +0.000E+00
(...)
[d2] ABS( 7.458-155+7.458-155*I ) = +0.000E+00 differs from +1.055-154 #relative diff = +0.000E+00
[d3] ABS( 8.988+307+8.988+307*I ) = +Infinity differs from +1.271+308 #relative diff = +0.000E+00
[d4] ABS( 4.494+307+4.494+307*I ) = +Infinity differs from +6.356+307 #relative diff = +0.000E+00
(...)
[d4] ABS( 1.341+154+1.341+154*I ) = +Infinity differs from +1.896+154 #relative diff = +0.000E+00
$ ifort -fp-model precise -complex-limited-range -mno-x87 test_zcomplexabs.f && ./a.out
# X := 4.940656458412465E-324 2.225073858507201E-308 1.797693134862316E+308 8.988465674311580E+307
# Blue min constant := 1.491668146240041E-154
# Blue max constant := 1.997919072202235E+146
[c1] ABS( 1.482-323+1.976-323*I ) = +0.000E+00 differs from +2.470-323 #relative diff = +0.000E+00
(...)
[c1] ABS( 8.344-309+1.113-308*I ) = +0.000E+00 differs from +1.391-308 #relative diff = +0.000E+00
[c2] ABS( 1.669-308+2.225-308*I ) = +0.000E+00 differs from +2.781-308 #relative diff = +0.000E+00
(...)
[c2] ABS( 3.334-162+4.446-162*I ) = +5.445-162 differs from +5.557-162 #relative diff = +0.000E+00
[c4] ABS( 6.741+307+8.988+307*I ) = +Infinity differs from +1.124+308 #relative diff = +0.000E+00
(...)
[c4] ABS( 1.006+154+1.341+154*I ) = +Infinity differs from +1.676+154 #relative diff = +0.000E+00
[d1] ABS( 4.941-324+4.941-324*I ) = +0.000E+00 differs from +4.941-324 #relative diff = +0.000E+00
(...)
[d1] ABS( 5.563-309+5.563-309*I ) = +0.000E+00 differs from +7.867-309 #relative diff = +0.000E+00
[d2] ABS( 1.113-308+1.113-308*I ) = +0.000E+00 differs from +1.573-308 #relative diff = +0.000E+00
(...)
[d2] ABS( 1.111-162+1.111-162*I ) = +0.000E+00 differs from +1.572-162 #relative diff = +0.000E+00
[d3] ABS( 8.988+307+8.988+307*I ) = +Infinity differs from +1.271+308 #relative diff = +0.000E+00
[d4] ABS( 4.494+307+4.494+307*I ) = +Infinity differs from +6.356+307 #relative diff = +0.000E+00
(...)
[d4] ABS( 1.341+154+1.341+154*I ) = +Infinity differs from +1.896+154 #relative diff = +0.000E+00
$ ifort test_zcomplexdiv.f && ./a.out
# X := 4.940656458412465E-324 2.225073858507201E-308 1.797693134862316E+308 8.988465674311580E+307
# Blue min constant := 1.491668146240041E-154
# Blue max constant := 1.997919072202235E+146
# [a] Subnormal numbers may be treated as 0
# [b] Subnormal numbers may be treated as 0
# [c] Subnormal numbers may be treated as 0
# [d] Subnormal numbers may be treated as 0
# [e] Subnormal numbers may be treated as 0
# [f] Subnormal numbers may be treated as 0
[na1] ( 0.000E+00+0.000E+00*I ) / ( NaN+0.000E+00*I ) = +0.000E+00+0.000E+00*I differs from NaN
$ ifort -fp-model precise test_zcomplexdiv.f && ./a.out
# X := 4.940656458412465E-324 2.225073858507201E-308 1.797693134862316E+308 8.988465674311580E+307
# Blue min constant := 1.491668146240041E-154
# Blue max constant := 1.997919072202235E+146
$ ifort -complex-limited-range -mno-x87 test_zcomplexdiv.f && ./a.out
# X := 4.940656458412465E-324 2.225073858507201E-308 1.797693134862316E+308 8.988465674311580E+307
# Blue min constant := 1.491668146240041E-154
# Blue max constant := 1.997919072202235E+146
# [a] Subnormal numbers may be treated as 0
# [b] Subnormal numbers may be treated as 0
[ b2] ( 0.000E+00+2.225-308*I ) / ( +0.000E+00+2.225-308*I ) = NaN NaN*I differs from +1.000E+00
(...)
[ b2] ( 0.000E+00+7.458-155*I ) / ( +0.000E+00+7.458-155*I ) = NaN NaN*I differs from +1.000E+00
[ b3] ( 0.000E+00+1.798+308*I ) / ( +0.000E+00+1.798+308*I ) = NaN+0.000E+00*I differs from +1.000E+00
[ b4] ( 0.000E+00+8.988+307*I ) / ( +0.000E+00+8.988+307*I ) = NaN+0.000E+00*I differs from +1.000E+00
(...)
[ b4] ( 0.000E+00+1.341+154*I ) / ( +0.000E+00+1.341+154*I ) = NaN+0.000E+00*I differs from +1.000E+00
# [c] Subnormal numbers may be treated as 0
[ c2] ( 2.225-308+2.225-308*I ) / ( +2.225-308+2.225-308*I ) = NaN NaN*I differs from +1.000E+00
(...)
[ c2] ( 7.458-155+7.458-155*I ) / ( +7.458-155+7.458-155*I ) = NaN NaN*I differs from +1.000E+00
[ c3] ( 1.798+308+1.798+308*I ) / ( +1.798+308+1.798+308*I ) = NaN NaN*I differs from +1.000E+00
[ c4] ( 8.988+307+8.988+307*I ) / ( +8.988+307+8.988+307*I ) = NaN NaN*I differs from +1.000E+00
(...)
[ c4] ( 1.341+154+1.341+154*I ) / ( +1.341+154+1.341+154*I ) = NaN NaN*I differs from +1.000E+00
# [d] Subnormal numbers may be treated as 0
# [e] Subnormal numbers may be treated as 0
[ e2] ( 2.225-308+0.000E+00*I ) / ( +0.000E+00+2.225-308*I ) = NaN NaN*I differs from +0.000E+00-1.000E+00*I
(...)
[ e2] ( 7.458-155+0.000E+00*I ) / ( +0.000E+00+7.458-155*I ) = NaN NaN*I differs from +0.000E+00-1.000E+00*I
[ e3] ( 1.798+308+0.000E+00*I ) / ( +0.000E+00+1.798+308*I ) = +0.000E+00 NaN*I differs from +0.000E+00-1.000E+00*I
[ e4] ( 8.988+307+0.000E+00*I ) / ( +0.000E+00+8.988+307*I ) = +0.000E+00 NaN*I differs from +0.000E+00-1.000E+00*I
(...)
[ e4] ( 1.341+154+0.000E+00*I ) / ( +0.000E+00+1.341+154*I ) = +0.000E+00 NaN*I differs from +0.000E+00-1.000E+00*I
# [f] Subnormal numbers may be treated as 0
[ f2] ( 2.225-308+2.225-308*I ) / ( +2.225-308-2.225-308*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
(...)
[ f2] ( 7.458-155+7.458-155*I ) / ( +7.458-155-7.458-155*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
[ f3] ( 1.798+308+1.798+308*I ) / ( +1.798+308-1.798+308*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
[ f4] ( 8.988+307+8.988+307*I ) / ( +8.988+307-8.988+307*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
(...)
[ f4] ( 1.341+154+1.341+154*I ) / ( +1.341+154-1.341+154*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
[na1] ( 0.000E+00+0.000E+00*I ) / ( NaN+0.000E+00*I ) = +0.000E+00+0.000E+00*I differs from NaN
$ ifort -fp-model precise -complex-limited-range -mno-x87 test_zcomplexdiv.f && ./a.out
# X := 4.940656458412465E-324 2.225073858507201E-308 1.797693134862316E+308 8.988465674311580E+307
# Blue min constant := 1.491668146240041E-154
# Blue max constant := 1.997919072202235E+146
[ c1] ( 4.941-324+4.941-324*I ) / ( +4.941-324+4.941-324*I ) = NaN NaN*I differs from +1.000E+00
(...)
[ c1] ( 1.113-308+1.113-308*I ) / ( +1.113-308+1.113-308*I ) = NaN NaN*I differs from +1.000E+00
[ c2] ( 2.225-308+2.225-308*I ) / ( +2.225-308+2.225-308*I ) = NaN NaN*I differs from +1.000E+00
(...)
[ c2] ( 1.111-162+1.111-162*I ) / ( +1.111-162+1.111-162*I ) = NaN NaN*I differs from +1.000E+00
[ c3] ( 1.798+308+1.798+308*I ) / ( +1.798+308+1.798+308*I ) = NaN NaN*I differs from +1.000E+00
[ c4] ( 8.988+307+8.988+307*I ) / ( +8.988+307+8.988+307*I ) = NaN NaN*I differs from +1.000E+00
(...)
[ c4] ( 1.341+154+1.341+154*I ) / ( +1.341+154+1.341+154*I ) = NaN NaN*I differs from +1.000E+00
[ f1] ( 4.941-324+4.941-324*I ) / ( +4.941-324-4.941-324*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
(...)
[ f1] ( 1.113-308+1.113-308*I ) / ( +1.113-308-1.113-308*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
[ f2] ( 2.225-308+2.225-308*I ) / ( +2.225-308-2.225-308*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
(...)
[ f2] ( 1.111-162+1.111-162*I ) / ( +1.111-162-1.111-162*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
[ f3] ( 1.798+308+1.798+308*I ) / ( +1.798+308-1.798+308*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
[ f4] ( 8.988+307+8.988+307*I ) / ( +8.988+307-8.988+307*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
(...)
[ f4] ( 1.341+154+1.341+154*I ) / ( +1.341+154-1.341+154*I ) = NaN NaN*I differs from +0.000E+00+1.000E+00*I
gfortran: Although the complex ABS is computed correctly, the division of large numbers is not. The test results are the same whether I compile with -O0 or -O3.
I don't know much about gfortran's flags, so I will stop here.
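As an aside (my own sketch, not from this thread): NaNs like the ones in the c3/c4 cases above are typically produced by the textbook division formula, whose |b|^2 intermediate overflows. Smith's (1962) scaled division avoids that intermediate, although even Smith's can overflow at the very top of the exponent range, which may be why the 1.798+308 cases still fail; fully robust routines add further scaling. The function names below are illustrative, not LAPACK routines:

```python
import math

def naive_zdiv(ar, ai, br, bi):
    # textbook formula a/b = a*conj(b) / |b|^2: br*br + bi*bi overflows first
    d = br * br + bi * bi
    return ((ar * br + ai * bi) / d, (ai * br - ar * bi) / d)

def smith_zdiv(ar, ai, br, bi):
    # Smith's scaled division: divide by the larger component before squaring
    if abs(br) >= abs(bi):
        r = bi / br
        d = br + bi * r
        return ((ar + ai * r) / d, (ai - ar * r) / d)
    else:
        r = br / bi
        d = bi + br * r
        return ((ar * r + ai) / d, (ai * r - ar) / d)

big = 8.0e+307  # large enough that big*big overflows, but big+big does not
print(naive_zdiv(big, big, big, big))  # (nan, nan): |b|^2 overflowed
print(smith_zdiv(big, big, big, big))  # (1.0, 0.0), the exact quotient
```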
ifort: -fp-model precise was the most accurate option among the ones I tested. It uses x87 instructions.
When using no x87 instructions (which I think means SSE registers only), ifort fails on some operations with inputs inside the range (b, B), where b and B are Blue's min and max constants. Numbers in (b, B) can be safely squared.
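As a side note (my own derivation, not from the thread), the Blue min/max constants printed by the tests follow directly from the IEEE-754 double-precision parameters:

```python
import math
import sys

# Derive Blue's scaling constants for IEEE-754 double precision.
minexp = sys.float_info.min_exp   # -1021
maxexp = sys.float_info.max_exp   # 1024
digits = sys.float_info.mant_dig  # 53

b_min = 2.0 ** math.ceil((minexp - 1) * 0.5)            # smallest safely squarable
b_max = 2.0 ** math.floor((maxexp - digits + 1) * 0.5)  # largest safely squarable
print(b_min)  # 1.4916681462400413e-154, the "Blue min constant" above
print(b_max)  # 1.997919072202235e+146,  the "Blue max constant" above

# Squaring stays in range for values in (b_min, b_max):
x = 1.0e100
assert 0.0 < x * x < math.inf
```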
The option -fp-model precise improves some algorithms. E.g., when the real part of x is zero, ABS(x) computes the correct value regardless of whether the imaginary part of x is subnormal or huge.
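To illustrate (my own sketch; naive_cabs and scaled_cabs are made-up names, not LAPACK routines): the Infinity results in the c4/d3 cases above are what the naive formula produces, while scaling by the larger component first keeps the intermediates in range:

```python
import math

def naive_cabs(re, im):
    # what a "limited range" complex ABS effectively computes: squares first
    return math.sqrt(re * re + im * im)

def scaled_cabs(re, im):
    # scale by the larger magnitude before squaring, hypot-style
    a, b = abs(re), abs(im)
    if a < b:
        a, b = b, a
    if a == 0.0:
        return 0.0
    r = b / a
    return a * math.sqrt(1.0 + r * r)

big = 8.988465674311580e+307   # the d3 input from the log above
print(naive_cabs(big, big))    # inf: big*big overflowed
print(scaled_cabs(big, big))   # ~1.271e+308, matching the expected value
```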
When using -fp-model fast, ifort may not propagate NaNs, e.g.,
[na1] ( 0.000E+00+0.000E+00*I ) / ( NaN+0.000E+00*I ) = +0.000E+00+0.000E+00*I differs from NaN
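(My own check, outside the thread.) An IEEE-conforming complex division propagates the NaN, which is what the na1 case expects; CPython's built-in complex division behaves that way:

```python
import math

# 0/NaN: the NaN in the denominator should propagate into the quotient.
z = complex(0.0, 0.0) / complex(float("nan"), 0.0)
assert math.isnan(z.real) or math.isnan(z.imag)
```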
Also with -fp-model fast, I noticed that some subnormal numbers are not flushed to zero consistently. For example:
if( Xj .eq. 0.0d0 ) on lines 80 and 85 evaluates to FALSE, while
if( Xj .eq. 0.0d0 ) on line 105 evaluates to TRUE.
I think this is related to some optimization by the compiler.
Hi @hokb. I am working on #623; maybe you would be interested in reviewing it.
I think we may close this issue. We do require that certain compiler intrinsic operations are robust enough for us to build our algorithms on. #623 introduced tests that document what we expect from compilers, and we may add more tests like those in the future.
I will close the issue. Let me know if something is still missing. Thanks!
We are working on a translation of LAPACK to .NET. We wrote a FORTRAN compiler which successfully translates all of LAPACK, including all tests. On real data types (almost) all tests pass. On complex data we are still seeing a few precision issues.
Example: XEIGTSTZ < zec.in - fails due to underflow in ZLAHQR.
Steps to reproduce: ZGET37 -> knt == 31, ZHSEQR -> ZLAHQR -> at the end of the second QR step (ITS == 2) the following code causes underflow (on certain registers, see below)
Our compiler targets the .NET CLR. Its JIT decides to use SSE registers for ABS(TEMP), which leads to underflow in the intermediate calculation of the magnitude. ifort (as another example) uses x87 floating-point registers in this situation and hence does not underflow, because of their larger width (80 bits). I am trying to get a clearer picture of which precision / number range LAPACK requires from the compiler / processor at runtime.
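(Illustrative sketch, my own, not from the report.) The underflow mode described here: when both components of TEMP are tiny, squaring them flushes to zero and the computed magnitude collapses, which is what the CABS1-based rescue in the proposed patch guards against:

```python
import math

tiny = 2.2250738585072014e-308  # smallest normal double
# Squaring underflows: tiny*tiny is below the smallest subnormal, so it is 0.0.
naive = math.sqrt(tiny * tiny + tiny * tiny)
robust = math.hypot(tiny, tiny)  # scales internally, like a fixed ABS would
print(naive)   # 0.0: both squares underflowed
print(robust)  # ~3.147e-308, the true magnitude
```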
Are all tests for double precision designed to require at least 64-bit registers? Or are they designed to succeed for the set of popular FORTRAN compilers available today? (In the first case, the above issue (and similar ones) may require attention. Should I file an issue for them?)
I looked for some specification but couldn't find it yet. Any link would also be appreciated. Thanks in advance!