Reference-LAPACK / lapack

LAPACK development repository
Other
1.51k stars 441 forks source link

Many test failures when compiled with `-march=haswell -ftree-vectorize` #732

Open bartoldeman opened 2 years ago

bartoldeman commented 2 years ago

Description

Tested on master, compiling LAPACK with

cp make.inc.example make.inc
make FC="gfortran-11.3.0" make FFLAGS="-O2 -frecursive -march=haswell -ftree-vectorize" -j 32

brings about many test failures:

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1296387         1301    (0.100%)        0       (0.000%)
DOUBLE PRECISION        1304985         1274    (0.098%)        0       (0.000%)
COMPLEX                 748111          1240    (0.166%)        0       (0.000%)
COMPLEX16               761754          561     (0.074%)        0       (0.000%)

--> ALL PRECISIONS      4111237         4376    (0.106%)        0       (0.000%)

There are three categories of failures here: 1) A bug in GCC, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254. Fortunately a fix is provided there and it can be trivially backported. 2) Two minor test failures in REAL and COMPLEX, see https://github.com/Reference-LAPACK/lapack/issues/679 3) The tests insisting that when e.g. CGEEV is called to provide eigenvectors, it will give the exact same eigenvalues (no numerical tolerance) as when it does not provide eigenvectors.

Applying the patch to GCC, tests go down to:

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1310427         1       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1319025         0       (0.000%)        0       (0.000%)
COMPLEX                 762151          254     (0.033%)        0       (0.000%)
COMPLEX16               763938          494     (0.065%)        0       (0.000%)

--> ALL PRECISIONS      4155541         749     (0.018%)        0       (0.000%)

Increasing the tolerance for 2) using

diff --git a/TESTING/ctest_rfp.in b/TESTING/ctest_rfp.in
index d6988f2a7..82e147535 100644
--- a/TESTING/ctest_rfp.in
+++ b/TESTING/ctest_rfp.in
@@ -5,5 +5,5 @@ Data file for testing COMPLEX LAPACK linear equation routines RFP format
 1 2 15                         Values of NRHS (number of right hand sides)
 9                              Number of matrix types (list types on next line if 0 < NTYPES <  9)
 1 2 3 4 5 6 7 8 9              Matrix Types
-30.0                           Threshold value of test ratio
+42.0                           Threshold value of test ratio
 T                              Put T to test the error exits
diff --git a/TESTING/stest_rfp.in b/TESTING/stest_rfp.in
index 08dbf83e6..9b082b7df 100644
--- a/TESTING/stest_rfp.in
+++ b/TESTING/stest_rfp.in
@@ -5,5 +5,5 @@ Data file for testing REAL LAPACK linear equation routines RFP format
 1 2 15                         Values of NRHS (number of right hand sides)
 9                              Number of matrix types (list types on next line if 0 < NTYPES <  9)
 1 2 3 4 5 6 7 8 9              Matrix Types
-30.0                           Threshold value of test ratio
+42.0                           Threshold value of test ratio
 T                              Put T to test the error exits

we get

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1318203         0       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1319025         0       (0.000%)        0       (0.000%)
COMPLEX                 769927          253     (0.033%)        0       (0.000%)
COMPLEX16               763938          494     (0.065%)        0       (0.000%)

--> ALL PRECISIONS      4171093         747     (0.018%)        0       (0.000%)

The third one is more complex to analyze. The way I understand it is that gfortran has some freedom in how to apply FMA's to expressions such as a*b+c*d (ie. fma(a,b,c*d) or fma(c,d,a*b)) and these give slightly different results. The loop bounds for computing and not computing eigenvectors for various loops are different, hence it is often the case that a loop with eigenvectors is vectorized, and without isn't, or for different loop iterations, and the vectorized use of FMA isn't identical to the unvectorized use of FMA. And with complex numbers of course there are even more permutations possible. Adding parentheses forces the compiler to use one variety.

Applying this patch:

diff --git a/SRC/clahqr.f b/SRC/clahqr.f
index dbd848e2f..260764ef9 100644
--- a/SRC/clahqr.f
+++ b/SRC/clahqr.f
@@ -484,7 +484,7 @@
 *           in columns K to I2.
 *
             DO 80 J = K, I2
-               SUM = CONJG( T1 )*H( K, J ) + T2*H( K+1, J )
+               SUM = CONJG( T1 )*H( K, J ) + ( T2*H( K+1, J ) )
                H( K, J ) = H( K, J ) - SUM
                H( K+1, J ) = H( K+1, J ) - SUM*V2
    80       CONTINUE
@@ -493,7 +493,7 @@
 *           matrix in rows I1 to min(K+2,I).
 *
             DO 90 J = I1, MIN( K+2, I )
-               SUM = T1*H( J, K ) + T2*H( J, K+1 )
+               SUM = T1*H( J, K ) + ( T2*H( J, K+1 ) )
                H( J, K ) = H( J, K ) - SUM
                H( J, K+1 ) = H( J, K+1 ) - SUM*CONJG( V2 )
    90       CONTINUE
diff --git a/SRC/claqr5.f b/SRC/claqr5.f
index 0a01cc226..a19800b1e 100644
--- a/SRC/claqr5.f
+++ b/SRC/claqr5.f
@@ -616,7 +616,7 @@
                T3 = T1*CONJG( V( 3, M ) )
                DO 70 J = JTOP, MIN( KBOT, K+3 )
                   REFSUM = H( J, K+1 ) + V( 2, M )*H( J, K+2 )
-     $                     + V( 3, M )*H( J, K+3 )
+     $                     + ( V( 3, M )*H( J, K+3 ) )
                   H( J, K+1 ) = H( J, K+1 ) - REFSUM*T1
                   H( J, K+2 ) = H( J, K+2 ) - REFSUM*T2
                   H( J, K+3 ) = H( J, K+3 ) - REFSUM*T3
diff --git a/SRC/zlahqr.f b/SRC/zlahqr.f
index 9413f20cc..c632f578a 100644
--- a/SRC/zlahqr.f
+++ b/SRC/zlahqr.f
@@ -484,7 +484,7 @@
 *           in columns K to I2.
 *
             DO 80 J = K, I2
-               SUM = DCONJG( T1 )*H( K, J ) + T2*H( K+1, J )
+               SUM = DCONJG( T1 )*H( K, J ) + ( T2*H( K+1, J ) )
                H( K, J ) = H( K, J ) - SUM
                H( K+1, J ) = H( K+1, J ) - SUM*V2
    80       CONTINUE
@@ -493,7 +493,7 @@
 *           matrix in rows I1 to min(K+2,I).
 *
             DO 90 J = I1, MIN( K+2, I )
-               SUM = T1*H( J, K ) + T2*H( J, K+1 )
+               SUM = T1*H( J, K ) + ( T2*H( J, K+1 ) )
                H( J, K ) = H( J, K ) - SUM
                H( J, K+1 ) = H( J, K+1 ) - SUM*DCONJG( V2 )
    90       CONTINUE
diff --git a/SRC/zlaqr5.f b/SRC/zlaqr5.f
index 4fa5ee5b0..656a7ab37 100644
--- a/SRC/zlaqr5.f
+++ b/SRC/zlaqr5.f
@@ -446,7 +446,7 @@
                T2 = T1*V( 2, M22 )
                DO 40 J = K+1, JBOT
                   REFSUM = H( K+1, J ) +
-     $                     DCONJG( V( 2, M22 ) )*H( K+2, J )
+     $                     ( DCONJG( V( 2, M22 ) )*H( K+2, J ) )
                   H( K+1, J ) = H( K+1, J ) - REFSUM*T1
                   H( K+2, J ) = H( K+2, J ) - REFSUM*T2
    40          CONTINUE

makes the test failures go to 0.

But it looks quite fragile, in that it takes trial and error to see which expressions need braces.

Would it be ok instead to replace test 5, e.g. https://github.com/Reference-LAPACK/lapack/blob/28f7e8309608b92aaec2e2556d4b25d758ccada9/TESTING/EIG/cdrvev.f#L799-L805 by a test that uses tolerances, like the one used below, or would that go against some specifications?

https://github.com/Reference-LAPACK/lapack/blob/28f7e8309608b92aaec2e2556d4b25d758ccada9/TESTING/EIG/cchkhs.f#L87-L87 https://github.com/Reference-LAPACK/lapack/blob/28f7e8309608b92aaec2e2556d4b25d758ccada9/TESTING/EIG/cchkhs.f#L825-L835

Checklist

angsch commented 2 years ago

Aggressive compiler optimization and/or the usage of a highly optimized BLAS is known sources for failures in the stringent LAPACK tests. See for example the LAPACK test summary shown in https://github.com/xianyi/OpenBLAS/pull/3609. The failures do not necessarily mean that something is fundamentally wrong.

How do I interpret LAPACK testing failures? [...] Testing failures can be divided into two categories. Minor testing failures, and major testing failures. A minor testing failure is one in which the test ratio reported in the LAPACK/TESTING/.out file slightly exceeds the threshold (specified in the associated LAPACK/TESTING/.in file). The cause of such failures can mainly be attributed to differences in the implementation of math libraries (square root, absolute value, complex division, complex absolute value, etc). These failures are negligible, and do not affect the proper functioning of the library.

The failures related to laqr fall into the category of the minor failures. The flag march=haswell activates vectorization and heavy FMA usage. With aggressive compiler optimization, many tests are slightly above the threshold. When you apply the laqr patch, one instruction is saved per update. That means one instruction less to accumulate errors - just enough to fall below the tight error threshold. This was actually one of the intentions behind this patch.