Many test failures when compiled with `-march=haswell -ftree-vectorize`

Description

Tested on master, compiling LAPACK with

cp make.inc.example make.inc
make FC="gfortran-11.3.0" make FFLAGS="-O2 -frecursive -march=haswell -ftree-vectorize" -j 32

brings about many test failures:

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1296387         1301    (0.100%)        0       (0.000%)
DOUBLE PRECISION        1304985         1274    (0.098%)        0       (0.000%)
COMPLEX                 748111          1240    (0.166%)        0       (0.000%)
COMPLEX16               761754          561     (0.074%)        0       (0.000%)

--> ALL PRECISIONS      4111237         4376    (0.106%)        0       (0.000%)

There are three categories of failures here: 1) A bug in GCC, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254. Fortunately a fix is provided there and it can be trivially backported. 2) Two minor test failures in REAL and COMPLEX, see https://github.com/Reference-LAPACK/lapack/issues/679 3) The tests insisting that when e.g. CGEEV is called to provide eigenvectors, it will give the exact same eigenvalues (no numerical tolerance) as when it does not provide eigenvectors.

Applying the patch to GCC, tests go down to:

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1310427         1       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1319025         0       (0.000%)        0       (0.000%)
COMPLEX                 762151          254     (0.033%)        0       (0.000%)
COMPLEX16               763938          494     (0.065%)        0       (0.000%)

--> ALL PRECISIONS      4155541         749     (0.018%)        0       (0.000%)

Increasing the tolerance for 2) using

diff --git a/TESTING/ctest_rfp.in b/TESTING/ctest_rfp.in
index d6988f2a7..82e147535 100644
--- a/TESTING/ctest_rfp.in
+++ b/TESTING/ctest_rfp.in
@@ -5,5 +5,5 @@ Data file for testing COMPLEX LAPACK linear equation routines RFP format
 1 2 15                         Values of NRHS (number of right hand sides)
 9                              Number of matrix types (list types on next line if 0 < NTYPES <  9)
 1 2 3 4 5 6 7 8 9              Matrix Types
-30.0                           Threshold value of test ratio
+42.0                           Threshold value of test ratio
 T                              Put T to test the error exits
diff --git a/TESTING/stest_rfp.in b/TESTING/stest_rfp.in
index 08dbf83e6..9b082b7df 100644
--- a/TESTING/stest_rfp.in
+++ b/TESTING/stest_rfp.in
@@ -5,5 +5,5 @@ Data file for testing REAL LAPACK linear equation routines RFP format
 1 2 15                         Values of NRHS (number of right hand sides)
 9                              Number of matrix types (list types on next line if 0 < NTYPES <  9)
 1 2 3 4 5 6 7 8 9              Matrix Types
-30.0                           Threshold value of test ratio
+42.0                           Threshold value of test ratio
 T                              Put T to test the error exits

we get

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1318203         0       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1319025         0       (0.000%)        0       (0.000%)
COMPLEX                 769927          253     (0.033%)        0       (0.000%)
COMPLEX16               763938          494     (0.065%)        0       (0.000%)

--> ALL PRECISIONS      4171093         747     (0.018%)        0       (0.000%)

The third one is more complex to analyze. The way I understand it is that gfortran has some freedom in how to apply FMA's to expressions such as a*b+c*d (ie. fma(a,b,c*d) or fma(c,d,a*b)) and these give slightly different results. The loop bounds for computing and not computing eigenvectors for various loops are different, hence it is often the case that a loop with eigenvectors is vectorized, and without isn't, or for different loop iterations, and the vectorized use of FMA isn't identical to the unvectorized use of FMA. And with complex numbers of course there are even more permutations possible. Adding parentheses forces the compiler to use one variety.

Applying this patch:

diff --git a/SRC/clahqr.f b/SRC/clahqr.f
index dbd848e2f..260764ef9 100644
--- a/SRC/clahqr.f
+++ b/SRC/clahqr.f
@@ -484,7 +484,7 @@
 *           in columns K to I2.
 *
             DO 80 J = K, I2
-               SUM = CONJG( T1 )*H( K, J ) + T2*H( K+1, J )
+               SUM = CONJG( T1 )*H( K, J ) + ( T2*H( K+1, J ) )
                H( K, J ) = H( K, J ) - SUM
                H( K+1, J ) = H( K+1, J ) - SUM*V2
    80       CONTINUE
@@ -493,7 +493,7 @@
 *           matrix in rows I1 to min(K+2,I).
 *
             DO 90 J = I1, MIN( K+2, I )
-               SUM = T1*H( J, K ) + T2*H( J, K+1 )
+               SUM = T1*H( J, K ) + ( T2*H( J, K+1 ) )
                H( J, K ) = H( J, K ) - SUM
                H( J, K+1 ) = H( J, K+1 ) - SUM*CONJG( V2 )
    90       CONTINUE
diff --git a/SRC/claqr5.f b/SRC/claqr5.f
index 0a01cc226..a19800b1e 100644
--- a/SRC/claqr5.f
+++ b/SRC/claqr5.f
@@ -616,7 +616,7 @@
                T3 = T1*CONJG( V( 3, M ) )
                DO 70 J = JTOP, MIN( KBOT, K+3 )
                   REFSUM = H( J, K+1 ) + V( 2, M )*H( J, K+2 )
-     $                     + V( 3, M )*H( J, K+3 )
+     $                     + ( V( 3, M )*H( J, K+3 ) )
                   H( J, K+1 ) = H( J, K+1 ) - REFSUM*T1
                   H( J, K+2 ) = H( J, K+2 ) - REFSUM*T2
                   H( J, K+3 ) = H( J, K+3 ) - REFSUM*T3
diff --git a/SRC/zlahqr.f b/SRC/zlahqr.f
index 9413f20cc..c632f578a 100644
--- a/SRC/zlahqr.f
+++ b/SRC/zlahqr.f
@@ -484,7 +484,7 @@
 *           in columns K to I2.
 *
             DO 80 J = K, I2
-               SUM = DCONJG( T1 )*H( K, J ) + T2*H( K+1, J )
+               SUM = DCONJG( T1 )*H( K, J ) + ( T2*H( K+1, J ) )
                H( K, J ) = H( K, J ) - SUM
                H( K+1, J ) = H( K+1, J ) - SUM*V2
    80       CONTINUE
@@ -493,7 +493,7 @@
 *           matrix in rows I1 to min(K+2,I).
 *
             DO 90 J = I1, MIN( K+2, I )
-               SUM = T1*H( J, K ) + T2*H( J, K+1 )
+               SUM = T1*H( J, K ) + ( T2*H( J, K+1 ) )
                H( J, K ) = H( J, K ) - SUM
                H( J, K+1 ) = H( J, K+1 ) - SUM*DCONJG( V2 )
    90       CONTINUE
diff --git a/SRC/zlaqr5.f b/SRC/zlaqr5.f
index 4fa5ee5b0..656a7ab37 100644
--- a/SRC/zlaqr5.f
+++ b/SRC/zlaqr5.f
@@ -446,7 +446,7 @@
                T2 = T1*V( 2, M22 )
                DO 40 J = K+1, JBOT
                   REFSUM = H( K+1, J ) +
-     $                     DCONJG( V( 2, M22 ) )*H( K+2, J )
+     $                     ( DCONJG( V( 2, M22 ) )*H( K+2, J ) )
                   H( K+1, J ) = H( K+1, J ) - REFSUM*T1
                   H( K+2, J ) = H( K+2, J ) - REFSUM*T2
    40          CONTINUE

makes the test failures go to 0.

But it looks quite fragile, in that it takes trial and error to see which expressions need braces.

Would it be ok instead to replace test 5, e.g. https://github.com/Reference-LAPACK/lapack/blob/28f7e8309608b92aaec2e2556d4b25d758ccada9/TESTING/EIG/cdrvev.f#L799-L805 by a test that uses tolerances, like the one used below, or would that go against some specifications?

https://github.com/Reference-LAPACK/lapack/blob/28f7e8309608b92aaec2e2556d4b25d758ccada9/TESTING/EIG/cchkhs.f#L87-L87 https://github.com/Reference-LAPACK/lapack/blob/28f7e8309608b92aaec2e2556d4b25d758ccada9/TESTING/EIG/cchkhs.f#L825-L835

Checklist

[x] I've included a minimal example to reproduce the issue
[x] I'd be willing to make a PR to solve this issue

Reference-LAPACK / lapack

Many test failures when compiled with `-march=haswell -ftree-vectorize` #732