OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License
6.32k stars 1.49k forks source link

AVX2: dgemm on HASWELL leads to invalid results (low accuracy) #4747

Open davidcl opened 3 months ago

davidcl commented 3 months ago

After upgrading Scilab OpenBLAS build, we detected a poor precision result impacting dgemm when using AVX2 kernels.

With OpenBLAS 0.3.27:

$ OPENBLAS_CORETYPE=NEHALEM scilab/test_dgemm
[1, 0.666667] * [-6 0 ; 9 0]
        0       0

$ OPENBLAS_CORETYPE=HASWELL scilab/test_dgemm
[1, 0.666667] * [-6 0 ; 9 0]
        -3.33067E-16    0

With refBLAS 3.12.0 :

$ scilab/test_dgemm
[1, 0.666667] * [-6 0 ; 9 0]
        0       0
test code ```c #include #include extern void dgemm_(char *transa, char *transb, int *m, int *n, int *k, double *alpha, double *a, int *lda, double *b, int *ldb, double *beta, double *c, int *ldc); int main() { int ONE = 1; int TWO = 2; double dZERO = 0; double dONE = 1; double A_tab[] = {1., 2. / 3.}; double* A = A_tab; double B_tab[] = {-6., 9., 0., 0.}; double* B = B_tab; double C_tab[] = {NAN, NAN}; double* C = C_tab; dgemm_("n", "n", &ONE, &TWO, &TWO, &dONE, A, &ONE, B, &TWO, &dZERO, C, &ONE); printf("[%lG, %lG] * [%lG %lG ; %lG %lG]\n", A[0], A[1], B[0], B[2], B[1], B[3]); printf("\t%lG\t%lG\n", C[0], C[1]); return 0; } ```
martin-frbg commented 3 months ago

Curious, the current DGEMM kernel for Haswell has been in use for years. Have you only started testing/using OpenBLAS recently, or did something else (like the default compiler version or options) change in your setup lately ?

davidcl commented 3 months ago

We were compiling without DYNAMIC_TARGET thus selecting NEHALEM arch as the minimal supported target. All the opened bugs are due to a more active cross-checking of Windows vs Linux results on our test base after the GCC 8 to GCC 11 Linux migration.

Note: all these corner-case bugs might or might not be valid at some points; eg. in that case, EPSILON might be acceptable in most codebases.

brada4 commented 3 months ago

Symbolic result is e-7 off, so both code paths do very well at rounding artifacts