OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

zgemv_n.S typo affecting win64, also win threading not working? #282

Closed: vtjnash closed this issue 11 years ago

vtjnash commented 11 years ago

make lapack_testing freezes on Windows (both 32- and 64-bit) when NUM_THREADS > 1.

On win32 without threads, the LAPACK tests pass.

On win64 without threads, the REAL, DOUBLE PRECISION, and COMPLEX LAPACK tests pass, but the following test freezes at 100% CPU usage:

Testing COMPLEX16 LAPACK linear equation routines
./xlintstz < ztest.in > ztest.out 2>&1

For this run, make was called with the following parameters:

make CC="x86_64-w64-mingw32-gcc" FC="x86_64-w64-mingw32-gfortran" RANLIB="x86_64-w64-mingw32-ranlib" FFLAGS=" -O2 " USE_THREAD=0 NO_AFFINITY=1 INTERFACE64=1 OSNAME=WINNT CROSS=1 HOSTCC=gcc BINARY=64 -j100
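A hung driver like xlintstz blocks the rest of the run, so it has to be terminated before the remaining precisions can be checked. The following is a minimal sketch of how to do that automatically, assuming Python 3 is available on the test machine. It runs one driver with the same redirections as the command above and gives up after a fixed timeout. The function name run_lapack_test and the timeout value are hypothetical; they are not part of OpenBLAS or lapack_testing.py.

```python
import subprocess


def run_lapack_test(cmd, stdin_path, stdout_path, timeout_s):
    """Run one LAPACK test driver with redirections equivalent to
    `./xlintstz < ztest.in > ztest.out 2>&1`.

    Returns the driver's exit code, or None if it was still running
    after `timeout_s` seconds (treated as a hang)."""
    with open(stdin_path, "rb") as fin, open(stdout_path, "wb") as fout:
        try:
            result = subprocess.run(
                cmd,
                stdin=fin,
                stdout=fout,
                stderr=subprocess.STDOUT,  # the 2>&1 part
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills and reaps the child before re-raising.
            return None
    return result.returncode
```

For example, run_lapack_test(["./xlintstz"], "ztest.in", "ztest.out", 600) would return None for the win64 COMPLEX16 case described above, instead of blocking indefinitely. The 600-second limit is an arbitrary choice and should be larger than the slowest passing driver on the machine.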

Here's the test summary for win64, without threading (tests without output were terminated after

$ ./lapack_testing.py 

---------------- Testing LAPACK Routines ----------------

-- Detailed results are stored in testing_results.txt

------------------------- REAL              ------------------------

-->  Testing REAL              Nonsymmetric Eigenvalue Problem [ snep.out ]
-->  Tests passed: 8820

-->  Testing REAL              Symmetric Eigenvalue Problem [ ssep.out ]
-->  Tests passed: 89520

-->  Testing REAL              Singular Value Decomposition [ ssvd.out ]
-->  Tests passed: 69350

-->  Testing REAL              Eigen Condition [ sec.out ]
-->  Tests passed: 501251

-->  Testing REAL              Nonsymmetric Eigenvalue [ sed.out ]
-->  Tests passed: 12982

-->  Testing REAL              Nonsymmetric Generalized Eigenvalue Problem [ sgg.out ]
-->  Tests passed: 13832

-->  Testing REAL              Nonsymmetric Generalized Eigenvalue Problem driver [ sgd.out ]
-->  Tests passed: 7830

-->  Testing REAL              Symmetric Eigenvalue Problem [ ssb.out ]
-->  Tests passed: 540

-->  Testing REAL              Symmetric Eigenvalue Generalized Problem [ ssg.out ]
-->  Tests passed: 30870

-->  Testing REAL              Banded Singular Value Decomposition routines [ sbb.out ]
-->  Tests passed: 6000

-->  Testing REAL              Generalized Linear Regression Model routines [ sglm.out ]
-->  Tests passed: 48

-->  Testing REAL              Generalized QR and RQ factorization routines [ sgqr.out ]
-->  Tests passed: 1728

-->  Testing REAL              Generalized Singular Value Decomposition routines [ sgsv.out ]
-->  Tests passed: 384

-->  Testing REAL              CS Decomposition routines [ scsd.out ]
-->  Tests passed: 270

-->  Testing REAL              Constrained Linear Least Squares routines [ slse.out ]
-->  Tests passed: 96

-->  Testing REAL              Linear Equation routines [ stest.out ]
-->  Tests passed: 320530

-->  Testing REAL              RFP linear equation routines [ stest_rfp.out ]
-->  Tests passed: 13176

------------------------- DOUBLE PRECISION ------------------------

-->  Testing DOUBLE PRECISION Nonsymmetric Eigenvalue Problem [ dnep.out ]
 DHS:    1 out of  1764 tests failed to pass the threshold
-->  Tests passed: 7056
-->  Tests failing to pass the threshold: 1

-->  Testing DOUBLE PRECISION Symmetric Eigenvalue Problem [ dsep.out ]
-->  Tests passed: 89520

-->  Testing DOUBLE PRECISION Singular Value Decomposition [ dsvd.out ]
-->  Tests passed: 69350

-->  Testing DOUBLE PRECISION Eigen Condition [ dec.out ]
-->  Tests passed: 501251

-->  Testing DOUBLE PRECISION Nonsymmetric Eigenvalue [ ded.out ]
-->  Tests passed: 12982

-->  Testing DOUBLE PRECISION Nonsymmetric Generalized Eigenvalue Problem [ dgg.out ]
-->  Tests passed: 13832

-->  Testing DOUBLE PRECISION Nonsymmetric Generalized Eigenvalue Problem driver [ dgd.out ]
-->  Tests passed: 7830

-->  Testing DOUBLE PRECISION Symmetric Eigenvalue Problem [ dsb.out ]
-->  Tests passed: 540

-->  Testing DOUBLE PRECISION Symmetric Eigenvalue Generalized Problem [ dsg.out ]
-->  Tests passed: 30870

-->  Testing DOUBLE PRECISION Banded Singular Value Decomposition routines [ dbb.out ]
-->  Tests passed: 6000

-->  Testing DOUBLE PRECISION Generalized Linear Regression Model routines [ dglm.out ]
-->  Tests passed: 48

-->  Testing DOUBLE PRECISION Generalized QR and RQ factorization routines [ dgqr.out ]
-->  Tests passed: 1728

-->  Testing DOUBLE PRECISION Generalized Singular Value Decomposition routines [ dgsv.out ]
-->  Tests passed: 384

-->  Testing DOUBLE PRECISION CS Decomposition routines [ dcsd.out ]
-->  Tests passed: 270

-->  Testing DOUBLE PRECISION Constrained Linear Least Squares routines [ dlse.out ]
-->  Tests passed: 96

-->  Testing DOUBLE PRECISION Linear Equation routines [ dtest.out ]
-->  Tests passed: 320530

-->  Testing DOUBLE PRECISION Mixed Precision linear equation routines [ dstest.out ]
-->  Tests passed: 812

-->  Testing DOUBLE PRECISION RFP linear equation routines [ dtest_rfp.out ]
-->  Tests passed: 13176

------------------------- COMPLEX           ------------------------

-->  Testing COMPLEX           Nonsymmetric Eigenvalue Problem [ cnep.out ]
-->  Tests passed: 8820

-->  Testing COMPLEX           Symmetric Eigenvalue Problem [ csep.out ]
-->  Tests passed: 77280

-->  Testing COMPLEX           Singular Value Decomposition [ csvd.out ]
-->  Tests passed: 44625

-->  Testing COMPLEX           Eigen Condition [ cec.out ]
-->  Tests passed: 5966

-->  Testing COMPLEX           Nonsymmetric Eigenvalue [ ced.out ]
-->  Tests passed: 12778

-->  Testing COMPLEX           Nonsymmetric Generalized Eigenvalue Problem [ cgg.out ]
-->  Tests passed: 13832

-->  Testing COMPLEX           Nonsymmetric Generalized Eigenvalue Problem driver [ cgd.out ]
-->  Tests passed: 7830

-->  Testing COMPLEX           Symmetric Eigenvalue Problem [ csb.out ]
-->  Tests passed: 540

-->  Testing COMPLEX           Symmetric Eigenvalue Generalized Problem [ csg.out ]
-->  Tests passed: 30870

-->  Testing COMPLEX           Banded Singular Value Decomposition routines [ cbb.out ]
-->  Tests passed: 6000

-->  Testing COMPLEX           Generalized Linear Regression Model routines [ cglm.out ]
-->  Tests passed: 48

-->  Testing COMPLEX           Generalized QR and RQ factorization routines [ cgqr.out ]
-->  Tests passed: 1728

-->  Testing COMPLEX           Generalized Singular Value Decomposition routines [ cgsv.out ]
-->  Tests passed: 384

-->  Testing COMPLEX           CS Decomposition routines [ ccsd.out ]
-->  Tests passed: 270

-->  Testing COMPLEX           Constrained Linear Least Squares routines [ clse.out ]
-->  Tests passed: 96

-->  Testing COMPLEX           Linear Equation routines [ ctest.out ]
-->  Tests passed: 327355

-->  Testing COMPLEX           RFP linear equation routines [ ctest_rfp.out ]
  CTFSM auxiliary routine:     1 out of  7776 tests failed to pass the threshold
-->  Tests passed: 5400
-->  Tests failing to pass the threshold: 1

------------------------- COMPLEX16          ------------------------

-->  Testing COMPLEX16          Nonsymmetric Eigenvalue Problem [ znep.out ]
---- WARNING: please check that you have the LAPACK output : znep.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Symmetric Eigenvalue Problem [ zsep.out ]
---- WARNING: please check that you have the LAPACK output : zsep.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Singular Value Decomposition [ zsvd.out ]
---- WARNING: please check that you have the LAPACK output : zsvd.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Eigen Condition [ zec.out ]
-->  Tests passed: 5966

-->  Testing COMPLEX16          Nonsymmetric Eigenvalue [ zed.out ]
---- WARNING: please check that you have the LAPACK output : zed.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Nonsymmetric Generalized Eigenvalue Problem [ zgg.out ]
---- WARNING: please check that you have the LAPACK output : zgg.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Nonsymmetric Generalized Eigenvalue Problem driver [ zgd.out ]
---- WARNING: please check that you have the LAPACK output : zgd.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Symmetric Eigenvalue Problem [ zsb.out ]
-->  Tests passed: 540

-->  Testing COMPLEX16          Symmetric Eigenvalue Generalized Problem [ zsg.out ]
---- WARNING: please check that you have the LAPACK output : zsg.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Banded Singular Value Decomposition routines [ zbb.out ]
---- WARNING: please check that you have the LAPACK output : zbb.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Generalized Linear Regression Model routines [ zglm.out ]
---- WARNING: please check that you have the LAPACK output : zglm.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Generalized QR and RQ factorization routines [ zgqr.out ]
-->  Tests passed: 1728

-->  Testing COMPLEX16          Generalized Singular Value Decomposition routines [ zgsv.out ]
-->  Tests passed: 384

-->  Testing COMPLEX16          CS Decomposition routines [ zcsd.out ]
-->  Tests passed: 270

-->  Testing COMPLEX16          Constrained Linear Least Squares routines [ zlse.out ]
---- WARNING: please check that you have the LAPACK output : zlse.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Linear Equation routines [ ztest.out ]
---- WARNING: please check that you have the LAPACK output : ztest.out!
---- WARNING: with the option -r, we can run the LAPACK testing for you

-->  Testing COMPLEX16          Mixed Precision linear equation routines [ zctest.out ]
-->  Tests passed: 812

-->  Testing COMPLEX16          RFP linear equation routines [ ztest_rfp.out ]
-->  Tests passed: 13176

            -->   LAPACK TESTING SUMMARY  <--
        Processing LAPACK Testing output found in the TESTING direcory
SUMMARY                 nb test run     numerical error     other error  
================    =========== =================   ================  
REAL                1077227     0   (0.000%)    0   (0.000%)    
DOUBLE PRECISION    1076275     1   (0.000%)    0   (0.000%)    
COMPLEX             543822      1   (0.000%)    0   (0.000%)    
COMPLEX16           22876       0   (0.000%)    0   (0.000%)    

--> ALL PRECISIONS  2720200     2   (0.000%)    0   (0.000%)

Note: there are several places in lapack/ and driver/ where the type long is used instead of size_t (or BLASULONG in a few places). This matters on win64, where long is only 32 bits wide even though pointers and size_t are 64 bits.
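To illustrate why a 32-bit long is a problem for index arithmetic, here is a small sketch using Python's ctypes. It computes an element offset lda * col exactly and then narrows it to a fixed-width C type. This models storing the value in a variable of that type. Two assumptions are worth stating. First, win64's long is modeled explicitly as c_int32, because ctypes.c_long follows the host platform. Second, the dimensions are hypothetical. In real C, a signed 32-bit multiplication that overflows is undefined behavior rather than a guaranteed wrap, so this only shows the narrowing, not every possible failure mode.

```python
import ctypes


def offset_as_c_type(lda, col, ctype):
    """Compute the element offset lda * col exactly, then narrow it to
    `ctype` (ctypes does no overflow checking, so it truncates)."""
    return ctype(lda * col).value


# win64 uses the LLP64 model: C `long` is 32 bits, size_t is 64 bits.
WIN64_LONG = ctypes.c_int32
SIZE_T_64 = ctypes.c_uint64

lda, col = 70000, 70000  # hypothetical matrix dimensions
print(offset_as_c_type(lda, col, WIN64_LONG))  # 605032704 (wrapped)
print(offset_as_c_type(lda, col, SIZE_T_64))   # 4900000000
```

The true offset, 4900000000, exceeds 2^32, so the 32-bit value silently loses the high bits. A 64-bit type such as size_t (or BLASULONG) keeps it intact.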

Finally, I think the INTERFACE64 setting is not being respected and is defaulting to 1.

vtjnash commented 11 years ago

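In ./kernel/x86_64/zgemv_n.S, I noticed that ALPHA_I and MMM are both assigned stack offset 232(%rsp), so storing one overwrites the other. The patch below shifts MMM through ALPHAI up by 8 bytes and grows STACKSIZE from 288 to 296 to make room for them. Correcting this fixes the LAPACK tests.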

--- ../xianyi-OpenBLAS-9c51cdf/kernel/x86_64/zgemv_n.S  2013-08-01 11:53:12.000000000 -0400
+++ kernel/x86_64/zgemv_n.S 2013-08-23 02:40:54.750720605 -0400
@@ -70,7 +70,7 @@

 #else

-#define STACKSIZE  288
+#define STACKSIZE  296

 #define OLD_ALPHA_I     40 + STACKSIZE(%rsp)
 #define OLD_A       48 + STACKSIZE(%rsp)
@@ -83,13 +83,13 @@
 #define ALPHA_R        224        (%rsp)
 #define ALPHA_I        232        (%rsp)

-#define MMM    232(%rsp)
-#define NN 240(%rsp)
-#define AA 248(%rsp)
-#define    XX  256(%rsp)
-#define LDAX   264(%rsp)
-#define ALPHAR 272(%rsp)
-#define ALPHAI 280(%rsp)
+#define MMM    240(%rsp)
+#define NN 248(%rsp)
+#define AA 256(%rsp)
+#define    XX  264(%rsp)
+#define LDAX   272(%rsp)
+#define ALPHAR 280(%rsp)
+#define ALPHAI 288(%rsp)

 #define M    %rcx
 #define N    %rdx
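The collision can be checked mechanically from the #define offsets in the diff above. The sketch below lists the local slots before and after the patch, reports any that overlap, and computes the smallest frame that holds them all. It assumes each slot is 8 bytes wide (one double or one 64-bit integer) and that ALPHAI is the highest local in the frame. Both assumptions are illustrative and not taken from the kernel source.

```python
SLOT = 8  # assumed width of each local: one double or one 64-bit integer

# Offsets from %rsp, copied from the #define lines in the zgemv_n.S diff.
BEFORE = {"ALPHA_R": 224, "ALPHA_I": 232, "MMM": 232, "NN": 240, "AA": 248,
          "XX": 256, "LDAX": 264, "ALPHAR": 272, "ALPHAI": 280}
AFTER = {"ALPHA_R": 224, "ALPHA_I": 232, "MMM": 240, "NN": 248, "AA": 256,
         "XX": 264, "LDAX": 272, "ALPHAR": 280, "ALPHAI": 288}


def overlapping_slots(layout):
    """Return pairs of names whose SLOT-byte ranges overlap, in offset order."""
    names = sorted(layout, key=layout.get)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Half-open ranges [off, off + SLOT) overlap iff each starts
            # before the other ends.
            if layout[a] < layout[b] + SLOT and layout[b] < layout[a] + SLOT:
                pairs.append((a, b))
    return pairs


def min_frame_size(layout):
    """Smallest STACKSIZE that keeps every slot inside the local frame."""
    return max(layout.values()) + SLOT


print(overlapping_slots(BEFORE), min_frame_size(BEFORE))  # [('ALPHA_I', 'MMM')] 288
print(overlapping_slots(AFTER), min_frame_size(AFTER))    # [] 296
```

The old layout fits in 288 bytes, so the bug is the shared slot rather than a frame overflow. Moving MMM up by 8 bytes pushes ALPHAI to 288, which is why STACKSIZE has to grow to 296. Assuming the prologue reserves STACKSIZE bytes, the OLD_* arguments addressed as offset + STACKSIZE(%rsp) stay correct automatically.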
xianyi commented 11 years ago

Hi @vtjnash ,

Thank you for your patch.

Does this zgemv_n patch also fix the NUM_THREADS > 1 bug?

Thank you

Xianyi

vtjnash commented 11 years ago

No, I think that is a separate issue. I haven't had time to look into it yet, however.

vtjnash commented 11 years ago

With further testing, I discovered that the test doesn't freeze completely with NUM_THREADS > 1; it just runs very, very, very slowly. On my 8-core machine, it uses about 80% of one core while running the tests (I think most of that time is wasted in a Microsoft Unlock intrinsic). I'll open a new issue for that.