Closed vtjnash closed 11 years ago
In ./kernel/x86_64/zgemv_n.S, I noticed that ALPHA_I and MMM are assigned the same stack location. Correcting this fixes the lapack tests.
--- ../xianyi-OpenBLAS-9c51cdf/kernel/x86_64/zgemv_n.S 2013-08-01 11:53:12.000000000 -0400
+++ kernel/x86_64/zgemv_n.S 2013-08-23 02:40:54.750720605 -0400
@@ -70,7 +70,7 @@
#else
-#define STACKSIZE 288
+#define STACKSIZE 296
#define OLD_ALPHA_I 40 + STACKSIZE(%rsp)
#define OLD_A 48 + STACKSIZE(%rsp)
@@ -83,13 +83,13 @@
#define ALPHA_R 224 (%rsp)
#define ALPHA_I 232 (%rsp)
-#define MMM 232(%rsp)
-#define NN 240(%rsp)
-#define AA 248(%rsp)
-#define XX 256(%rsp)
-#define LDAX 264(%rsp)
-#define ALPHAR 272(%rsp)
-#define ALPHAI 280(%rsp)
+#define MMM 240(%rsp)
+#define NN 248(%rsp)
+#define AA 256(%rsp)
+#define XX 264(%rsp)
+#define LDAX 272(%rsp)
+#define ALPHAR 280(%rsp)
+#define ALPHAI 288(%rsp)
#define M %rcx
#define N %rdx
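To make the overlap concrete, here is a minimal sketch in C that checks the offsets from the diff above. It assumes each local occupies one 8-byte slot and that every slot must fit below STACKSIZE; `slots_conflict`, `SLOT_SIZE`, and the two arrays are hypothetical names for illustration and are not part of OpenBLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: every local is a single 8-byte slot at offset(%rsp). */
#define SLOT_SIZE 8

/* Returns 1 if two slots share an offset or a slot extends past stack_size,
 * otherwise 0. Hypothetical helper, not an OpenBLAS function. */
static int slots_conflict(const int *offsets, size_t count, int stack_size)
{
    for (size_t i = 0; i < count; i++) {
        if (offsets[i] + SLOT_SIZE > stack_size)
            return 1;
        for (size_t j = i + 1; j < count; j++) {
            if (offsets[i] == offsets[j])
                return 1;
        }
    }
    return 0;
}

/* Offsets copied from the diff, in this order:
 * ALPHA_R, ALPHA_I, MMM, NN, AA, XX, LDAX, ALPHAR, ALPHAI */
static const int before_patch[] = {224, 232, 232, 240, 248, 256, 264, 272, 280};
static const int after_patch[]  = {224, 232, 240, 248, 256, 264, 272, 280, 288};
```

Under these assumptions, the old layout conflicts because ALPHA_I and MMM both sit at 232, and the new layout only fits once STACKSIZE grows from 288 to 296, since ALPHAI now ends at 288 + 8.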
Hi @vtjnash ,
Thank you for your patch.
Does this zgemv_n patch also fix the NUM_THREADS > 1 bug?
Thank you
Xianyi
No, I think that is a separate issue. I haven't had time to look into that yet, however.
With further testing, I discovered that the test doesn't freeze completely with NUM_THREADS > 1; it just runs very, very slowly. On my 8-core machine, it uses about 80% of one core while running the tests (I think most of that time is wasted in a Microsoft Unlock intrinsic). I'll open a new issue for that, however.
make lapack_testing freezes on Windows (32 and 64) for NUM_THREADS > 1
on win32, without threads, the lapack tests are fine
on win64, without threads, the real, double, and complex lapack tests pass, but the following test freezes with 100% cpu usage
for this, make was called with the following parameters:
make CC="x86_64-w64-mingw32-gcc" FC="x86_64-w64-mingw32-gfortran" RANLIB="x86_64-w64-mingw32-ranlib" FFLAGS=" -O2 " USE_THREAD=0 NO_AFFINITY=1 INTERFACE64=1 OSNAME=WINNT CROSS=1 HOSTCC=gcc BINARY=64 -j100
Here's the test summary for win64, without threading (tests without output were terminated after
Note, there are several places in lapack/ and driver/ where the long type is used instead of size_t (or BLASULONG, in a few places).
Finally, I think the INTERFACE64 is not being respected and is defaulting to 1.
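For context on why long versus size_t matters here: 64-bit Windows uses the LLP64 data model, where long is 32 bits while size_t and pointers are 64 bits. The sketch below uses a 32-bit limit as a stand-in for a Win64 long to show which values would not survive being stored in one. `fits_in_win64_long` is a hypothetical helper for illustration, not an OpenBLAS function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: models `long` on Win64 (LLP64) as a signed 32-bit integer.
 * Returns 1 if `value` can be stored in such a long without truncation. */
static int fits_in_win64_long(uint64_t value)
{
    return value <= (uint64_t)INT32_MAX;
}
```

A buffer size or address above 2^31 - 1 fails this check, which is how a long used where a size_t is needed could silently truncate on win64 even though the same code works on 64-bit Linux, where long is 64 bits.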