Closed xianyi closed 8 years ago
Penryn,Prescott - FAIL TOO Atom,Nehalem - GOOD Bulldozer, Operon_SSE3 - FAIL Piledriver - GOOD
@brada4 , Thank you for the feedback.
@matthew-brett said the bug is reproduced in batch mode. If run the single test, it is ok.
Must be cousin of #750 There are 29 unique contexts with 3800 calls total that read uninitialized allocated memory in Atom that are not in Core2 (courtesy valgrind)
@brada4 @matthew-brett What's the wrong or correct output?
I got the following output. Does it pass the test? Or not?
OPENBLAS_CORETYPE=Core2 python -c 'import scipy.linalg; scipy.linalg.test()'
7
Running unit tests for scipy.linalg
NumPy version 1.10.4
NumPy relaxed strides checking option: False
NumPy is installed in /usr/local/lib/python2.7/dist-packages/numpy
SciPy version 0.17.0
SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy
Python version 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]
nose version 1.3.7
....................................................................................................0-th dimension must be fixed to 3 but got 15
..0-th dimension must be fixed to 3 but got 5
..............0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
.0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
.0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
................................K...........................................................................................................................................................................................................................................................................................................K...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................S..................K.........................................K...................................................................................
----------------------------------------------------------------------
Ran 1507 tests in 12.020s
OK (KNOWNFAIL=4, SKIP=1)
I would say correct because I get:
OPENBLAS_CORETYPE=Core2 OPENBLAS_NUM_THREADS=12 python -c "import scipy.linalg; scipy.linalg.test()"
Running unit tests for scipy.linalg
NumPy version 1.10.4
NumPy relaxed strides checking option: False
NumPy is installed in /usr/lib/python2.7/dist-packages/numpy
SciPy version 0.17.0
SciPy is installed in /usr/lib/python2.7/dist-packages/scipy
Python version 2.7.11+ (default, Feb 22 2016, 16:38:42) [GCC 5.3.1 20160220]
nose version 1.3.7
....................................................................................................0-th dimension must be fixed to 3 but got 15
..0-th dimension must be fixed to 3 but got 5
..............0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
.0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
.0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
................................K.....................................................................................................................................................................................................................................................F.....................................................K...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................S..................K.........................................K...................................................................................
======================================================================
FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py", line 658, in eigenhproblem_general
assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 892, in assert_array_almost_equal
precision=decimal)
File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 713, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 4 decimals
(mismatch 100.0%)
x: array([ 0., 0., 0.], dtype=float32)
y: array([ 1., 1., 1.])
----------------------------------------------------------------------
Ran 1507 tests in 12.147s
FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
@jeromerobert - yes, you have the same output that I get, showing the failing test; @xianyi - you have the output when the test is passing. Are you running with:
docker run --rm -v $PWD:/io quay.io/pypa/manylinux1_x86_64 /io/replicate_core2_fail.sh
?
@jeromerobert , @matthew-brett , Thank you for the answer.
I didn't try docker yet. I just ran the test on my Sandy Bridge machine. I will try to install the docker and test it.
Here's a script (to run in docker) that replicates the test failure on latest Ubuntu, to remove the slightly weird initial build environment from the equation:
dgemv reads unitialized allocations a lot in failing sets of kernels (i am at 10th/29 contexts)
I can run docker and reproduce this error. Try to replace OpenBLAS some functions with reference implementation to isolate the issue.
I ran the test on an up to date Debian Stretch. The only difference between xianyi passing test is:
- Python version 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]
+ Python version 2.7.11+ (default, Feb 22 2016, 16:38:42) [GCC 5.3.1 20160220]
The last ubuntu (Wily) have gcc 5.2.1 and python 2.7.9.
I ran on openSuSE Leap 42.1 with pip27 / python 2.7.9/ install --user ... and CC=clang 3.7 (i tried gcc, omp/pthread/up builds, no difference) Also daxpy reads fresh allocations. 17/29
There is an (unsolved) question on stackoverflow about this exact error - http://stackoverflow.com/questions/27104889 dates from november 2014 (python version 2.7.8, hardware not mentioned). Back then somebody suggested backwards- (and lapack-) incompatible changes to the normalization of eigenvectors in scipy 0.14 might be the culprit. At least it is safe to say that this is not a recent regression. If gemv is implicated in any way (is it, or was that just an unrelated observation ?) I wonder if it could be related to the weirdness in #695 (i.e. would resorting to the generic "C" implementation of the GEMV kernels solve this one too ?)
I just replaced level 2 and 1 assembly kernels by C kernels for Core2 target. It passed the test.
@martin-frbg i am chasing call chains that yield reading fresh allocations. Next step will be looking if that impacts the final math. @xianyi give me more hour. I will get the list of failing kernels. Maybe zeroing alloc can help compute-heaviest ones.
jeromerobert appears to have a tentative workaround for the gemv functions in #746 (using calloc on the workspace array)
dgemv_n.S:1942 is crossed 4 times during the test and introduces damage if the fresh allocation is pre-populated with non-zeroes. i.e it can fail in any of the cases. Brainier processors use brainier kernels that dont exhibit such weakness.
@brada4
(mismatch 100.0%)
x: array([ 0., 0., 0.], dtype=float32)
y: array([ 1., 1., 1.])
The data type is float32. Thus, I think it should be single precision problem (sgemv).
I guess it's worth cleaning the buffer for float64 even if it's not the cause of this failure?
dgemv it reads uninitialized allocation massively during this lapack test (dgemv_t.S and dgemv_n.S)
Looks like cgemv causes the error, not sgemv. I appled the following patch to kernel/x86_64/KERNEL.CORE2. Replace cgemv kernels by C kernels.
It passed the test.
diff --git a/kernel/x86_64/KERNEL.CORE2 b/kernel/x86_64/KERNEL.CORE2
index 867c941..375937f 100644
--- a/kernel/x86_64/KERNEL.CORE2
+++ b/kernel/x86_64/KERNEL.CORE2
@@ -58,3 +58,6 @@ ZTRSMKERNEL_RT = ztrsm_kernel_RT_2x2_core2.S
CGEMM3MKERNEL = zgemm3m_kernel_8x4_core2.S
ZGEMM3MKERNEL = zgemm3m_kernel_4x4_core2.S
+
+CGEMVNKERNEL = ../arm/zgemv_n.c
+CGEMVTKERNEL = ../arm/zgemv_t.c
@brada4 , could you write a utest for dgemv to catch this issue? I can add it to OpenBLAS/utest
.
Yes, tomorrow. What about Penryn Prescott Optreon_SSE3 (and Opteron 3dnow that I cannot test) ? (I mistyped Bulldozer, it is fine indeed as it fell back to Prescott)
@xianyi - are these kernels used in any other configurations? If so, how hard would it be to fix the problems with using badly-initialized memory?
Probably needs a seasoned assembler wizard (and a herd of sacrificial goats) to find the hidden flaw in Goto-sans heritage code.
@martin-frbg , @matthew-brett I already replaced it with C kernel. It should fix this issue and #695. I think the C kernel is OK for old architecture.
@xianyi - sorry not to be clear - but I wanted to check whethere there are any remaining configurations that will use the assembly kernels, or can / have they now be removed?
What performance cost do you think the C kernels will have compared to the assembly kernels?
I just remove cgemv_t.S
for older architecture and other kernels (e.g. gemm) are assembly.
For new architecture (e.g. sandy bridge, haswell and others), we already optimized cgemv_t
kernel instead of using cgemv_t.S
I am going to close this issue. It it fixed on develop branch.
Test script: https://gist.github.com/b015f546a29002b1b8f1