OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License
6.32k stars 1.49k forks source link

OpenBLAS bug with Numpy test on Core2 #783

Closed xianyi closed 8 years ago

xianyi commented 8 years ago

Test script: https://gist.github.com/b015f546a29002b1b8f1

brada4 commented 8 years ago

Penryn,Prescott - FAIL TOO Atom,Nehalem - GOOD Bulldozer, Operon_SSE3 - FAIL Piledriver - GOOD

xianyi commented 8 years ago

@brada4 , Thank you for the feedback.

@matthew-brett said the bug is reproduced in batch mode. If run the single test, it is ok.

brada4 commented 8 years ago

Must be cousin of #750 There are 29 unique contexts with 3800 calls total that read uninitialized allocated memory in Atom that are not in Core2 (courtesy valgrind)

xianyi commented 8 years ago

@brada4 @matthew-brett What's the wrong or correct output?

I got the following output. Does it pass the test? Or not?

OPENBLAS_CORETYPE=Core2 python -c 'import scipy.linalg; scipy.linalg.test()'
7
Running unit tests for scipy.linalg
NumPy version 1.10.4
NumPy relaxed strides checking option: False
NumPy is installed in /usr/local/lib/python2.7/dist-packages/numpy
SciPy version 0.17.0
SciPy is installed in /usr/local/lib/python2.7/dist-packages/scipy
Python version 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]
nose version 1.3.7
....................................................................................................0-th dimension must be fixed to 3 but got 15
..0-th dimension must be fixed to 3 but got 5
..............0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
.0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
.0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
................................K...........................................................................................................................................................................................................................................................................................................K...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................S..................K.........................................K...................................................................................
----------------------------------------------------------------------
Ran 1507 tests in 12.020s

OK (KNOWNFAIL=4, SKIP=1)
jeromerobert commented 8 years ago

I would say correct because I get:

OPENBLAS_CORETYPE=Core2 OPENBLAS_NUM_THREADS=12 python -c "import scipy.linalg; scipy.linalg.test()"
Running unit tests for scipy.linalg
NumPy version 1.10.4
NumPy relaxed strides checking option: False
NumPy is installed in /usr/lib/python2.7/dist-packages/numpy
SciPy version 0.17.0
SciPy is installed in /usr/lib/python2.7/dist-packages/scipy
Python version 2.7.11+ (default, Feb 22 2016, 16:38:42) [GCC 5.3.1 20160220]
nose version 1.3.7
....................................................................................................0-th dimension must be fixed to 3 but got 15
..0-th dimension must be fixed to 3 but got 5
..............0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
.0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
.0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
0-th dimension must be fixed to 4 but got 2
................................K.....................................................................................................................................................................................................................................................F.....................................................K...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................S..................K.........................................K...................................................................................
======================================================================
FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py", line 658, in eigenhproblem_general
    assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
  File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 892, in assert_array_almost_equal
    precision=decimal)
  File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 713, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 4 decimals

(mismatch 100.0%)
 x: array([ 0.,  0.,  0.], dtype=float32)
 y: array([ 1.,  1.,  1.])

----------------------------------------------------------------------
Ran 1507 tests in 12.147s

FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
matthew-brett commented 8 years ago

@jeromerobert - yes, you have the same output that I get, showing the failing test; @xianyi - you have the output when the test is passing. Are you running with:

docker run --rm -v $PWD:/io quay.io/pypa/manylinux1_x86_64 /io/replicate_core2_fail.sh

?

xianyi commented 8 years ago

@jeromerobert , @matthew-brett , Thank you for the answer.

I didn't try docker yet. I just ran the test on my Sandy Bridge machine. I will try to install the docker and test it.

matthew-brett commented 8 years ago

Here's a script (to run in docker) that replicates the test failure on latest Ubuntu, to remove the slightly weird initial build environment from the equation:

https://gist.github.com/073ab2730b60eda80f3d

brada4 commented 8 years ago

dgemv reads unitialized allocations a lot in failing sets of kernels (i am at 10th/29 contexts)

xianyi commented 8 years ago

I can run docker and reproduce this error. Try to replace OpenBLAS some functions with reference implementation to isolate the issue.

jeromerobert commented 8 years ago

I ran the test on an up to date Debian Stretch. The only difference between xianyi passing test is:

- Python version 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]
+ Python version 2.7.11+ (default, Feb 22 2016, 16:38:42) [GCC 5.3.1 20160220]

The last ubuntu (Wily) have gcc 5.2.1 and python 2.7.9.

brada4 commented 8 years ago

I ran on openSuSE Leap 42.1 with pip27 / python 2.7.9/ install --user ... and CC=clang 3.7 (i tried gcc, omp/pthread/up builds, no difference) Also daxpy reads fresh allocations. 17/29

martin-frbg commented 8 years ago

There is an (unsolved) question on stackoverflow about this exact error - http://stackoverflow.com/questions/27104889 dates from november 2014 (python version 2.7.8, hardware not mentioned). Back then somebody suggested backwards- (and lapack-) incompatible changes to the normalization of eigenvectors in scipy 0.14 might be the culprit. At least it is safe to say that this is not a recent regression. If gemv is implicated in any way (is it, or was that just an unrelated observation ?) I wonder if it could be related to the weirdness in #695 (i.e. would resorting to the generic "C" implementation of the GEMV kernels solve this one too ?)

xianyi commented 8 years ago

I just replaced level 2 and 1 assembly kernels by C kernels for Core2 target. It passed the test.

brada4 commented 8 years ago

@martin-frbg i am chasing call chains that yield reading fresh allocations. Next step will be looking if that impacts the final math. @xianyi give me more hour. I will get the list of failing kernels. Maybe zeroing alloc can help compute-heaviest ones.

martin-frbg commented 8 years ago

jeromerobert appears to have a tentative workaround for the gemv functions in #746 (using calloc on the workspace array)

brada4 commented 8 years ago

dgemv_n.S:1942 is crossed 4 times during the test and introduces damage if the fresh allocation is pre-populated with non-zeroes. i.e it can fail in any of the cases. Brainier processors use brainier kernels that dont exhibit such weakness.

xianyi commented 8 years ago

@brada4

(mismatch 100.0%)
 x: array([ 0.,  0.,  0.], dtype=float32)
 y: array([ 1.,  1.,  1.])

The data type is float32. Thus, I think it should be single precision problem (sgemv).

matthew-brett commented 8 years ago

I guess it's worth cleaning the buffer for float64 even if it's not the cause of this failure?

brada4 commented 8 years ago

dgemv it reads uninitialized allocation massively during this lapack test (dgemv_t.S and dgemv_n.S)

xianyi commented 8 years ago

Looks like cgemv causes the error, not sgemv. I appled the following patch to kernel/x86_64/KERNEL.CORE2. Replace cgemv kernels by C kernels.

It passed the test.

diff --git a/kernel/x86_64/KERNEL.CORE2 b/kernel/x86_64/KERNEL.CORE2
index 867c941..375937f 100644
--- a/kernel/x86_64/KERNEL.CORE2
+++ b/kernel/x86_64/KERNEL.CORE2
@@ -58,3 +58,6 @@ ZTRSMKERNEL_RT        =  ztrsm_kernel_RT_2x2_core2.S
 CGEMM3MKERNEL    =  zgemm3m_kernel_8x4_core2.S
 ZGEMM3MKERNEL    =  zgemm3m_kernel_4x4_core2.S

+
+CGEMVNKERNEL = ../arm/zgemv_n.c
+CGEMVTKERNEL = ../arm/zgemv_t.c

@brada4 , could you write a utest for dgemv to catch this issue? I can add it to OpenBLAS/utest.

brada4 commented 8 years ago

Yes, tomorrow. What about Penryn Prescott Optreon_SSE3 (and Opteron 3dnow that I cannot test) ? (I mistyped Bulldozer, it is fine indeed as it fell back to Prescott)

matthew-brett commented 8 years ago

@xianyi - are these kernels used in any other configurations? If so, how hard would it be to fix the problems with using badly-initialized memory?

martin-frbg commented 8 years ago

695 links to a julia project bug with a list of CPU types affected (by that particular manifestation of the problem - cgemv behaviour being influenced by as-yet unidentified remnants of preceding calculations, apparently not curable by zeroing only the workspace array that gets passed in from C). This would appear to implicate every x86 architecture older than Intel Sandybridge.

Probably needs a seasoned assembler wizard (and a herd of sacrificial goats) to find the hidden flaw in Goto-sans heritage code.

xianyi commented 8 years ago

@martin-frbg , @matthew-brett I already replaced it with C kernel. It should fix this issue and #695. I think the C kernel is OK for old architecture.

matthew-brett commented 8 years ago

@xianyi - sorry not to be clear - but I wanted to check whethere there are any remaining configurations that will use the assembly kernels, or can / have they now be removed?

What performance cost do you think the C kernels will have compared to the assembly kernels?

xianyi commented 8 years ago

I just remove cgemv_t.S for older architecture and other kernels (e.g. gemm) are assembly.

For new architecture (e.g. sandy bridge, haswell and others), we already optimized cgemv_t kernel instead of using cgemv_t.S

xianyi commented 8 years ago

I am going to close this issue. It it fixed on develop branch.