JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

LU decomposition crashes Windows for 35x35 matrices #2124

Closed andreasnoack closed 11 years ago

andreasnoack commented 11 years ago

(edit: now named lufact(rand(33,33)))

julia>lud(rand(33)) ok!

but (see also #1543)

julia>lud(rand(34))
0x065D2EE3 (0x0062F810 0x00000000 0x004FC9F8 0x0EBD01C0), LAPACKE_csyr_work() + 0xC579E3 bytes(s)
0x05ACFCC6 (0x0062F810 0x00000000 0x00562E78 0x0EBD01C0), LAPACKE_csyr_work() + 0x1547C6 bytes(s)
0x05ACFCC6 (0x0062F810 0x00000000 0x005C92F8 0x0EBD01C0), LAPACKE_csyr_work() + 0x1547C6 bytes(s)
0x05ACFCC6 (0x0062F810 0x00000000 0x0062F778 0x0EBD01C0), LAPACKE_csyr_work() + 0x1547C6 bytes(s)
0x05ACFCC6 (0x0062F810 0x00000000 0x00000000 0x0EBD01C0), LAPACKE_csyr_work() + 0x1547C6 bytes(s)
0x05346996 (0x04309F68 0x0062F914 0x00000001 0x0FD03900), dgetrf_() + 0x146 bytes(s)
0x02DDD8A3 (0x04309F28 0x0062F94C 0x00000001 0x00000001) <unknown module>
0x02DDD809 (0x042896E0 0x0062F9CC 0x00000001 0x6F411541) <unknown module>
0x6F40B42D (0x037864C8 0x0062F9CC 0x00000001 0x6F411858), jl_apply_generic() + 0x5D bytes(s)
0x6F43D684 (0x00000000 0x00000000 0x0062FA58 0x6F446FBB), jl_dump_function() + 0xF64 bytes(s)
0x6F43D07B (0x0FCFE178 0x02543050 0x00000004 0x04099D30), jl_dump_function() + 0x95B bytes(s)
0x6F4475CB (0x0FCFE0E8 0x0062FB20 0x00000002 0x025ADE20), jl_uncompress_ast() + 0x1AEB bytes(s)
0x6F40F08B (0x037DBB20 0x0062FC40 0x00000002 0x0253CFE8), jl_enter_handler() + 0x18B bytes(s)
0x02DDCBC9 (0x0FCFE0E8 0x00000001 0x0062FCB8 0x6F40B42D) <unknown module>
0x02DDC99E (0x033D1F80 0x0062FCF4 0x00000002 0x0349E8B0) <unknown module>
0x6F40B42D (0x033D1F30 0x0062FCF4 0x00000002 0x02DA28FF), jl_apply_generic() + 0x5D bytes(s)
0x02DA0C8E (0x03786E8C 0x025554F8 0x00000001 0x02F44FD8) <unknown module>
0x02DA08BE (0x00000000 0x00000000 0x0062FE28 0x6F40B42D) <unknown module>
0x02DA01CB (0x03551F40 0x00000000 0x00000000 0x00000000) <unknown module>
0x6F40B42D (0x03551EF0 0x00000000 0x00000000 0x77BC6C74), jl_apply_generic() + 0x5D bytes(s)
0x00401888 (0x00000000 0x00BE6C8C 0x0062FFE0 0x00000004)
0x6F44068F (0x00000000 0x00BE6C8C 0x00401760 0x00BE6C20), julia_trampoline() + 0x4F bytes(s)
0x00404755 (0x00000000 0x00BE6C8C 0x00BE3B90 0x00000000), jl_readBuffer() + 0x2685 bytes(s)
0x004013EA (0x00000000 0x00000000 0x7EFDF000 0xC00000FD)
0x7D4E7D42 (0x004014D0 0x00000000 0x000000AA 0x0000000C), BaseProcessInitPostImport() + 0x8D bytes(s)

ViralBShah commented 11 years ago

Cc: @xianyi

Keno commented 11 years ago

Might be a stack issue. Wouldn't be surprised.

ViralBShah commented 11 years ago

Is it an openblas bug, or something in our windows port?

Keno commented 11 years ago

Can't tell yet. I can give you a windows VM if you want to debug.

andreasnoack commented 11 years ago

I should add that this seems to be limited to older Windows machines. The example is from Windows Server 2003.

Keno commented 11 years ago

Don't have a test machine for that yet. Will set up a couple of VMs for it on julia.mit.edu

andreasnoack commented 11 years ago

I am able to get the same crash on Windows Server 2008, but to do so I need a 65x65 matrix. I cannot crash Julia on Windows 8.

xianyi commented 11 years ago

What's your compiler version? GCC 4.7?

Xianyi

alanedelman commented 11 years ago

+1. Running on Vista, lu(randn(33,33)) is OK; lu(randn(34,34)) breaks.

ViralBShah commented 11 years ago

@xianyi Is there a way to get openblas to work reliably on Windows? Any specific compiler versions that you recommend?

@vtjnash Do you think we can build julia 0.1 with ATLAS as a backup, until some of these things are sorted out?

vtjnash commented 11 years ago

can someone tell me how to run the equivalent of lud(rand(33)) on the current version of julia? i'll bundle something that works when I make the 0.1 binaries for windows. however, this shouldn't be critical for the ubuntu release.

ViralBShah commented 11 years ago

lufact(rand(33,33))

xianyi commented 11 years ago

Hi @ViralBShah ,

We are on the Chinese New Year holiday. I think we can address this issue next week.

Xianyi

ViralBShah commented 11 years ago

Ok. Have fun! Let me know if I should file this as an issue on openblas.

ViralBShah commented 11 years ago

I think we should ship the Windows version with Reference BLAS if we can't get ATLAS working in the meantime, until OpenBLAS can be stabilized.

ViralBShah commented 11 years ago

@andreasnoackjensen We should probably add some of these windows crashes as tests in test/linalg.jl once we resolve them.
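
A regression test of that shape might look like the sketch below. It is written in Python, with a small hand-written LU (partial pivoting, the factorization that dgetrf computes) standing in for lufact, so it only illustrates the structure of such a test and does not exercise OpenBLAS. The function names, the `CRASH_SIZES` tuple and the tolerance are assumptions; the sizes are the ones reported in this thread.

```python
import random

# Matrix sizes mentioned in this thread (33 worked; 34, 35 and 65 crashed).
CRASH_SIZES = (33, 34, 35, 65)


def lu_partial_pivot(a):
    """Factor a square matrix as P*A = L*U and return (lu, perm).

    lu stores U on and above the diagonal and the multipliers of L below it.
    perm[i] is the index of the original row that ends up in row i.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Choose the largest remaining entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            factor = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= factor * lu[k][j]
    return lu, perm


def reconstruction_error(a, lu, perm):
    """Return the largest absolute entry of P*A - L*U."""
    n = len(a)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            # Terms with k < i use the stored multipliers of L.
            total = sum(lu[i][k] * lu[k][j] for k in range(min(i, j + 1)))
            if i <= j:
                total += lu[i][j]  # unit diagonal of L times U[i][j]
            worst = max(worst, abs(a[perm[i]][j] - total))
    return worst


def check_sizes(sizes=CRASH_SIZES, tol=1e-9, seed=0):
    """Factor one random matrix per size and check the reconstruction."""
    rng = random.Random(seed)
    for n in sizes:
        a = [[rng.random() for _ in range(n)] for _ in range(n)]
        lu, perm = lu_partial_pivot(a)
        if reconstruction_error(a, lu, perm) >= tol:
            return False
    return True
```

In the real test file the body of `check_sizes` would call the library factorization directly, so that a crash at one of these sizes fails the test run.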

xianyi commented 11 years ago

Hi all,

I don't know why it calls csyr in lufact(rand(33,33)). I think it is a double-precision real matrix.

I just uploaded a simple dgetrf sample to gist https://gist.github.com/xianyi/4771129 It works fine with OpenBLAS develop branch (gcc-4.7) on my Win7 64-bit box.

Xianyi

andreasnoack commented 11 years ago

No, it is not so obvious why csyr is called. However, the problem again seems to be related to multithreading. If I set the number of threads to one, I don't get the error.

xianyi commented 11 years ago

Hi @andreasnoackjensen ,

Is it 32 bit or 64 bit? Could you try OpenBLAS develop branch?

Could you try my dgetrf test https://gist.github.com/xianyi/4771129 ?

Thank you

Xianyi

andreasnoack commented 11 years ago

Hi @xianyi,

It was on a Windows Server 2008 64 bit machine, but I don't know much about the Windows build of Julia. Therefore I cannot try a build with the develop branch. Maybe @loladiro and @vtjnash can help here. I'll see if I can run your example, but I don't have access to a Windows machine with privileges to install programs.

vtjnash commented 11 years ago

i added comments to xianyi's gist.

current workaround for julia may be to add set OPENBLAS_NUM_THREADS=1 to prepare-julia-env.bat
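
For the workaround to take effect, `OPENBLAS_NUM_THREADS` has to be in the environment before the process that loads OpenBLAS starts; in a batch file that is a `set` line. The same idea can be sketched in Python for anyone launching the binary from a script. The helper names are hypothetical, and the Python interpreter stands in for the real executable here.

```python
import os
import subprocess
import sys


def single_threaded_env(base=None):
    """Return a copy of the environment with OpenBLAS limited to one thread."""
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_NUM_THREADS"] = "1"
    return env


def read_setting_in_child():
    """Start a child process and report the value it sees for the variable."""
    # Stand-in child: a real launcher would start the julia executable instead.
    code = "import os; print(os.environ.get('OPENBLAS_NUM_THREADS'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=single_threaded_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Copying the parent environment rather than building a fresh one keeps variables such as `PATH` intact for the child.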

vtjnash commented 11 years ago

@xianyi I've narrowed this down to the stack being corrupted by the line in your gist: LAPACK_dgetrf(&N, &N, m, &LDA,ipiv, &info); somewhere in _zpotrf. the apparent stack trace is

Program received signal SIGSEGV, Segmentation fault.
0x6d7e9243 in zupmtr_ () from c:\users\jameson\desktop\julia-64966d6e8c\libopenblas.dll
(gdb) bt
#0  0x6d7e9243 in zupmtr_ () from c:\users\jameson\desktop\julia-64966d6e8c\libopenblas.dll
#1  0x6cc2c9f6 in zupmtr_ () from c:\users\jameson\desktop\julia-64966d6e8c\libopenblas.dll
#2  0x6cc2c9f6 in zupmtr_ () from c:\users\jameson\desktop\julia-64966d6e8c\libopenblas.dll
#3  0x6cc2c9f6 in zupmtr_ () from c:\users\jameson\desktop\julia-64966d6e8c\libopenblas.dll
#4  0x6cc2c9f6 in zupmtr_ () from c:\users\jameson\desktop\julia-64966d6e8c\libopenblas.dll
#5  0x6c4d6996 in libopenblas!DLANSB () from c:\users\jameson\desktop\julia-64966d6e8c\libopenblas.dll
#6  0x0028fdf0 in ?? ()
#7  0x004013fa in __tmainCRTStartup ()
#8  0x749033aa in KERNEL32!BaseCleanupAppcompatCacheSupport () from C:\Windows\syswow64\kernel32.dll
#9  0x0028ffd4 in ?? ()
#10 0x77149ef2 in ntdll!RtlpNtSetValueKey () from C:\Windows\system32\ntdll.dll
#11 0x7efde000 in ?? ()
#12 0x77149ec5 in ntdll!RtlpNtSetValueKey () from C:\Windows\system32\ntdll.dll
#13 0x004014e0 in WinMainCRTStartup ()
#14 0x7efde000 in ?? ()
#15 0x00000000 in ?? ()
0x6d7e9243 in zupmtr_ () from c:\users\jameson\desktop\julia-
(gdb) info reg
eax            0x3440   13376
ecx            0x92b80  600960
edx            0x8      8
ebx            0x8      8
esp            0xf6b74  0xf6b74
ebp            0xf6bb8  0xf6bb8
esi            0x28fdf0 2686448
edi            0xffffc000       -16384
eip            0x6d7e9243       0x6d7e9243 <zupmtr_+13701139>
eflags         0x10202  [ IF RF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x53     83
gs             0x2b     43
ViralBShah commented 11 years ago

Does this happen only in LU, or does it happen for other decompositions too?

andreasnoack commented 11 years ago

I have tested the other factorizations and the problem seems to be for LU only. However, that includes the solution of a general linear system which also crashes Julia.

ViralBShah commented 11 years ago

@vtjnash Let's set the number of threads to 1 on Windows if that will solve the immediate release issue.

xianyi commented 11 years ago

@vtjnash ,

I also added a comment to my gist. You narrowed this issue down to the dgetrf function. Do you include cblas.h and lapacke.h?

Xianyi

ViralBShah commented 11 years ago

CBLAS does get linked into the openblas used by julia.

StefanKarpinski commented 11 years ago

Bumping to post 0.1.

ViralBShah commented 11 years ago

@xianyi Would it be possible to fix this in a few days? If so, we can build julia windows binaries with openblas now that we have released 0.1.

xianyi commented 11 years ago

@zchothia Could you investigate this issue? Thank you.

xianyi commented 11 years ago

Hi @vtjnash ,

I read your comments in my gist. However, when I built OpenBLAS on Linux and ran test_dgetrf on Windows, I didn't hit the segfault on Windows.

What's the i686-w64-mingw32-gcc version on Linux and gcc version on Windows?

Thank you

Xianyi

vtjnash commented 11 years ago
$ i686-w64-mingw32-gcc --version
i686-w64-mingw32-gcc (GCC) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

built with max OPENBLAS_NUM_THREADS of 80

tested with

$ /c/MinGW64/bin/gcc --version
gcc.exe (Built by MinGW-builds project) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and

$ gcc --version
gcc.exe (GCC) 4.6.1
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and (which is really the same as the first one):

$ /c/MinGW64/bin/x86_64-w64-mingw32-gcc --version -m32
x86_64-w64-mingw32-gcc.exe (Built by MinGW-builds project) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Oh, and my machine is a VMware instance with 2 processors (sometimes 4) running on a Core i7 2620m with 4 processors (all x86_64 / 64-bit).

are any of these make flags for openblas potentially at fault (or insufficient)?

make CC="i686-w64-mingw32-gcc" FC="i686-w64-mingw32-gfortran" RANLIB="i686-w64-mingw32-ranlib" \
CFLAGS="-g" FFLAGS="-g -O2 " USE_THREAD=1 TARGET= DYNAMIC_ARCH=1 OSNAME=WINNT \
CROSS=1 BINARY=32
xianyi commented 11 years ago

Your i686-w64-mingw32-gcc is version 4.6. Did you use gcc 4.6 on Windows? I remember that 4.6 and 4.7 have different calling conventions on Windows.

Xianyi

vtjnash commented 11 years ago

IIUC, it appears that only the calling convention of C++11 changed: http://gcc.gnu.org/gcc-4.7/changes.html. I tried all three compilers mentioned above (4.6.1-i386, 4.7.2-i386, 4.7.2-x86_64). I am putting together a Virtual Machine for more testing.

xianyi commented 11 years ago

Hi @vtjnash ,

Please give me access to the VM. I cannot reproduce this bug on my machine :(

Xianyi

vtjnash commented 11 years ago

@xianyi I haven't started it yet (I think I need to find my Windows install disk). However, I just identified the problem as a stack overflow. The default stack on Windows is 1MB; increasing it to 16MB fixes the problem (-Wl,--stack,16777216). Any idea what a good size would be and why this was a problem? (The default stack on Linux is 8MB, IIRC.)
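
The value passed to the linker is 16 MiB written out in bytes (16 × 1024 × 1024 = 16777216). As a rough illustration of the same idea at thread level rather than at link time, the sketch below uses Python's `threading.stack_size` to request a 16 MiB stack for a newly created thread. The helper name `run_with_stack` is hypothetical, and unlike the linker flag this does not change the stack of the main thread.

```python
import threading

MIB = 1024 * 1024
WINDOWS_DEFAULT_STACK = 1 * MIB  # default size reported in this thread
ENLARGED_STACK = 16 * MIB        # the value given to -Wl,--stack


def run_with_stack(func, stack_bytes):
    """Run func() in a new thread created with the requested stack size."""
    result = []
    # The setting applies to threads created after this call; the call
    # returns the size that was in effect before.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=lambda: result.append(func()))
        worker.start()
        worker.join()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    return result[0]
```

For example, `run_with_stack(some_function, ENLARGED_STACK)` runs `some_function` with sixteen times the stack that a default Windows executable gets.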

JeffBezanson commented 11 years ago

Julia itself can use quite a bit of stack space; can we bump the default to 8MB on windows (if that's enough to fix this)?

vtjnash commented 11 years ago

16MB was enough to bump the max number of openblas threads up to somewhere between 10 and 60, then we run into some other segfault (which appears to be caused by a null pointer)

vtjnash commented 11 years ago

note: fixing #1971 converted this segfault into a julia stack overflow exception for OPENBLAS_NUM_THREADS<30 (or so) at which point it turns into a MemoryError (or an openblas/lapack crash?)