JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

sparse matrix \ causes gc segfault #10342

Closed tawheeler closed 9 years ago

tawheeler commented 9 years ago

I am running into a signal (11): Segmentation fault error when running M\Y for a sparse matrix M and a full vector Y.

n = 100000
Y = randn(n+1)

len = 3n+1
I = Array(Int, len)
J = Array(Int, len)
V = Array(Float64, len)

I[1], J[1], V[1] = 1, 1, 2.0
I[2], J[2], V[2] = n+1, n+1, 2.0

c = 2
for i = 2 : n
    c += 1
    I[c], J[c], V[c] = i, i, 4.0
end
for i = 1:n
    c += 1
    I[c], J[c], V[c] = i, i+1, 1.0
    c += 1
    I[c], J[c], V[c] = i+1, i, 1.0
end

M = sparse(I, J, V)

M\Y
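The script above targets Julia 0.3, where `Array(Int, len)` was valid syntax. As a rough sketch for a current release (assuming Julia 1.x with the SparseArrays and LinearAlgebra standard libraries; the function name `build_system`, the variable names, and the smaller `n` are illustrative choices, not part of the report), the same symmetric tridiagonal system could be built like this:

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: builds the same (n+1)×(n+1) tridiagonal matrix
# as the original script, using Julia 1.x array constructors.
function build_system(n::Integer)
    len = 3n + 1
    rows = Vector{Int}(undef, len)
    cols = Vector{Int}(undef, len)
    vals = Vector{Float64}(undef, len)

    # The two corner diagonal entries.
    rows[1], cols[1], vals[1] = 1, 1, 2.0
    rows[2], cols[2], vals[2] = n + 1, n + 1, 2.0

    c = 2
    for i in 2:n                    # interior diagonal entries
        c += 1
        rows[c], cols[c], vals[c] = i, i, 4.0
    end
    for i in 1:n                    # symmetric off-diagonal pairs
        c += 1
        rows[c], cols[c], vals[c] = i, i + 1, 1.0
        c += 1
        rows[c], cols[c], vals[c] = i + 1, i, 1.0
    end
    return sparse(rows, cols, vals)
end

n = 1_000                           # smaller than the original 100000, for a quick check
M = build_system(n)
Y = randn(n + 1)
X = M \ Y                           # the operation that segfaulted in the report
```

Because the matrix is symmetric, `\` takes the symmetry-checking path that appears in the trace below (`ishermitian` called from `factorize`).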

Error Trace:

$ julia test.jl
signal (11): Segmentation fault
unknown function (ip: -502658307)
unknown function (ip: -502657939)
unknown function (ip: -502657939)
...
unknown function (ip: -502658572)
jl_gc_collect at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
allocobj at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
jl_alloc_array_1d at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
- at sparse/sparsematrix.jl:551
ishermitian at sparse/sparsematrix.jl:1924
factorize at linalg/cholmod.jl:1074
\ at linalg/generic.jl:235
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
unknown function (ip: -502744360)
...
unknown function (ip: -502678067)
jl_load at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
include at ./boot.jl:245
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
include_from_node1 at loading.jl:128
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
process_options at ./client.jl:285
_start at ./client.jl:354
jlcall__start_17150 at /usr/bin/../lib/x86_64-linux-gnu/julia/sys.so (unknown line)
jl_apply_generic at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
unknown function (ip: 4200623)
julia_trampoline at /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so (unknown line)
unknown function (ip: 4199613)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 4199667)
unknown function (ip: 0)
Segmentation fault (core dumped)

Version:

Julia Version 0.3.6
Commit a05f87b* (2015-01-08 22:33 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  WORD_SIZE: 64
  BLAS: libblas.so.3
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Thanks!

andreasnoack commented 9 years ago

Thanks for reporting this. However, I cannot reproduce this on either Mac or Linux, with 0.4 or 0.3.6.

tawheeler commented 9 years ago

The issue is recent; the problem did not occur about two weeks ago. Maybe BLAS or LAPACK changed? I tried reinstalling 0.3.6, but that did not fix the issue.

tkelman commented 9 years ago

What distribution, and how do you have Julia installed? From source? Package manager?

tawheeler commented 9 years ago

Ubuntu 14.04, Julia installed via the stable PPA & Julia platform-specific instructions with apt-get

andreasnoack commented 9 years ago

Thanks. I've just been able to reproduce this with the release version. I'm looking into it.

tawheeler commented 9 years ago

Thank you!

andreasnoack commented 9 years ago

The minimal example to reproduce the segfault is

julia> Base.LinAlg.CHOLMOD.cmn(Int64);
julia> gc()

and it probably has to do with the comment in line 38 of cholmod.jl, where it is stated that chm_com and chm_l_com have to be initialized at runtime. The problem is that they are not. If I reload linalg.jl and write

julia> LinAlg.CHOLMOD.cmn(Int64);
julia> gc()

there is no segfault. This is solved on 0.4 because these arrays are allocated on each call to CHOLMOD, and I think we should just do the same on 0.3.7. A good question is why we haven't seen this segfault long ago.
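As a minimal sketch of what "initialized at runtime" means in Julia (this is not the code in cholmod.jl; the module name `CommonDemo`, the buffer name `common`, and the size 16 are hypothetical), module-level state can be set up in `__init__`, which runs each time the module is loaded rather than when it is compiled into a system image:

```julia
module CommonDemo

# Hypothetical stand-in for a workspace buffer such as chm_com.
# It starts out empty; its real contents are only set up at load time.
const common = UInt8[]

# Assumed size, chosen only for illustration.
const COMMON_SIZE = 16

# Julia calls __init__ after the module has been loaded at runtime,
# so the buffer matches the libraries present on the running system.
function __init__()
    resize!(common, COMMON_SIZE)
    fill!(common, 0x00)
    return nothing
end

end # module

println(length(CommonDemo.common))
```

The alternative described above for 0.4, allocating the buffer on each call, avoids module-level state altogether at the cost of an allocation per call.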

A fix would be a little tricky to test because this is only happening on the repo versions of 0.3.x. @staticfloat would it be possible to build a test package of 0.3.7 after a fix has been pushed? (I'm going to bed now so it will only be fixed tomorrow.)

staticfloat commented 9 years ago

Sure, what platforms do you want the test package built for? OSX, Linux .tar.gz and Windows are easy. Ubuntu PPA is a little harder because it's published to everyone, so test builds are hard.

andreasnoack commented 9 years ago

@staticfloat It would probably be easiest with generic Linux builds. However, I cannot reproduce the segfault with the latest 0.3.6 tar.gz. It appears that it is built with the old SuiteSparse, which has version 2.1.2 of CHOLMOD, whereas the Ubuntu package is built with the new SuiteSparse 4.4.3, which has version 3.0.4 of CHOLMOD.

Could you make a generic Linux tar.gz build with the new SuiteSparse? Then I can see whether I can reproduce the segfault with that, either as it is or by removing the libcholmod.so so that it fetches the old library on the system.

andreasnoack commented 9 years ago

@staticfloat The values returned from the functions in suitesparse_wrapper are wrong on Ubuntu 14.04 and 14.10 because an older libcholmod than was used when compiling suitesparse_wrapper is loaded. I think this is the reason for the segfault.

I plan to add a runtime check for the versions of suitesparse_wrapper and libcholmod, but this will probably only change a segfault into an error for people who use your PPA. This is an improvement, but to really fix this we'd need the suitesparse_wrapper version to always match the loaded CHOLMOD version. Is there a way to do this when packaging?
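A runtime check of this kind could look roughly like the sketch below. It is only an illustration: the function name `check_cholmod_versions` is hypothetical, the two versions are passed in as plain values instead of being queried from the libraries, and treating any major/minor difference as incompatible is an assumed policy.

```julia
# Hypothetical helper: compare the CHOLMOD version the wrapper was compiled
# against with the version of the library that is loaded at runtime.
function check_cholmod_versions(built::VersionNumber, loaded::VersionNumber)
    # Assumed policy: major and minor versions must agree.
    if (built.major, built.minor) != (loaded.major, loaded.minor)
        error("suitesparse_wrapper was built against CHOLMOD $built, ",
              "but CHOLMOD $loaded is loaded")
    end
    return true
end

# Version numbers taken from this thread, used purely as illustration.
check_cholmod_versions(v"3.0.4", v"3.0.4")        # matching versions pass

try
    check_cholmod_versions(v"3.0.4", v"2.1.2")    # mismatch raises an error
catch err
    println("caught: ", err.msg)
end
```

With a check like this, a mismatched installation fails with a readable message at the first CHOLMOD call instead of corrupting memory and crashing later in the garbage collector.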

staticfloat commented 9 years ago

@andreasnoack Are you talking about the generic Linux binaries or the PPA binaries here? I don't think the PPA binaries are built on a computer that has two versions of SuiteSparse on it, so I'm not sure what would cause what you're describing. In any case, I'll see if I can make the generic Linux binaries build against a newer SuiteSparse.

tkelman commented 9 years ago

The problem is we're using different versions of suitesparse on release-0.3 vs master. I think the PPA upgraded its version of suitesparse even for release Julia.

At this stage I'd rather avoid working too hard to tweak suitesparse-related things on release-0.3...

staticfloat commented 9 years ago

Ah, I see. Is there a reason we should update release-0.3 to use SuiteSparse-4.4.3, or should I modify the ubuntu package to specifically request an older suitesparse version?

tkelman commented 9 years ago

You know debian packaging better than I do, but I suspect the latter might be easier? And if it fixes the bug then I'd vote for that option.

Generally we should probably be more careful about versions of dependencies for distribution packaging, since a lot of things can change underneath us that break assumptions that get compiled into the Julia system image. This could be handled either by moving more checks to runtime, which would surface as errors until the system image can be rebuilt, or by registering hooks to rebuild the system image any time a package we depend on changes version. There's also the related issue that for Linux distro packaging we should be using the soname'd versions of various libraries.
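When diagnosing this kind of mismatch, it can help to see which shared library files the running process has loaded. A small sketch, assuming a current Julia release with the Libdl standard library (the helper name `loaded_libraries` is hypothetical):

```julia
using Libdl

# Hypothetical helper: list the loaded shared libraries whose file name
# contains the given pattern.
function loaded_libraries(pattern::AbstractString)
    return filter(p -> occursin(pattern, basename(p)), Libdl.dllist())
end

# Print every loaded library whose file name mentions "cholmod".
# The list may be empty if CHOLMOD has not been loaded yet.
for path in loaded_libraries("cholmod")
    println(path)
end
```

On Linux the printed file names typically carry the version suffix of the library, which makes it easier to spot a library that differs from the one the package was built against.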

staticfloat commented 9 years ago

I'm going to attempt the latter. If all goes well, we should have a new Julia build by tonight.

andreasnoack commented 9 years ago

The problem is that the versions of libsuitesparse_wrapper.so and libcholmod.so don't match. Right now it appears that libsuitesparse_wrapper.so (which is provided by the julia package) is built against CHOLMOD 3.0.4 (in SuiteSparse 4.4.3), but the dependency requirement for Julia is only CHOLMOD 2.1.2. I think this is causing the segfault in this issue.

matanyahorowitz commented 9 years ago

Hi there... for those running into this problem on Ubuntu 14.04, what would you suggest?

ViralBShah commented 9 years ago

Can you try the generic linux tarball from julialang.org/downloads? That should work.

matanyahorowitz commented 9 years ago

No problem with the generic tarball. New to Julia and loving it, thank you.

staticfloat commented 9 years ago

This should be fixed in the distribution packages now; please run sudo apt-get update && sudo apt-get upgrade to get the newest julia. Your test script now works for me.

ViralBShah commented 9 years ago

@tanmaykm We may need to do this for the next JuliaBox refresh to get newer libraries.

andreasnoack commented 9 years ago

With @staticfloat's change and #10362 I consider this fixed.