JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Segmentation fault while fetching for product of sparse qrfact and dense matrix #14134

Closed aueelis closed 8 years ago

aueelis commented 8 years ago

I was comparing LU and QR factorization and wanted to parallelize the computation. I noticed that fetching the QR-factorized matrix fa together with the identity matrix b in the same remote call results in a segmentation fault, while the same code works fine with lufact. The serial code works, too.

julia> addprocs(2)
2-element Array{Int64,1}:
 2
 3

julia> @everywhere n = 5

julia> a = @spawn sprand(n,n,0.99)
RemoteRef{Channel{Any}}(2,1,5)

julia> b = @spawn eye(n)
RemoteRef{Channel{Any}}(3,1,6)

julia> fa = @spawn qrfact(fetch(a))
RemoteRef{Channel{Any}}(2,1,7)

julia> c = @spawn fetch(fa) \ fetch(b)
RemoteRef{Channel{Any}}(3,1,8)

Error:

signal (11): Segmentation fault
size at abstractarray.jl:53
Worker 3 terminated.
ERROR (unhandled task failure): EOFError: read end of file
 in read at stream.jl:911
 in message_handler_loop at multi.jl:863
 in process_tcp_streams at multi.jl:852
 in anonymous at task.jl:63

julia> versioninfo()
Julia Version 0.4.1
Commit cbe1bee (2015-11-08 10:33 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas
  LIBM: libm
  LLVM: libLLVM-3.3

andreasnoack commented 8 years ago

I think there are two issues here. The issue that triggers the segfault is that fa is moved from one process to another, which won't work because a sparse QR object is just a pointer to a C struct managed by SPQR.

When using @spawn, there is an ambiguity about which process to use when executing fetch(fa) \ fetch(b) when fa and b are on two different processes. From the RemoteRefs, you can see that fa is on process 2 and b and c are on process 3. If you create c with

julia> c = @spawnat fa.where fetch(fa) \ fetch(b)
Future(2,1,10,Nullable{Any}())

julia> fetch(c)
5x5 Array{Float64,2}:
  5.74502  -8.21385  -2.35193    1.60267  -1.30705 
  4.95325  -4.9708   -1.94713   -1.55292   1.59012 
 -5.9495    7.69385   3.70573   -1.63633   1.22123 
 -1.39242   1.21979  -0.656823   1.70116  -0.257719
 -2.34509   4.46599   0.60584    0.78298  -1.03139 

it works. This is tricky to fix because the @spawn macro doesn't know if an object can be moved or not. However, you should have received a normal error instead of a segfault.

The question is then why it segfaults instead of giving an error. This also happens on 0.5. @amitmurthy @yuyichao any ideas?

andreasnoack commented 8 years ago

I've figured out what is happening here. The segfault comes from calling \ on the SPQR object after its pointer has been zeroed by serialization. We'll probably have to check the pointer on entry for all exported functions in SuiteSparse.
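
A minimal sketch of what such an entry check could look like. This is not the SuiteSparse code: NativeHandle, checkhandle and solve are hypothetical names used only for illustration, and the sketch uses Julia 1.x syntax (struct, Cvoid) rather than the 0.4 syntax shown above. The idea is only that every function that hands the pointer to C first verifies that it is not null, so a zeroed pointer produces a normal Julia error instead of a segfault.

```julia
# Hypothetical wrapper standing in for a factorization object that only
# holds a pointer to memory owned by a C library (not the SuiteSparse API).
struct NativeHandle
    ptr::Ptr{Cvoid}
end

# Guard meant to be called at the top of every function that passes the
# pointer to C. Throws a regular Julia error if the pointer is null.
function checkhandle(h::NativeHandle)
    if h.ptr == C_NULL
        throw(ArgumentError("handle is a null pointer; the object may have " *
                            "been serialized and moved to another process"))
    end
    return h
end

# Hypothetical entry point: validate first, then hand the pointer to C.
function solve(h::NativeHandle, b::AbstractMatrix)
    checkhandle(h)
    # A ccall into the native library would go here; the sketch simply
    # returns its input so that it stays self-contained.
    return b
end
```

With a guard like this, calling solve(NativeHandle(C_NULL), b) raises an ArgumentError that the caller can catch, rather than terminating the worker.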