rened closed this issue 9 years ago
Isn't this a dup of #12381?
I don't think so: the error occurs at a different place in the C code, and the workaround in https://github.com/JuliaLang/julia/issues/12381#issuecomment-127182636 (adding sleep(0.5)) does not help.
This time the git bisect at least seems to point to a reasonable commit for causing this.
PS: this also happens with using instead of importall.
This error occurs on OSX - I can't reproduce it on Linux.
One last comment: it seems that @everywhere is no longer necessary for imports anyway? Everything works nicely when I omit @everywhere.
Sure, the code will get loaded on the master process, but none of the workers should load the code (this also assumes a shared file system).
I get a similar error, but if I precompile the module with compilecache, then @everywhere using works correctly.
@jakebolewski I thought so too, hence the @everywhere. But this works (which I think did not work in the 0.3 / early 0.4 days):
julia> addprocs(3)
3-element Array{Int64,1}:
2
3
4
julia> using JSON
julia> @fetchfrom 2 json(1)
"1"
So while defining a new function needs to look like @everywhere func() = "hi" (otherwise it is not visible on the workers), loading a module now seems to happen on all processes.
So basically, everything is usable, but it would still be good not to crash on an @everywhere import statement, which is a no-op anyway?
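To illustrate the function-visibility point, here is a minimal sketch. It assumes Julia 0.7 or later, where the multi-process functions live in the Distributed standard library (on the 0.4 builds discussed in this thread they were part of Base and no such import existed). The names local_only and everywhere_func are made up for the illustration. It does not cover module loading, whose behaviour differs between versions.

```julia
# Assumption: Julia 0.7+; on 0.4 these functions were in Base.
using Distributed

# Start one worker if none is running yet.
nprocs() == 1 && addprocs(1)
w = first(workers())

# Defined on the master process only (hypothetical name).
local_only() = "hi from master"

# Defined on every process, including the worker (hypothetical name).
@everywhere everywhere_func() = "hi"

# The worker knows everywhere_func, so the remote call succeeds.
result = remotecall_fetch(everywhere_func, w)
println(result)

# The worker does not know local_only, so the remote call is expected
# to come back as a RemoteException rather than a value.
failed = try
    remotecall_fetch(local_only, w)
    false
catch err
    err isa RemoteException
end
println(failed)
```

The second call fails because a named function is sent by reference to its definition, and that definition only exists where it was evaluated.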
I can see it on Linux in the current master.
WARNING: replacing module funcd
WARNING: replacing module funcd
WARNING: Method definition range(Any) in module funcd at /tmp/funcd.jl:10 overwritten in module funcd at /tmp/funcd.jl:10.
WARNING: replacing module funcd
WARNING: Method definition range(Any) in module funcd at /tmp/funcd.jl:10 overwritten in module funcd at /tmp/funcd.jl:10.
signal (11): Segmentation fault
jl_object_id at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
unknown function (ip: 0x7f514d94cc78)
unknown function (ip: 0x7f514d952bb5)
unknown function (ip: 0x7f514d95b58c)
jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
serialize at serialize.jl:414
jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
serialize at serialize.jl:414
jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
serialize at serialize.jl:414
jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
serialize at serialize.jl:414
jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
serialize at serialize.jl:414
jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
send_msg_ at multi.jl:222
send_msg_now at multi.jl:173
jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
deliver_result at multi.jl:805
jlcall_deliver_result_21311 at (unknown line)
jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
anonymous at task.jl:890
unknown function (ip: 0x7f514d9bf560)
unknown function (ip: (nil))
I suspect it is the same as #12381, specifically https://github.com/JuliaLang/julia/issues/12381#issuecomment-126816290
OK, true. So the only (perhaps) valuable info from this issue is that it does not occur before 7207a8a. But then again, perhaps this bisect is a red herring as well. Please feel free to close this issue when you think #12381 is enough for tracking this.
Replacing addprocs(3) with addprocs(2) results in the following error being printed (no segfault in this case):
WARNING: replacing module funcd
WARNING: replacing module funcd
WARNING: Method definition range(Any) in module funcd at /tmp/funcd.jl:10 overwritten in module funcd at /tmp/funcd.jl:10.
ERROR: LoadError: On worker 3:
LoadError("/tmp/funcd.jl",7,TypeError(:getfield,"",DataType,Any[:( # serialize.jl, line 400:),NewvarNode(:t),NewvarNode(:nf),NewvarNode(symbol("#s332")),:(tag = (Base.Serializer.sertag)(x::TypeError)::Int32),:( # line 401:),
:(unless (Base.slt_int)(0,(Base.box)(Int64,(Base.sext_int)(Int64,tag::Int32))::Int64)::Bool goto 0),:( # line 402:),:(GenSym(2) = (top(getfield))
(s::SerializationState{TCPSocket},:io)::TCPSocket),
:(unless (Base.slt_int)(tag::Int32,Base.Serializer.VALUE_TAGS)::Bool goto 15),:((Base.write)(GenSym(2),(top(vect))((Base.box)(UInt8,(Base.checked_trunc_uint)(UInt8,0))::UInt8)::Array{UInt8,1})::Int64),:(goto 15),:(15: ),:(return (Base.write)(GenSym(2),(top(vect))((Base.box)(UInt8,(Base.checked_trunc_uint)(UInt8,tag::Int32))::UInt8)::Array{UInt8,1})::Int64),:(0: ),:( # line 404:),:(t = (Base.Serializer.typeof)(x::TypeError)::Type{TypeError}),:( # line 405:),:(nf = (Base.Serializer.nfields)(t::Type{TypeError})::Int64),:( # line 406:),
:(unless nf::Int64 === 0::Bool goto 1),:(#s332 = (Base.slt_int)(0,(Base.box)(Int64,(Base.sext_int)(Int64,(top(getfield))(t::Type{TypeError},:size)::Int32))::Int64)::Bool),:(goto 2),:(1: ),:(#s332 = false),:(2: ),
:(unless #s332::Bool goto 3),:( # line 407:),:((Base.Serializer.serialize_type)
(s::SerializationState{TCPSocket},t::Type{TypeError})::Union{Int64,Void}),:( # line 408:),:(GenSym(3) = (top(getfield))
(s::SerializationState{TCPSocket},:io)::TCPSocket),:(return (Base.throw)($(Expr(:new, :((top(getfield))(Base,:MethodError)::Type{MethodError}), :(Base.write), :((top(tuple))(GenSym(3),x::TypeError)::Tuple{TCPSocket,TypeError}))))::Union{}),:(goto 12),:(3: ),:( # line 410:),
:(unless (top(getfield))(t::Type{TypeError},:mutable)::Bool goto 5),
:(unless (Base.Serializer.serialize_cycle)
(s::SerializationState{TCPSocket},x::TypeError)::Bool goto 4),:(return),:(4: ),:(goto 5),:(5: ),:( # line 411:),:((Base.Serializer.serialize_type)
(s::SerializationState{TCPSocket},t::Type{TypeError})::Union{Int64,Void}),:( # line 412:),:(GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,nf::Int64)::Bool,nf::Int64,(Base.box)(Int64,(Base.sub_int)(1,1))::Int64)::Int64)))),:(#s333 = (top(getfield))(GenSym(0),:start)::Int64),
:(unless (Base.box)(Base.Bool,(Base.not_int)(#s333::Int64 === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Int64::Bool))::Bool goto 7),:(8: ),:(GenSym(5) = #s333::Int64),:(GenSym(6) = (Base.box)(Base.Int,(Base.add_int)(#s333::Int64,1))::Int64),:(i = GenSym(5)),:(#s333 = GenSym(6)),:( # line 413:),
:(unless (Base.Serializer.isdefined)(x::TypeError,i::Int64)::Bool goto 10),:( # line 414:),:((Base.Serializer.serialize)
(s::SerializationState{TCPSocket},(Base.Serializer.getfield)(x::TypeError,i::Int64))),:(goto 11),:(10: ),:( # line 416:),:(GenSym(4) = (top(getfield))
(s::SerializationState{TCPSocket},:io)::TCPSocket),:((Base.write)(GenSym(4),(top(vect))((Base.box)(UInt8,(Base.checked_trunc_uint)(UInt8,Base.Serializer.UNDEFREF_TAG))::UInt8)::Array{UInt8,1})::Int64),:(11: ),:(9: ),
:(unless (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s333::Int64 === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Int64::Bool))::Bool))::Bool goto 8),:(7: ),:(6: ),:(return),:(12: )]))
in include_string at loading.jl:225
in include_from_node1 at ./loading.jl:266
in require at ./loading.jl:202
in eval at sysimg.jl:14
in anonymous at multi.jl:1349
in anonymous at multi.jl:889
in run_work_thunk at multi.jl:642
in anonymous at task.jl:889
in remotecall_fetch at multi.jl:728
in anonymous at task.jl:447
in sync_end at ./task.jl:413
in anonymous at multi.jl:422
in include at ./boot.jl:254
in include_from_node1 at ./loading.jl:263
in process_options at ./client.jl:308
in _start at ./client.jl:411
while loading /tmp/run.jl, in expression starting on line 3
I don't know how to interpret it. Does it help in identifying the cause of the segfault?
FWIW, this is Linux on a MacBook Pro, so maybe the segfault has some relation to the hardware too?
System: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
mine is
System: Darwin (x86_64-apple-darwin13.4.0)
CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
This warning, "WARNING: replacing module funcd", means that the module is being loaded twice. That is probably a pointer to what is going wrong.
@amitmurthy I believe each import statement is executed on all workers? using X on master loads the package on all workers. The redundant @everywhere triggers loading once from each worker (in turn actually loading on all other workers as well). @everywhere seems to be completely redundant (and racy) for importing?
Ah! OK.
I can no longer reproduce this using current master (4d8ca6b), neither on OSX nor Linux.
It was never segfaulting for me, but with the very latest master (f3217a8) I still get similar exceptions when trying to do @everywhere on 12 workers:
ERROR: On worker 5:
LoadError("<...>",61,LoadError("<...>",4,LoadError("<...>",4,UndefVarError(:<...>))))
in include_string at loading.jl:226
in include_from_node1 at ./loading.jl:267
in require at ./loading.jl:203
in include_string at loading.jl:226
in include_from_node1 at ./loading.jl:267
in anonymous at no file:28
in include_string at loading.jl:226
in include_from_node1 at ./loading.jl:267
in eval at ./sysimg.jl:14
in anonymous at multi.jl:1348
in anonymous at multi.jl:889
in run_work_thunk at multi.jl:642
in anonymous at task.jl:889
in remotecall_fetch at multi.jl:728
in remotecall_fetch at multi.jl:731
in anonymous at multi.jl:1350
When using the following code (in a file funcd.jl):
by running it with julia run.jl, with a run.jl file containing:
results in a segfault:
A git bisect points to 7207a8a43e076576d6d6a6161ac75d2ae3391a6e
When executing the following code directly (i.e. in the REPL):
the code passes. cc @amitmurthy