JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Crashing with @everywhere using #12381

Closed sbromberger closed 7 years ago

sbromberger commented 9 years ago

Per https://groups.google.com/d/msg/julia-users/FjGXSTzvfmc/j0ZDG629IwAJ

seth@schroeder:~/dev/julia/wip/LightGraphs.jl$  julia -p 4
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+6371 (2015-07-29 17:45 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 6fafbbc (0 days old master)
|__/                   |  x86_64-apple-darwin14.4.0

julia> @everywhere using LightGraphs
WARNING: replacing module LightGraphs
WARNING: replacing module LightGraphs
WARNING: replacing module LightGraphs
WARNING: replacing module LightGraphs
exception on 4:
signal (11): Segmentation fault: 11
ptrhash_peek_bp at /Users/seth/dev/julia/julia/src/support/ptrhash.c:26
jl_get_binding_ at /Users/seth/dev/julia/julia/src/module.c:172
jl_get_binding at /Users/seth/dev/julia/julia/src/module.c:406
eval at /Users/seth/dev/julia/julia/src/interpreter.c:119
fl_invoke_julia_macro at /Users/seth/dev/julia/julia/src/ast.c:76
apply_cl at /Users/seth/dev/julia/julia/src/flisp/flisp.c:1276
_applyn at /Users/seth/dev/julia/julia/src/flisp/flisp.c:729
fl_map1 at /Users/seth/dev/julia/julia/src/flisp/flisp.c:2269
apply_cl at /Users/seth/dev/julia/julia/src/flisp/flisp.c:1226
_applyn at /Users/seth/dev/julia/julia/src/flisp/flisp.c:729
fl_map1 at /Users/seth/dev/julia/julia/src/flisp/flisp.c:2269
apply_cl at /Users/seth/dev/julia/julia/src/flisp/flisp.c:1226
do_trycatch at /Users/seth/dev/julia/julia/src/flisp/flisp.c:950
apply_cl at /Users/seth/dev/julia/julia/src/flisp/flisp.c:1856
_applyn at /Users/seth/dev/julia/julia/src/flisp/flisp.c:729
fl_applyn at /Users/seth/dev/julia/julia/src/flisp/flisp.c:774
jl_expand at /Users/seth/dev/julia/julia/src/ast.c:581
jl_toplevel_eval_flex at /Users/seth/dev/julia/julia/src/toplevel.c:486
jl_eval_module_expr at /Users/seth/dev/julia/julia/src/toplevel.c:166
jl_parse_eval_all at /Users/seth/dev/julia/julia/src/toplevel.c:574
jl_load_file_string at /Users/seth/dev/julia/julia/src/ast.c:573
include_string at loading.jl:158
jl_apply at /Users/seth/dev/julia/julia/src/gf.c:1658
include_from_node1 at /usr/local/julia-latest/lib/julia/sys.dylib (unknown line)
jl_apply at /Users/seth/dev/julia/julia/src/interpreter.c:55
eval at /Users/seth/dev/julia/julia/src/interpreter.c:212
jl_toplevel_eval_flex at /Users/seth/dev/julia/julia/src/toplevel.c:524
jl_toplevel_eval_in at /Users/seth/dev/julia/julia/src/builtins.c:552
eval at sysimg.jl:14
anonymous at multi.jl:1303
jl_apply at /Users/seth/dev/julia/julia/src/./julia.h:1262
anonymous at multi.jl:877
run_work_thunk at multi.jl:619
run_work_thunk at multi.jl:628
jlcall_run_work_thunk_21133 at  (unknown line)
jl_apply at /Users/seth/dev/julia/julia/src/gf.c:1658
anonymous at task.jl:11
jl_apply at /Users/seth/dev/julia/julia/src/task.c:233

signal (11): Segmentation fault: 11
ptrhash_peek_bp at /Users/seth/dev/julia/julia/src/support/ptrhash.c:26
jl_get_binding_ at /Users/seth/dev/julia/julia/src/module.c:172
jl_get_binding at /Users/seth/dev/julia/julia/src/module.c:406
eval at /Users/seth/dev/julia/julia/src/interpreter.c:119
fl_invoke_julia_macro at /Users/seth/dev/julia/julia/src/ast.c:76
apply_cl at /Users/seth/dev/julia/julia/src/flisp/flisp.c:1276
_applyn at /Users/seth/dev/julia/julia/src/flisp/flisp.c:729
fl_map1 at /Users/seth/dev/julia/julia/src/flisp/flisp.c:2269
apply_cl at /Users/seth/dev/julia/julia/src/flisp/flisp.c:1226
_applyn at /Users/seth/dev/julia/julia/src/flisp/flisp.c:729
fl_map1 at /Users/seth/dev/julia/julia/src/flisp/flisp.c:2269
apply_cl at /Users/seth/dev/julia/julia/src/flisp/flisp.c:1226
do_trycatch at /Users/seth/dev/julia/julia/src/flisp/flisp.c:950
apply_cl at /Users/seth/dev/julia/julia/src/flisp/flisp.c:1856
_applyn at /Users/seth/dev/julia/julia/src/flisp/flisp.c:729
fl_applyn at /Users/seth/dev/julia/julia/src/flisp/flisp.c:774
jl_expand at /Users/seth/dev/julia/julia/src/ast.c:581
jl_toplevel_eval_flex at /Users/seth/dev/julia/julia/src/toplevel.c:486
jl_eval_module_expr at /Users/seth/dev/julia/julia/src/toplevel.c:166
jl_parse_eval_all at /Users/seth/dev/julia/julia/src/toplevel.c:574
jl_load_file_string at /Users/seth/dev/julia/julia/src/ast.c:573
include_string at loading.jl:158
jl_apply at /Users/seth/dev/julia/julia/src/gf.c:1658
include_from_node1 at /usr/local/julia-latest/lib/julia/sys.dylib (unknown line)
jl_apply at /Users/seth/dev/julia/julia/src/interpreter.c:55
eval at /Users/seth/dev/julia/julia/src/interpreter.c:212
jl_toplevel_eval_flex at /Users/seth/dev/julia/julia/src/toplevel.c:524
jl_toplevel_eval_in at /Users/seth/dev/julia/julia/src/builtins.c:552
eval at sysimg.jl:14
jl_apply at /Users/seth/dev/julia/julia/src/gf.c:1658
anonymous at multi.jl:1303
jl_apply at /Users/seth/dev/julia/julia/src/./julia.h:1262
anonymous at multi.jl:877
run_work_thunk at multi.jl:619
run_work_thunk at multi.jl:628
jlcall_run_work_thunk_21054 at  (unknown line)
jl_apply at /Users/seth/dev/julia/julia/src/gf.c:1658
anonymous at task.jl:11
jl_apply at /Users/seth/dev/julia/julia/src/task.c:233
ERROR: LoadError: ReadOnlyMemoryError()
 in include_string at loading.jl:158
 in include_from_node1 at /usr/local/julia-latest/lib/julia/sys.dylib
 in eval at sysimg.jl:14
 in anonymous at multi.jl:1303
 in anonymous at multi.jl:877
 in run_work_thunk at multi.jl:619
 in run_work_thunk at multi.jl:628
 in anonymous at task.jl:11
while loading /Users/seth/.julia/v0.4/LightGraphs/src/LightGraphs.jl, in expression starting on line 6
exception on 3: ERROR: LoadError: ReadOnlyMemoryError()
 in include_string at loading.jl:158
 in include_from_node1 at /usr/local/julia-latest/lib/julia/sys.dylib
 in eval at sysimg.jl:14
 in anonymous at multi.jl:1303
 in anonymous at multi.jl:877
 in run_work_thunk at multi.jl:619
 in run_work_thunk at multi.jl:628
 in anonymous at task.jl:11
while loading /Users/seth/.julia/v0.4/LightGraphs/src/LightGraphs.jl, in expression starting on line 6
Worker 5 terminated.
ERROR (unhandled task failure): ProcessExitedException()
 in yieldto at /usr/local/julia-latest/lib/julia/sys.dylib
 in wait at /usr/local/julia-latest/lib/julia/sys.dylib (repeats 2 times)
 in wait_full at /usr/local/julia-latest/lib/julia/sys.dylib
 in remotecall_fetch at multi.jl:702
 in remotecall_fetch at multi.jl:707
 in anonymous at task.jl:369
ERROR (unhandled task failure): EOFError: read end of file
 in yieldto at /usr/local/julia-latest/lib/julia/sys.dylib
 in wait at /usr/local/julia-latest/lib/julia/sys.dylib (repeats 2 times)
 in wait_full at /usr/local/julia-latest/lib/julia/sys.dylib
 in remotecall_fetch at multi.jl:702
 in remotecall_fetch at multi.jl:707
 in anonymous at task.jl:369
Worker 2 terminated.exception on ERROR (unhandled task failure): ProcessExitedException()
 in yieldto at /usr/local/julia-latest/lib/julia/sys.dylib
 in wait at /usr/local/julia-latest/lib/julia/sys.dylib (repeats 2 times)
 in wait_full at /usr/local/julia-latest/lib/julia/sys.dylib
 in remotecall_fetch at multi.jl:702
 in remotecall_fetch at multi.jl:707
 in anonymous at task.jl:369

1: ERROR (unhandled task failure): EOFError: read end of file
 in yieldto at /usr/local/julia-latest/lib/julia/sys.dylib
 in wait at /usr/local/julia-latest/lib/julia/sys.dylib (repeats 2 times)
 in wait_full at /usr/local/julia-latest/lib/julia/sys.dylib
 in remotecall_fetch at multi.jl:702
 in call_on_owner at /usr/local/julia-latest/lib/julia/sys.dylib
 in wait at /usr/local/julia-latest/lib/julia/sys.dylib
 in require at /usr/local/julia-latest/lib/julia/sys.dylib
 in eval at sysimg.jl:14
 in anonymous at multi.jl:1324
 in run_work_thunk at multi.jl:619
 in remotecall_fetch at multi.jl:692
 in remotecall_fetch at multi.jl:707
 in anonymous at task.jl:369
ERROR: ProcessExitedException()
 in yieldto at /usr/local/julia-latest/lib/julia/sys.dylib
 in wait at /usr/local/julia-latest/lib/julia/sys.dylib (repeats 2 times)
 in wait_full at /usr/local/julia-latest/lib/julia/sys.dylib
 in remotecall_fetch at multi.jl:702
 in call_on_owner at /usr/local/julia-latest/lib/julia/sys.dylib
 in wait at /usr/local/julia-latest/lib/julia/sys.dylib
 in require at /usr/local/julia-latest/lib/julia/sys.dylib
 in eval at sysimg.jl:14
 in anonymous at multi.jl:1324
 in run_work_thunk at multi.jl:619
 in remotecall_fetch at multi.jl:692
 in remotecall_fetch at multi.jl:707
 in anonymous at task.jl:369
ERROR: ProcessExitedException()
 in wait at /usr/local/julia-latest/lib/julia/sys.dylib
 in sync_end at /usr/local/julia-latest/lib/julia/sys.dylib
 in anonymous at multi.jl:348

This works on

0.4.0-dev+5008 (2015-05-26 16:08 UTC) Commit 0855ec9

Next step: git bisect. Stand by.

sbromberger commented 9 years ago

... and bisect is broken:

error: Your local changes to the following files would be overwritten by checkout:
    CMakeLists.txt
    src/openssl_stream.c
Please, commit your changes or stash them before you can switch branches.
Aborting
make[1]: *** [libgit2/CMakeLists.txt] Error 1
make: *** [julia-deps] Error 2

Will try some brute-force.

sbromberger commented 9 years ago

No crashes, but errors here:

 | | |_| | | | (_| |  |  Version 0.4.0-dev+6033 (2015-07-17 02:56 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit efcc709 (12 days old master)
|__/                   |  x86_64-apple-darwin14.4.0

julia> @everywhere using LightGraphs
exception on 1: exception on 3: exception on 5: exception on 4: exception on 2: ERROR: MethodError: `base_include` has no method matching base_include(::UTF8String, ::ASCIIString, ::Tuple{Int64,ASCIIString})
Closest candidates are:
  base_include(::Nullable{Union{UTF8String,ASCIIString}}, ::AbstractString, ::Any)
  base_include(::AbstractString, ::Any)
  base_include(::Nullable{Union{UTF8String,ASCIIString}}, ::AbstractString)
  ...
 in eval at sysimg.jl:14
 in anonymous at multi.jl:1297
 in run_work_thunk at multi.jl:584
 in run_work_thunk at multi.jl:593
 in anonymous at task.jl:8

I will assume that this is a separate issue and will attempt to pinpoint the version that crashes.

jakebolewski commented 9 years ago

did you precompile this package?

sbromberger commented 9 years ago

@jakebolewski at one time, yes, but ~/.julia/lib/v0.4 is currently empty.

sbromberger commented 9 years ago

According to bisect results,

e8a1c7440f47707be3329775fac91f0c4bf9c27d is the first bad commit
commit e8a1c7440f47707be3329775fac91f0c4bf9c27d
Author: Tim Holy <tim.holy@gmail.com>
Date:   Sat Jul 11 10:13:49 2015 -0500

    Add missing base_include method

    This fixes errors that crop up with multiple workers, e.g.,
    ERROR: MethodError: `base_include` has no method matching base_include(::ASCIIString, ::ASCIIString, ::Tuple{Int64,ASCIIString})
    Closest candidates are:
      base_include(::Nullable{Union{UTF8String,ASCIIString}}, ::AbstractString, ::Any)
      base_include(::AbstractString, ::Any)
      base_include(::Nullable{Union{UTF8String,ASCIIString}}, ::AbstractString)
      ...
     in eval at sysimg.jl:14
     in anonymous at multi.jl:1303
     in run_work_thunk at multi.jl:584
    ...

    Perhaps you'd prefer a call-site fix?

:040000 040000 dc8f6d703af0bc05b1dc811013d73a713fdf05dc 21a496ea73cbb8fcedeecb601d7a978e9067e283 M  base

Now validating that the previous version actually works... I encountered several errors along the way.

sbromberger commented 9 years ago

The version prior to bisect's "first bad commit" turns out to show the error reported in https://github.com/JuliaLang/julia/issues/12381#issuecomment-126144754, so it looks like Tim's commit fixed that bug but may have uncovered the crash bug.

cc @timholy

timholy commented 9 years ago

There doesn't appear to be any way e8a1c7440f47707be3329775fac91f0c4bf9c27d is the real culprit. CC @vtjnash?

tkelman commented 9 years ago

re: https://github.com/JuliaLang/julia/issues/12381#issuecomment-126144337, if you go back far enough that dependencies changed versions you'll often need to do make -C deps distclean-libgit2, or similar for pcre. Make sure you're doing make cleanall at each step of bisect just to be sure (the only deps that cleanall deletes by default are small ones).

jlapeyre commented 9 years ago

The following is with no cleaning before rebuilding. I get different behavior on two machines:

Julia Version 0.4.0-dev+6033
Commit efcc709* (2015-07-17 02:56 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
> julia -p 4

Hangs (for at least several minutes) with:

ERROR: AssertionError: 
 in init_worker at ./multi.jl:1051
 in start_worker at multi.jl:964
 in process_options at ./client.jl:265
 in _start at ./client.jl:411

Different machine

Julia Version 0.4.0-dev+6033
Commit efcc709* (2015-07-17 02:56 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

> julia -p 4
> @everywhere using PowerSeries
exception on 1: exception on 4: exception on 2: ERROR: MethodError: `base_include` has no method matching base_include(::ASCIIString, ::ASCIIString, ::Tuple{Int64,ASCIIString})
Closest candidates are:
  base_include(::Nullable{Union{UTF8String,ASCIIString}}, ::AbstractString, ::Any)
  base_include(::AbstractString, ::Any)
  base_include(::Nullable{Union{UTF8String,ASCIIString}}, ::AbstractString)
  ...
 in eval at sysimg.jl:14
 in anonymous at multi.jl:1297
 in run_work_thunk at multi.jl:584
 in run_work_thunk at multi.jl:593
 in anonymous at task.jl:8
   ...
Warning: requiring "PowerSeries" did not define a corresponding module.

rened commented 9 years ago

A git bisect with make cleanall leads me to 416a23ee as the first bad commit. 88bb2e9 is the last commit where

./julia -e "addprocs(2); @everywhere using FactCheck"

succeeds for me. (No pre-compilation used anywhere).

timholy commented 9 years ago

I also find that one to be strange as a culprit, but CCing @ScottPJones anyway.

ScottPJones commented 9 years ago

I'm not at home now, but I can't see how it would have an effect unless there was invalid UTF-8 data that wasn't detected before, and in that case you would have seen a UnicodeError.

rened commented 9 years ago

Interesting. What I posted above was performed on OSX. On Linux, both commits work just fine. Current master on Linux fails, though, as on OSX. If nobody beats me to it I will bisect again on both systems tomorrow.

vtjnash commented 9 years ago

I don't think a bisect is entirely required for this one: from the issue description above, it's apparent that there's a race condition between the call to using on node 1 and the calls to using on the other nodes, and the changes to the require logic don't properly account for it.

ScottPJones commented 9 years ago

Bisects are not at all accurate for things like race conditions; they can only give you a rough idea of the bounds within which the bug was introduced.

jlapeyre commented 9 years ago

Yes, it looks like a race condition. The type of error produced (and whether they occur at all) changes from run to run.

jlapeyre commented 9 years ago

For testing, I find the bug more likely to occur with four processes than two.

jlapeyre commented 9 years ago

If I do

make cleanall
make distclean
make -C deps distcleanall

Then building commit 88bb2e9 shows the bug.

rened commented 9 years ago

FWIW, sleeping a little makes it pass for me again on current master:

./julia -e "addprocs(6); @everywhere sleep(0.1); using JSON, FactCheck, Compat"

Calling it without @everywhere altogether works as well:

./julia -e "addprocs(6); using JSON, FactCheck, Compat"

carnaval commented 9 years ago

sleeping a little makes everything pass

amitmurthy commented 9 years ago

https://github.com/JuliaLang/julia/pull/12581 does seem to fix the segfault on my local machine. Could other folks please test it out?

This is what I get.

amitm@amitm-macbookpro:~/Work/julia/julia$ julia -p 4
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+6662 (2015-08-13 16:13 UTC)
 _/ |\__'_|_|_|\__'_|  |  amitm/loading_fix/296a7db* (fork: 1 commits, 2 days)
|__/                   |  x86_64-linux-gnu

julia> @everywhere using LightGraphs
INFO: Precompiling module LightGraphs...
WARNING: Module StatsFuns uuid did not match cache file
WARNING: Module StatsFuns uuid did not match cache file
WARNING: Module StatsFuns uuid did not match cache file
WARNING: Module StatsFuns uuid did not match cache file
WARNING: node state is inconsistent: node 2 failed to load cache from /home/amitm/.julia/lib/v0.4/LightGraphs.ji
WARNING: node state is inconsistent: node 3 failed to load cache from /home/amitm/.julia/lib/v0.4/LightGraphs.ji
WARNING: node state is inconsistent: node 4 failed to load cache from /home/amitm/.julia/lib/v0.4/LightGraphs.ji
WARNING: node state is inconsistent: node 5 failed to load cache from /home/amitm/.julia/lib/v0.4/LightGraphs.ji

and

julia> 
amitm@amitm-macbookpro:~/Work/julia/julia$ ./julia -e "addprocs(6); using JSON, FactCheck, Compat"
WARNING: module DataStructures should explicitly import < from Base
WARNING: module DataStructures should explicitly import <= from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module DataStructures should explicitly import < from Base
WARNING: module DataStructures should explicitly import <= from Base
WARNING: module DataStructures should explicitly import < from Base
WARNING: module DataStructures should explicitly import <= from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module DataStructures should explicitly import < from Base
WARNING: module DataStructures should explicitly import <= from Base
WARNING: module DataStructures should explicitly import < from Base
WARNING: module DataStructures should explicitly import <= from Base
WARNING: module DataStructures should explicitly import < from Base
WARNING: module DataStructures should explicitly import <= from Base
WARNING: module DataStructures should explicitly import < from Base
WARNING: module DataStructures should explicitly import <= from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base
WARNING: module JSON should explicitly import colon from Base

Warnings and errors but no segfault.

vtjnash commented 9 years ago

Why are you getting "WARNING: node state is inconsistent" there? That is generally going to be really, really bad.

amitmurthy commented 9 years ago

I cleaned .cache.

Now with julia -p4, I see

julia> @everywhere using LightGraphs
WARNING: replacing module LightGraphs
WARNING: Method definition ==(Base.Pair{Int64, Int64}, Base.Pair{Int64, Int64}) in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:32 overwritten in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:32.
WARNING: Method definition show(Base.IO, Base.Pair{Int64, Int64}) in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:35 overwritten in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:35.
WARNING: replacing module LightGraphs
WARNING: Method definition ==(Base.Pair{Int64, Int64}, Base.Pair{Int64, Int64}) in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:32 overwritten in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:32.
WARNING: Method definition show(Base.IO, Base.Pair{Int64, Int64}) in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:35 overwritten in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:35.
WARNING: replacing module LightGraphs
WARNING: replacing module LightGraphs
WARNING: Method definition ==(Base.Pair{Int64, Int64}, Base.Pair{Int64, Int64}) in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:32 overwritten in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:32.
WARNING: Method definition show(Base.IO, Base.Pair{Int64, Int64}) in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:35 overwritten in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:35.
WARNING: Method definition ==(Base.Pair{Int64, Int64}, Base.Pair{Int64, Int64}) in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:32 overwritten in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:32.
WARNING: Method definition show(Base.IO, Base.Pair{Int64, Int64}) in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:35 overwritten in module LightGraphs at /home/amitm/.julia/v0.4/LightGraphs/src/core.jl:35.

Why did it precompile the first time around and not now?

sbromberger commented 9 years ago

Now getting

seth@schroeder:~/dev/julia/julia$ julia -p 4
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+6817 (2015-08-18 15:25 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 77bef6e (0 days old master)
|__/                   |  x86_64-apple-darwin14.5.0

julia> @everywhere using LightGraphs
WARNING: replacing module LightGraphs
WARNING: Method definition ==(Base.Pair{Int64, Int64}, Base.Pair{Int64, Int64}) in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:24 overwritten in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:24.
WARNING: Method definition show(Base.IO, Base.Pair{Int64, Int64}) in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:27 overwritten in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:27.
WARNING: replacing module LightGraphs
WARNING: replacing module LightGraphs
WARNING: replacing module LightGraphs
WARNING: Method definition ==(Base.Pair{Int64, Int64}, Base.Pair{Int64, Int64}) in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:24 overwritten in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:24.
WARNING: Method definition show(Base.IO, Base.Pair{Int64, Int64}) in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:27 overwritten in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:27.
WARNING: Method definition ==(Base.Pair{Int64, Int64}, Base.Pair{Int64, Int64}) in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:24 overwritten in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:24.
WARNING: Method definition show(Base.IO, Base.Pair{Int64, Int64}) in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:27 overwritten in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:27.
WARNING: Method definition ==(Base.Pair{Int64, Int64}, Base.Pair{Int64, Int64}) in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:24 overwritten in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:24.
WARNING: Method definition show(Base.IO, Base.Pair{Int64, Int64}) in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:27 overwritten in module LightGraphs at /Users/seth/.julia/v0.4/LightGraphs/src/core.jl:27.

wildart commented 9 years ago

I tried it on a fresh build made with make cleanall with a package that does not use precompilation:

➜  LMCLUS git:(master) ✗ julia-dev
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+6817 (2015-08-18 15:25 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 77bef6e* (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> addprocs(1)
1-element Array{Int64,1}:
 2

julia> nprocs()
2

julia> @everywhere using LMCLUS
WARNING: replacing module LMCLUS
WARNING: could not import MultivariateStats.PCA into LMCLUS
WARNING: could not import MultivariateStats.fit into LMCLUS
WARNING: could not import MultivariateStats.principalratio into LMCLUS

signal (11): Segmentation fault
unknown function (ip: 0x7f64226a509a)
unknown function (ip: 0x7f6422621ecb)
unknown function (ip: 0x7f6422621ef1)
jl_get_global at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
unknown function (ip: 0x7f6422667344)
unknown function (ip: 0x7f6422668263)
unknown function (ip: 0x7f6422669081)
unknown function (ip: 0x7f64226694e6)
unknown function (ip: 0x7f642267cb0f)
unknown function (ip: 0x7f642267d3d9)
jl_load_file_string at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
include_string at loading.jl:228
jl_apply_generic at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
include_from_node1 at ./loading.jl:269
jl_apply_generic at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
unknown function (ip: 0x7f6422668a43)
unknown function (ip: 0x7f6422667e61)
unknown function (ip: 0x7f642267c6e8)
unknown function (ip: 0x7f642267ce52)
unknown function (ip: 0x7f642267caa5)
unknown function (ip: 0x7f642267d3d9)
jl_load_file_string at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
include_string at loading.jl:228
jl_apply_generic at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
include_from_node1 at ./loading.jl:269
jl_apply_generic at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
unknown function (ip: 0x7f6422668a43)
unknown function (ip: 0x7f6422667e61)
unknown function (ip: 0x7f642267c6e8)
jl_toplevel_eval_in at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
require at ./loading.jl:203
unknown function (ip: 0x7f641f42a53c)
jl_apply_generic at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
unknown function (ip: 0x7f642267b9c5)
unknown function (ip: 0x7f642267c95b)
jl_toplevel_eval_in at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
eval at ./sysimg.jl:14
jl_apply_generic at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
anonymous at multi.jl:1348
jl_f_apply at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
anonymous at multi.jl:889
run_work_thunk at multi.jl:642
jlcall_run_work_thunk_21126 at  (unknown line)
jl_apply_generic at /home/art/Development/julia-nightly/usr/bin/../lib/libjulia.so (unknown line)
anonymous at task.jl:889
unknown function (ip: 0x7f642266e650)
unknown function (ip: (nil))
Worker 2 terminated.ERROR: ProcessExitedException()
 in yieldto at ./task.jl:75
 in wait at ./task.jl:371
 in wait at ./task.jl:286
 in wait at ./channels.jl:93
 in take! at ./channels.jl:82
 in take! at ./multi.jl:789
 in remotecall_fetch at multi.jl:726
 in remotecall_fetch at multi.jl:731
 in anonymous at multi.jl:1350
 in sync_end at ./task.jl:413
 in anonymous at multi.jl:1359

ERROR (unhandled task failure): EOFError: read end of file
 in sync_end at ./task.jl:413
 in anonymous at multi.jl:1359

But if I precompile the package in advance, there are no problems:

➜  LMCLUS git:(master) ✗ julia-dev
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+6817 (2015-08-18 15:25 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 77bef6e* (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> Base.compilecache(:LMCLUS)
"/home/art/.julia/lib/v0.4/LMCLUS.ji"

julia> 
➜  LMCLUS git:(master) ✗ julia-dev
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+6817 (2015-08-18 15:25 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 77bef6e* (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> addprocs(1)
1-element Array{Int64,1}:
 2

julia> @everywhere using LMCLUS
WARNING: replacing module LMCLUS

julia> 
rened commented 9 years ago

Just tried on OSX and Linux using latest master (Commit e5e8ed5*), and don't get any crashes when running (for X in 1:8):

julia -e "@everywhere using LightGraphs" -p X 

I do get the WARNING: node state is inconsistent though.

Using @everywhere using with an empty compilation cache triggers it. Before each of the following I run rm -rf ~/.julia/lib/v0.4. This works:

> julia -e "@everywhere using LightGraphs"

Adding one or more workers shows the warning:

> julia -e "@everywhere using LightGraphs" -p 1
WARNING: Module StatsFuns uuid did not match cache file
WARNING: node state is inconsistent: node 2 failed to load cache from /Users/rene/.julia/lib/v0.4/LightGraphs.ji

No warning when omitting @everywhere:

> julia -e "using LightGraphs" -p 1

rened commented 9 years ago

forgot to cc @stevengj

stevengj commented 9 years ago

I suspect that what is going on here is:

Possible solutions:

The third option seems best to me, since atomic rename is usually the best practice for writing files in general.

(Also, it would be nicer if the deserialization import threw an error on a corrupt .ji file rather than crashing, if indeed that is what is happening.)
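For readers unfamiliar with the write-then-rename pattern, here is a minimal sketch in present-day Julia (assumed 1.3 or later, for `mktemp(...; cleanup=false)`). It is not the code committed in this thread, and `atomic_write` is a hypothetical name. The new contents go to a temporary file in the same directory as the target, and only a complete file is moved into place with POSIX `rename(2)` (called through `ccall`), so a process reading the cache concurrently sees either the old file or the whole new one, never a partial write. This assumes a POSIX system; Windows `rename` semantics differ.

```julia
# Hypothetical helper: atomically replace the file at `path` with `data` (POSIX only).
function atomic_write(path::AbstractString, data)
    target = abspath(path)
    # The temp file must be on the same filesystem as `target` for rename(2) to be atomic,
    # so create it in the same directory.
    tmp, io = mktemp(dirname(target); cleanup=false)
    done = false
    try
        write(io, data)
        close(io)                                   # flush all bytes before the file becomes visible
        ret = ccall(:rename, Cint, (Cstring, Cstring), tmp, target)
        systemerror("rename", ret != 0)             # throw SystemError if rename(2) failed
        done = true
    finally
        if !done                                    # on any failure, don't leave the temp file behind
            isopen(io) && close(io)
            rm(tmp; force=true)
        end
    end
    return path
end
```

The rename only swaps directory entries, so a reader that already opened the old file keeps seeing the old contents. This guards against reading a half-written cache file; it does not by itself stop a module from being loaded twice.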

wildart commented 9 years ago

Using import before @everywhere worked for me.

stevengj commented 9 years ago

I implemented the rename on general principles, but it doesn't seem to solve the WARNING: Module StatsFuns uuid did not match cache file.

Even when the cache file already exists, however, julia -e "@everywhere using LightGraphs" -p 1 gives WARNING: replacing module LightGraphs. There is a basic problem here because @everywhere using actually imports the module twice on all the workers:

If the latter happens before the former, I guess the module will get imported twice, leading to the problems we are seeing. (At best, you get a warning.)

It seems like the using logic really needs to know whether it is happening in an @everywhere or similar statement to avoid this.

rened commented 9 years ago

While it would be great to have @everywhere using work without showing any warnings, I believe using @everywhere using is simply a relic from when using did not yet auto-load on all workers. The warnings are due to the race condition of running import twice.

Shall we just live with this for now and simply discourage using @everywhere using? (The original issue was a segfault, which no longer occurs.)

JeffBezanson commented 9 years ago

The package_locks mechanism was supposed to (used to?) solve this. If you try to do using X multiple times at once on a worker, it should actually happen only once.
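The "happen only once" behavior described here can be sketched as a per-module registry of loading tasks. This is an illustration under stated assumptions, not Base's actual `package_locks` code: `LOAD_TASKS` and `load_once` are hypothetical names, the `loader` argument stands in for the real loading step, and all callers are assumed to be tasks on a single thread (for example started with `@async`), so the check-and-insert done by `get!` cannot be interleaved. Multithreaded callers would need a lock around the dictionary.

```julia
# Hypothetical registry: module name => the task that performs (or performed) its load.
const LOAD_TASKS = Dict{Symbol,Task}()

# Run `loader(name)` at most once per `name`; concurrent callers share the result.
# `get!` does not yield, so among cooperative tasks on one thread only the
# first caller inserts a loading task.
function load_once(loader, name::Symbol)
    t = get!(LOAD_TASKS, name) do
        @async loader(name)        # first caller starts the load in its own task
    end
    return fetch(t)                # every caller waits for that single load
end

# Usage: five concurrent requests for the same module trigger one load.
calls = Ref(0)
fake_loader(name) = (calls[] += 1; sleep(0.05); name)   # stand-in for real loading
@sync for _ in 1:5
    @async load_once(fake_loader, :LightGraphs)
end
```

In this sketch a failed load stays cached (every later `fetch` rethrows it); a real mechanism would presumably clear the entry so the load could be retried.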

JeffBezanson commented 9 years ago

@stevengj It makes sense that your change would fix the segfault. Is there evidence of some other problem as well, or are we done here?

stevengj commented 9 years ago

We haven't had any indication that it is still segfaulting. It would be good to have an issue for eliminating the warning, but probably that should be a separate issue.

ViralBShah commented 9 years ago

Should we then take this off the 0.4 milestone list?

vtjnash commented 9 years ago

I think there are a few improvements that can be made:
1) To reduce the window of inconsistency, find_all_in_cache_path should block if package_locks[mod] indicates that the node is in the process of calling compilecache for that module.
2) To work harder to reduce this window for inconsistencies, __precompile__ should be handled on worker nodes by first attempting to convince node 1 to compilecache the package (instead of ignoring this directive on worker nodes) before deciding whether to abort or continue running the source file.
3) The broadcast of top-level import from node 1 should include a conditional check of isdefined(Main, mod) to block accidental redefinition (unless the user explicitly does @everywhere reload("Mod")).
4) To reduce potential confusion, rename require to reload and deprecate the old name entirely.
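Point 3 can be illustrated at user level with a guard that skips `using` when the module name is already bound in `Main`. This is a minimal sketch in present-day Julia, not the change proposed for Base's broadcast logic; `import_once` is a hypothetical helper. Building the expression explicitly as `Expr(:using, Expr(:., mod))` is equivalent to writing `using Mod` at top level.

```julia
# Hypothetical helper: evaluate `using <mod>` in Main only if `mod` is not already bound there.
# Returns true if it ran `using`, false if it skipped. Note this is a name check only:
# any existing binding called `mod` in Main causes a skip.
function import_once(mod::Symbol)
    isdefined(Main, mod) && return false           # already bound: don't re-run `using`
    Core.eval(Main, Expr(:using, Expr(:., mod)))   # same as `using Mod` at top level
    return true
end
```

On a cluster you would first define the helper on every process (for example with `@everywhere`, with `Distributed` loaded), then call `@everywhere import_once(:LightGraphs)`. It only avoids re-evaluating `using` on a process that already has the binding; it does not address the cache-file race discussed above.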

oxinabox commented 8 years ago

cross-ref https://groups.google.com/forum/#!topic/julia-users/UXrv1YNbYqY

izarov commented 8 years ago

Is the following deserialization error on workers a manifestation of this race condition?

# higher number of workers relative to available cores seems to make it easier to reproduce
# e.g., try with 8 if 4x doesn’t work
workers = 4*Sys.CPU_CORES;
addprocs(workers);

@everywhere begin
  import Distributions

  immutable ParameterUnivariate{U<:Distributions.UnivariateDistribution}
    dist::U
  end
end

param = ParameterUnivariate(Distributions.Normal());
pmap(x->x, fill(param, 100));

results in reloading module warnings and then a large number of workers exiting with error:

ERROR: TypeError: ParameterUnivariate: in U, expected U<:Distributions.Distribution{Distributions.Univariate,S<:Distributions.ValueSupport}, got Type{Distributions.Normal}
 in deserialize_datatype at serialize.jl:646
...

Moving import Distributions outside of the @everywhere block as using Distributions seems to fix it. Reproducible on 0.4.5 and 0.5.

cstjean commented 8 years ago

We faced the same issue in DecisionTree.jl, and I've boiled it down to this. No precompilation is necessary (on Julia 0.4, OSX):

# B.jl
module B
end
# C.jl
module C
abstract AbstractAbstract
end
# A.jl
module A
using B             # can be any module
include("incl.jl")  # problem disappears if the import is done in A.jl
end
# incl.jl
import C: AbstractAbstract
type Obj <: AbstractAbstract end

then interactively:

addprocs(3)
@everywhere using A
> On worker 2: UndefVarError: AbstractAbstract not defined

stevengj commented 8 years ago

import A; @everywhere using A is the best way to do this at the moment, I think.

pearcemc commented 8 years ago

I'm not sure this is a closed issue (I'm on 0.5.0). This was my workaround (I like reload for debugging purposes):

for p in procs()
    @fetchfrom p reload("Package")
end

stevengj commented 8 years ago

@pearcemc, do import Package; @everywhere using Package.

stevengj commented 7 years ago

Fixed by #21718?