JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

need precompile statements re-enabled for `addprocs` (with PR) #71

Open non-Jedi opened 4 years ago

non-Jedi commented 4 years ago

As discovered in https://discourse.julialang.org/t/help-with-binary-trees-benchmark-games-example/37307/13

❯ hyperfine -w1 "julia -p4 -E 'using Distributed; nprocs()'" "julia -E 'using Distributed; addprocs(); nprocs()'"
Benchmark #1: julia -p4 -E 'using Distributed; nprocs()'
  Time (mean ± σ):      2.040 s ±  0.010 s    [User: 5.563 s, System: 0.773 s]
  Range (min … max):    2.024 s …  2.054 s    10 runs

Benchmark #2: julia -E 'using Distributed; addprocs(); nprocs()'
  Time (mean ± σ):      1.785 s ±  0.014 s    [User: 5.337 s, System: 0.756 s]
  Range (min … max):    1.765 s …  1.816 s    10 runs

Summary
  'julia -E 'using Distributed; addprocs(); nprocs()'' ran
    1.14 ± 0.01 times faster than 'julia -p4 -E 'using Distributed; nprocs()''

Is there a reason spawning the extra processes with addprocs() is necessarily faster than spawning them with the -p command-line argument?

fredrikekre commented 4 years ago

Probably because the addprocs version is already compiled; https://github.com/JuliaLang/julia/blob/0c284839fef6c8c153edc01fddfa37a9f5ac6752/contrib/generate_precompile.jl#L44-L45.
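For readers unfamiliar with the term, a precompile statement asks Julia to compile a method for given argument types before it is first called. The following is only a minimal sketch of the idea using `Base.precompile`; the choice of `(Int,)` as the signature is an assumption for illustration and is not taken from the linked script. A single statement like this covers only what inference can reach from that call, so a real fix would likely need many of them (they can be recorded with the `--trace-compile` command-line flag).

```julia
using Distributed

# Illustrative signature (assumption): compile addprocs for one Int argument
# ahead of its first call. `precompile` returns a Bool indicating success.
ok = precompile(addprocs, (Int,))

println("precompile(addprocs, (Int,)) returned ", ok)
```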

non-Jedi commented 4 years ago

@fredrikekre did you close because there's no way to get similar speed for -p4?

PallHaraldsson commented 4 years ago

It doesn't seem like this should have been closed. -p should be just as fast, and it is needed so that the number of processes is in the hands of the user, not the programmer. See also: https://github.com/JuliaLang/julia/issues/35830#issuecomment-626825539

KristofferC commented 4 years ago

Should that issue be closed and this one opened then?

PallHaraldsson commented 4 years ago

No, keep both open. Mine is not a dup (it is about scalability); it is slightly different, and the cause may or may not be the same.

First, I saw no difference for this issue on Julia 1.0 using defaults, nor on the most recent build, ASSUMING these settings only:

$ hyperfine -w1 "~/julia-1.6.0-DEV-8f512f3f6d/bin/julia --compile=min -O0 --startup-file=no -E 'using Distributed; addprocs(4);'"
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia  --compile=min -O0 --startup-file=no -E 'using Distributed; addprocs(4);'
  Time (mean ± σ):      1.320 s ±  0.011 s    [User: 3.226 s, System: 2.114 s]
  Range (min … max):    1.304 s …  1.333 s    10 runs

$ hyperfine -w1 "~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -p4 --compile=min --startup-file=no -O0 -E ''"
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -p4 --compile=min --startup-file=no -O0 -E ''
  Time (mean ± σ):      1.323 s ±  0.008 s    [User: 3.259 s, System: 2.020 s]
  Range (min … max):    1.309 s …  1.335 s    10 runs

With default settings there is a difference, and even with -O0 the min…max ranges do not overlap. Since I've seen that setting eliminate invalidations, I would say invalidations are implicated?

vtjnash commented 7 months ago

Now performance is switched, so problem solved!

vtjnash@deepsea4:~/julia$ hyperfine -w1 "./julia -p4 -E 'using Distributed; nprocs()'" "./julia -E 'using Distributed; addprocs(); nprocs()'"
Benchmark 1: ./julia -p4 -E 'using Distributed; nprocs()'
  Time (mean ± σ):      8.952 s ±  1.129 s    [User: 26.344 s, System: 0.740 s]
  Range (min … max):    8.058 s … 10.398 s    10 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (10.222 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: ./julia -E 'using Distributed; addprocs(); nprocs()'
  Time (mean ± σ):     14.585 s ±  0.315 s    [User: 62.846 s, System: 2.424 s]
  Range (min … max):   14.057 s … 14.948 s    10 runs

Summary
  './julia -p4 -E 'using Distributed; nprocs()'' ran
    1.63 ± 0.21 times faster than './julia -E 'using Distributed; addprocs(); nprocs()''

Clearly this needs more precompile statements. Now that Distributed is a separate stdlib, that is much more reasonable than when it was included in the default image.
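A rough way to see how much of the `addprocs` cost is compilation rather than worker startup is to time two consecutive calls in one session: the second call reuses whatever the first one compiled in the main process. This is a hedged sketch, not the method used for the numbers above; the worker count of 1 is an arbitrary assumption, and the timings will vary by machine and Julia version.

```julia
using Distributed

before = nprocs()

# First call pays for any compilation that is not already cached.
t_first = @elapsed first_pids = addprocs(1)

# Second call in the same session should mostly measure worker startup.
t_second = @elapsed second_pids = addprocs(1)

println("first addprocs(1):  ", round(t_first; digits=3), " s")
println("second addprocs(1): ", round(t_second; digits=3), " s")

# Clean up the workers started for the measurement.
rmprocs(vcat(first_pids, second_pids))
```

A large gap between the two timings would suggest missing precompile coverage; a small gap would point at process startup itself.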

vtjnash commented 7 months ago

Code at https://github.com/JuliaLang/julia/pull/42156

ViralBShah commented 7 months ago

@KristofferC Should we go ahead and enable precompile?