Open non-Jedi opened 4 years ago
Probably because the addprocs version is already compiled; see https://github.com/JuliaLang/julia/blob/0c284839fef6c8c153edc01fddfa37a9f5ac6752/contrib/generate_precompile.jl#L44-L45.
@fredrikekre did you close because there's no way to get similar speed for -p4?
It doesn't seem like this should have been closed. -p should be just as fast, and it is needed so that the choice is in the hands of the user, not the programmer. See also: https://github.com/JuliaLang/julia/issues/35830#issuecomment-626825539
Should that issue be closed and this one opened then?
No, keep both open. Mine (about scalability) is not a dup; it is slightly different, and the cause may or may not be the same.
First, I saw no difference for this issue on Julia 1.0 using defaults, nor on the most recent build, ASSUMING these settings only:
$ hyperfine -w1 "~/julia-1.6.0-DEV-8f512f3f6d/bin/julia --compile=min -O0 --startup-file=no -E 'using Distributed; addprocs(4);'"
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia --compile=min -O0 --startup-file=no -E 'using Distributed; addprocs(4);'
Time (mean ± σ): 1.320 s ± 0.011 s [User: 3.226 s, System: 2.114 s]
Range (min … max): 1.304 s … 1.333 s 10 runs
$ hyperfine -w1 "~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -p4 --compile=min --startup-file=no -O0 -E ''"
Benchmark #1: ~/julia-1.6.0-DEV-8f512f3f6d/bin/julia -p4 --compile=min --startup-file=no -O0 -E ''
Time (mean ± σ): 1.323 s ± 0.008 s [User: 3.259 s, System: 2.020 s]
Range (min … max): 1.309 s … 1.335 s 10 runs
With default settings there is a difference, and even with -O0 the min..max ranges do not overlap. Since I've seen that setting eliminate invalidations, I would say those are implicated?
Now performance is switched, so problem solved!
vtjnash@deepsea4:~/julia$ hyperfine -w1 "./julia -p4 -E 'using Distributed; nprocs()'" "./julia -E 'using Distributed; addprocs(); nprocs()'"
Benchmark 1: ./julia -p4 -E 'using Distributed; nprocs()'
Time (mean ± σ): 8.952 s ± 1.129 s [User: 26.344 s, System: 0.740 s]
Range (min … max): 8.058 s … 10.398 s 10 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (10.222 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
Benchmark 2: ./julia -E 'using Distributed; addprocs(); nprocs()'
Time (mean ± σ): 14.585 s ± 0.315 s [User: 62.846 s, System: 2.424 s]
Range (min … max): 14.057 s … 14.948 s 10 runs
Summary
'./julia -p4 -E 'using Distributed; nprocs()'' ran
1.63 ± 0.21 times faster than './julia -E 'using Distributed; addprocs(); nprocs()''
Clearly needs more precompile statements. Now that Distributed is a separate stdlib, that is much more reasonable than when it was included in the default image.
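For illustration, a minimal sketch of what such precompile statements could look like, using the documented `precompile(f, argtypes)` form. The list of signatures here is a guess at what an `addprocs` session hits first; it is hypothetical and is not taken from contrib/generate_precompile.jl or from Distributed itself.

```julia
using Distributed

# Hypothetical candidates: guessed signatures, not the list Julia actually ships.
candidates = [
    (addprocs, (Int,)),
    (nprocs, ()),
    (workers, ()),
]

# precompile compiles a method for the given argument types without running it
# and returns a Bool saying whether it succeeded.
results = [precompile(f, argtypes) for (f, argtypes) in candidates]

for ((f, argtypes), ok) in zip(candidates, results)
    println(rpad(string(f, argtypes), 24), " => ", ok)
end
```

Whether these particular statements would close the gap measured above is untested; finding the signatures that actually matter would need tracing a real startup.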
@KristofferC Should we go ahead and enable precompile?
As discovered in https://discourse.julialang.org/t/help-with-binary-trees-benchmark-games-example/37307/13
Is there a reason spawning the extra processes with addprocs() is necessarily faster than spawning them with the -p command-line argument?
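As a rough way to look at the addprocs() side on its own, worker startup can be timed from inside a running session. This is only a sketch: the worker count of 4 is an assumption chosen to match -p4, and the measurement excludes Julia's own startup and the loading of Distributed, so the number is not directly comparable to a whole-process hyperfine timing.

```julia
using Distributed

# Assumed worker count, chosen to mirror `-p4`.
n = 4

# Time only the worker launch; Julia startup and `using Distributed`
# have already happened by this point.
t = @elapsed addprocs(n)
println("addprocs($n) took ", round(t; digits = 3), " s; nprocs() = ", nprocs())

# Shut the workers down again so the session is back to a single process.
rmprocs(workers())
```

The first call in a fresh session includes any compilation of addprocs itself, so repeated calls in the same session may report shorter times.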