JuliaInterop / JavaCall.jl

Call Java from Julia
http://juliainterop.github.io/JavaCall.jl
Other
118 stars 53 forks source link

Faster startup #121

Closed PallHaraldsson closed 4 years ago

PallHaraldsson commented 4 years ago

Jeff added this new trick to Plots.jl already, helping make startup/first plot faster for Julia 1.5 (there MIGHT be a slowdown), so I copied that code. It's also been added to PyCall.jl so I've suggested elsewhere, where you see the reasoning:

https://github.com/JuliaInterop/RCall.jl/pull/383

I didn't test here, and see you have JNI module where this may also apply for.

mkitti commented 4 years ago

Hi @PallHaraldsson , thank you for suggesting this. I'm looking into it.

mkitti commented 4 years ago

Reference: https://github.com/JuliaLang/julia/pull/34896

mkitti commented 4 years ago

I think this need to be benchmarked carefully. Since this is an experimental feature, I think we should merge but only enable this based on an environmental variable. In other words, it should not be active by default.

PallHaraldsson commented 4 years ago

Others are not that cautious (with environment variable), i.e. PyCall.jl. I'm not sure I would read too much into the "experimental". The assumption in PyCall is that you call code that does non-small amount of work. If you have extremely small amount of work being called, say in a loop, then it might be a problem, and I would look into such.

I would just document this in NEWS, if you're worried. People could always revert to Julia 1.4 (and/or older version of JavaCall) if a problem. And if we hear of a problem the PR could be reverted.

If you're really worried, just skip this PR. Maybe the startup isn't too bad (the only thing this helps, while for all packages adds up):

https://discourse.julialang.org/t/rethinking-optimization-lower-ok-for-all-julia-code-for-e-g-faster-startup-as-a-default/39843/3?u=palli

mkitti commented 4 years ago

My current analysis of the practical usage of this is that it saves a second during precompilation. After precompilation there is not much difference.

PallHaraldsson commented 4 years ago

This is after precompilation, I get 0.73 sec. faster with -O1 and this PR should get (all? of) that benefit:

$ time ~/julia-1.5.0-beta1/bin/julia -O1
julia> @time using PyCall
  1.514064 seconds (3.21 M allocations: 162.845 MiB, 2.80% gc time)
(thereof) 0.306475 sec for DataStuctures, but I see it as fast on defaults, and others have minimal overhead.

vs. on defaults:
julia> @time using PyCall
  2.248437 seconds (3.21 M allocations: 162.842 MiB, 1.86% gc time)

Are you sure you exited to see a difference, e.g. first no-precompiling, counts as second, not first:

The first time around, on default settings:

julia> @time using PyCall
[ Info: Precompiling PyCall [438e738f-606a-5dbb-bf0a-cddfbfd45ab0]
 16.954466 seconds (4.13 M allocations: 208.008 MiB, 0.35% gc time)

julia> @time using PyCall
  0.655401 seconds (1.11 M allocations: 54.271 MiB, 3.70% gc time)

julia> @time using PyCall
  0.000385 seconds (527 allocations: 31.094 KiB)

julia> @time using PyCall
  0.000445 seconds (527 allocations: 31.094 KiB)

Strangely for the above first few times, I didn't get a warning, what I now see on all settings:

julia> @time using JavaCall
┌ Warning: JavaCall needs the environment variable `JULIA_COPY_STACKS` to be `1` or `yes`.
│ Calling the JVM may result in undefined behavior.
└ @ JavaCall ~/.julia/packages/JavaCall/JPyF5/src/JavaCall.jl:42
[ Info: JavaCall could not determine javapath from `which java`
  2.226330 seconds (5.62 M allocations: 288.481 MiB, 8.93% gc time)

And FYI (not about this PR), you can get much faster with startup with (while there may be a slowdown after, or for some code not):
~/fenics$ ~/julia-1.5.0-beta1/bin/julia -O1 --compile=min
  0.861893 seconds (840.65 k allocations: 53.598 MiB)

julia> @time using JavaCall
┌ Warning: JavaCall needs the environment variable `JULIA_COPY_STACKS` to be `1` or `yes`.
│ Calling the JVM may result in undefined behavior.
└ @ JavaCall ~/.julia/packages/JavaCall/JPyF5/src/JavaCall.jl:42
[ Info: JavaCall could not determine javapath from `which java`
  0.511387 seconds (336.25 k allocations: 19.220 MiB)
mkitti commented 4 years ago

My current testing procedure is to evaluate the timing of running the tests (runtests.jl). My main priority here is the execution time of the code after the initial precompilation. If with lower optimization we can approximately keep or decrease that time after precompilation, then the second priority to see the effect of decreasing precompilation time itself.

My current results concur with setting O1. I am going to work on documenting that perhaps as a permanent tracking branch.

I've used the following preamble to assist with the testing.

module JavaCall

@static if get(ENV, "JAVACALL_PRECOMPILE","") == "false"
    __precompile__(false)
end

# Per-module optimization level
# https://github.com/JuliaLang/julia/pull/34896
# https://discourse.julialang.org/t/rethinking-optimization-lower-ok-for-all-julia-code-for-e-g-faster-startup-as-a-default/39843
#
# Use Base.compilecache(Base.PkgId(JavaCall)) if changed
@static if get(ENV, "JAVACALL_OPTLEVEL", "") ∈ ("0","1","2","3") &&
    isdefined(Base, :Experimental) &&
    isdefined(Base.Experimental, Symbol("@optlevel"))
    const optlevel = parse(Int,get(ENV, "JAVACALL_OPTLEVEL", ""))
else
    const optlevel = -1
end

if optlevel > -1
    @eval Base.Experimental.@optlevel($optlevel)
    @info "JavaCall optlevel set to $optlevel."
end

@info "JavaCall loaded"

...

end
PallHaraldsson commented 4 years ago

Ok, I misunderstood you then. If, as you tested for actual code, after using, it's as fast, I would just go with -O1 as a default.

I understand you being careful, and as I've proposed having the selective capability of module optimization in the hands of the user of the module, I (partially) support what you are doing.

You could do an major version update, to v0.8, to signal to users a change and/or document in NEWS. I would then do it carefully, as people may consider higher opt faster, e.g. 3, while likely 0 (or 1) is always fastest.

I see Plots.jl merged this in minor version 1.04. https://github.com/JuliaPlots/Plots.jl/blob/v1.0.4/src/Plots.jl not mentioning in NEWS. PyCall actually just merged the change silently too, but it's not yet in any tagged version, only on master. I would consider doing so, and people (who use master) could see what's it like.

I don't see any reason for the precompile logic, it's irrelevant to this, and would only slow down, shouldn't be in the hands of users. There's no reason do add it since it works without.

mkitti commented 4 years ago

To be clear about the above code I just posted, it is for diagnostics only. It is not meant to be merged into the master branch or a release. I plan to keep that code on a parallel branch to measure performance with and without precompilation at each of the opt levels.

mkitti commented 4 years ago

@optlevel 1 seems to be optimal.

Optlevel | Test Time     | Testing URL
0        | 21.68 seconds | https://travis-ci.org/github/mkitti/JavaCall.jl/jobs/695875724
1        | 20.22 seconds | https://travis-ci.org/github/JuliaInterop/JavaCall.jl/jobs/695874266
2        | 20.86 seconds | https://travis-ci.org/github/mkitti/JavaCall.jl/jobs/695874660
3        | 21.18 seconds | https://travis-ci.org/github/mkitti/JavaCall.jl/jobs/695875010
  1. https://travis-ci.org/github/mkitti/JavaCall.jl/jobs/695875724
  2. https://travis-ci.org/github/JuliaInterop/JavaCall.jl/jobs/695874266
  3. https://travis-ci.org/github/mkitti/JavaCall.jl/jobs/695874660
  4. https://travis-ci.org/github/mkitti/JavaCall.jl/jobs/695875010