JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Compilation time/memory regression in recent Julia versions #38723

Closed: aplavin closed this issue 1 year ago

aplavin commented 3 years ago

Hi! I noticed that a certain piece of code compiles about twice as slowly and allocates about twice as much memory on nightly compared to 1.5. Since the package versions are exactly the same, this is likely related to changes in Julia itself.

The MWE:

using InteractiveUtils
versioninfo()
import Pkg
Pkg.activate(".")
Pkg.status()

import JSON3
import StructTypes
str = """{"a": 123}"""
mutable struct Entity
    a::Int
    Entity() = new()
end
StructTypes.StructType(::Type{Entity}) = StructTypes.Mutable()
@time JSON3.read(str, Entity)

Running on 1.5:

 Activating environment at `~/tmp/Project.toml`
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Status `~/tmp/Project.toml`
  [0f8b85d8] JSON3 v1.5.1
  [856f2bd8] StructTypes v1.1.0
  2.619922 seconds (5.61 M allocations: 220.972 MiB, 2.34% gc time)

On 1.6:

  Activating environment at `~/tmp/Project.toml`
Julia Version 1.6.0-DEV.1673
Commit 65382c708d* (2020-12-05 06:28 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.0 (ORCJIT, skylake)
Status `~/tmp/Project.toml`
  [0f8b85d8] JSON3 v1.5.1
  [856f2bd8] StructTypes v1.1.0
  4.596984 seconds (10.65 M allocations: 466.155 MiB, 6.86% gc time, 100.00% compilation time)

The last line in each block above is the @time output. Please let me know if more details are needed.

Also, just now @oscardssmith confirmed this regression: https://julialang.slack.com/archives/C67910KEH/p1607192872186600.

timholy commented 3 years ago

On my machine, SnoopCompile says

julia> tinf = @snoopi JSON3.read(str, Entity)
1-element Array{Tuple{Float64,Core.MethodInstance},1}:
 (0.3846621513366699, MethodInstance for read(::String, ::Type{Entity}))

for Julia 1.5 and

julia> tinf = @snoopi JSON3.read(str, Entity)
2-element Vector{Tuple{Float64, Core.MethodInstance}}:
 (3.0994415283203125e-5, MethodInstance for serializationname(::Tuple{}, ::Int64))
 (1.0131659507751465, MethodInstance for read(::String, ::Type{Entity}))

So inference is at least part of it. The unreleased @snoopi_deep indicates that these are the MethodInstances that individually take at least 10ms to infer:

 0.010608098 => InferenceFrameInfo for Parsers._parsedigits(::Type{Float64}, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, ::Int16, ::Parsers.Options{false, false, false, false, Nothing, Nothing, Nothing}, ::UInt128, ::Bool, ::Int64)
 0.011437678 => InferenceFrameInfo for Parsers.scale(::Type{Float64}, ::Any, ::Int64, ::Bool)
  0.01325289 => InferenceFrameInfo for Parsers.scale(::Type{Float64}, ::UInt64, -1::Int64, ::Bool)
 0.013296493 => InferenceFrameInfo for StructTypes.construct(::JSON3.TupleClosure{Base.CodeUnits{UInt8, String}, NamedTuple{(), Tuple{}}}, ::Type{T} where T<:Tuple)
 0.013568487 => InferenceFrameInfo for Parsers._parseexp(::Type{Float64}, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, ::Int16, ::Parsers.Options{false, false, false, false, Nothing, Nothing, Nothing}, ::UInt64, ::Bool, ::Int64, ::UInt64, ::UInt128, ::Bool)
 0.018401788 => InferenceFrameInfo for Parsers.typeparser(::Type{Float64}, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, 0::Int16, Parsers.Options{false, false, false, false, Nothing, Nothing, Nothing}([""], nothing, 0x20, 0x09, 0x22, 0x22, 0x22, nothing, 0x2e, nothing, nothing, nothing, nothing, false, false)::Parsers.Options{false, false, false, false, Nothing, Nothing, Nothing})
  0.01970693 => InferenceFrameInfo for JSON3.var"#read#21"(false::Bool, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}()::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, read::typeof(JSON3.read), StructTypes.Struct()::StructTypes.Struct, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, ::Type{Any})
 0.020570963 => InferenceFrameInfo for Parsers.parsefrac(::Type{Float64}, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, 0::Int16, Parsers.Options{false, false, false, false, Nothing, Nothing, Nothing}([""], nothing, 0x20, 0x09, 0x22, 0x22, 0x22, nothing, 0x2e, nothing, nothing, nothing, nothing, false, false)::Parsers.Options{false, false, false, false, Nothing, Nothing, Nothing}, ::UInt64, ::Bool, ::Int64, 0::UInt64)
 0.022449375 => InferenceFrameInfo for Parsers.parsedigits(::Type{T}, ::Base.CodeUnits, ::Any, ::Int64, ::Any, ::Int16, ::Parsers.Options{ignorerepeated, ignoreemptylines, Q, debug, S, D, DF}, ::BigInt, ::Bool, ::Any) where {T<:Union{Float16, Float32, Float64, BigFloat}, ignorerepeated, ignoreemptylines, Q, debug, S, D, DF}
 0.024710397 => InferenceFrameInfo for Parsers.scale(::Type{Float64}, ::Any, -1::Int64, ::Bool)
 0.025300999 => InferenceFrameInfo for JSON3.read(StructTypes.Mutable()::StructTypes.Mutable, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, ::Type{Entity})
 0.025741006 => InferenceFrameInfo for (::JSON3.TupleClosure{Base.CodeUnits{UInt8, String}, _A} where _A)(23::Int64, 23::Int64, ::Type{var"#s432"} where var"#s432")
 0.025834635 => InferenceFrameInfo for JSON3.read!(StructTypes.Mutable()::StructTypes.Mutable, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, ::Type{Entity}, ::Entity)
 0.029153585 => InferenceFrameInfo for Parsers.parsedigits(::Type{Float64}, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, 0::Int16, Parsers.Options{false, false, false, false, Nothing, Nothing, Nothing}([""], nothing, 0x20, 0x09, 0x22, 0x22, 0x22, nothing, 0x2e, nothing, nothing, nothing, nothing, false, false)::Parsers.Options{false, false, false, false, Nothing, Nothing, Nothing}, 0::UInt64, ::Bool, ::Int64)
 0.038856846 => InferenceFrameInfo for JSON3.var"#read#16"(Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}()::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, read::typeof(JSON3.read), ::String, ::Type{Entity})
 0.039244615 => InferenceFrameInfo for JSON3.var"#read#37"(Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}()::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, read::typeof(JSON3.read), StructTypes.Mutable()::StructTypes.Mutable, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, ::Type{Entity})
 0.043654891 => InferenceFrameInfo for JSON3.var"#read!#39"(Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}()::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, read!::typeof(JSON3.read!), StructTypes.Mutable()::StructTypes.Mutable, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, ::Type{Entity}, ::Entity)
 0.066151068 => InferenceFrameInfo for JSON3.read(StructTypes.Struct()::StructTypes.Struct, ::Base.CodeUnits{UInt8, String}, ::Int64, ::Int64, ::UInt8, ::Type{Any})

There's nothing that really jumps out about the flamegraph:

[flame graph of inference time from @snoopi_deep]

Starting at typeparser it gets into the Parsers package; presumably that's used on 1.5 as well?
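
For reference, a minimal sketch of how these measurements can be reproduced, assuming a SnoopCompile release that provides @snoopi_deep and its documented flatten/inclusive/flamegraph helpers (with str and Entity defined as in the MWE above):

```julia
using SnoopCompile          # assumed: a version providing @snoopi_deep
import JSON3, StructTypes   # plus the Entity definition from the MWE

tinf = @snoopi_deep JSON3.read(str, Entity)

# Print every MethodInstance whose inclusive inference time exceeds 10 ms:
for frame in sort(flatten(tinf); by=inclusive)
    inclusive(frame) > 0.010 && println(inclusive(frame), " => ", frame)
end

# Visualize the same data as a flame graph:
using ProfileView
ProfileView.view(flamegraph(tinf))
```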

aplavin commented 3 years ago

presumably that's used on 1.5 as well?

Sorry, if that question is addressed to me, I don't really know how to check whether that is the case. I would assume that the same version of the JSON3 package calls the same functions from other packages regardless of the Julia version.

quinnj commented 3 years ago

The float-parsing algorithm in the Parsers.jl package was recently refactored from a single large Parsers._typeparser method into the separate Parsers.parsedigits, Parsers.parseexp, Parsers.parsefrac, etc. Perhaps adding some precompile statements for those would help? Happy to rearrange things in Parsers.jl/JSON3.jl however it would be most useful.
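
As a hypothetical illustration (not the package's actual precompile file), such statements might look like the following, with signatures taken from the @snoopi_deep list above; in practice they would be generated and verified with SnoopCompile's tooling:

```julia
import Parsers

# Cache inference results for the hot float-parsing methods identified above.
# Signatures are copied from the @snoopi_deep output and would need checking
# against the actual package internals.
Base.precompile(Tuple{typeof(Parsers.typeparser), Type{Float64},
    Base.CodeUnits{UInt8, String}, Int64, Int64, UInt8, Int16,
    Parsers.Options{false, false, false, false, Nothing, Nothing, Nothing}})
Base.precompile(Tuple{typeof(Parsers.parsefrac), Type{Float64},
    Base.CodeUnits{UInt8, String}, Int64, Int64, UInt8, Int16,
    Parsers.Options{false, false, false, false, Nothing, Nothing, Nothing},
    UInt64, Bool, Int64, UInt64})
```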

aplavin commented 3 years ago

I tested my example again just now on 1.6-rc1, and (if I made no mistake) the improvement relative to 1.5 is about a factor of three: @time shows 0.9 seconds for me! That is about five times faster than the 1.6 nightly from when I started this thread.

vtjnash commented 3 years ago

I think we've addressed this, incidentally:

1.4  5.849163 seconds (7.04 M allocations: 274.220 MiB, 2.22% gc time)
1.5  2.075141 seconds (5.32 M allocations: 214.742 MiB, 8.51% gc time)
1.6  1.168197 seconds (6.95 M allocations: 340.663 MiB, 9.41% gc time, 99.99% compilation time)
1.7  1.188911 seconds (6.87 M allocations: 345.271 MiB, 9.12% gc time, 100.00% compilation time)
aplavin commented 3 years ago

I hadn't timed the example again since this discussion until now, but it looks like the regression has returned...

a) Current 1.6 is slower than 1.5 was:

Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
  Activating environment at `/tmp/tmp.kJEE3lqcuJ/Project.toml`
      Status `/tmp/tmp.kJEE3lqcuJ/Project.toml`
  [0f8b85d8] JSON3 v1.9.2
  [856f2bd8] StructTypes v1.8.1
  3.249628 seconds (12.56 M allocations: 540.672 MiB, 8.19% gc time, 100.00% compilation time)

On the same computer as timings in my original post.

b) A recent nightly is even worse:

Julia Version 1.8.0-DEV.836
Commit c054dbc6d4 (2021-10-28 16:31 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
  Activating project at `/tmp/tmp.kJEE3lqcuJ`
Status `/tmp/tmp.kJEE3lqcuJ/Project.toml`
  [0f8b85d8] JSON3 v1.9.2
  [856f2bd8] StructTypes v1.8.1
  5.562231 seconds (12.75 M allocations: 528.901 MiB, 5.05% gc time, 100.00% compilation time)
julia-nightly script.jl  8.32s user 0.91s system 110% cpu 8.338 total
aplavin commented 2 years ago

I've used a new benchmarking package, https://github.com/jkrumbiegel/VersionBenchmarks.jl, to compare the timings across a range of Julia and JSON3 versions. Basically the whole time is taken by the JSON3.read(str, Entity) line; the imports are almost instantaneous in comparison. Here is a plot of time vs Julia version:

[plot: first-call time of JSON3.read(str, Entity) across Julia versions]

The time of the first run becomes slower and slower with each new Julia version. I don't think that's to be expected?
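
For anyone reproducing this comparison without extra tooling, a hedged sketch: run the MWE script under each locally installed Julia binary (the binary names and script path below are hypothetical):

```julia
# Run the self-contained MWE under several Julia installations and compare
# the reported first-call times. Adjust the binary names to the local setup
# (e.g. juliaup channels); script.jl is assumed to contain the MWE above.
for julia in ["julia-1.5", "julia-1.6", "julia-1.7", "julia-nightly"]
    println("=== ", julia, " ===")
    run(`$julia --startup-file=no script.jl`)
end
```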

I'll also report this to the JSON3.jl authors, because newer versions of that package are slower; but that is independent of the differences between Julia versions.

aplavin commented 2 years ago

Bump...

aplavin commented 2 years ago

Bump...

timholy commented 2 years ago

I don't see anything close to the regression you see (I get 1.3 s on julia-1.6 vs 1.7 s on 1.8), but it is real. That said, it may also work out the opposite way in practice. You can create a package EntityPkg with contents

module EntityPkg

import JSON3
import StructTypes
mutable struct Entity
    a::Int
    Entity() = new()
end
StructTypes.StructType(::Type{Entity}) = StructTypes.Mutable()

read_entity(str) = JSON3.read(str, Entity)

if ccall(:jl_generating_output, Cint, ()) == 1
    # we're precompiling, exercise our code
    # note it has to link inferrably back to methods defined in
    # this package, which is why we defined `read_entity`
    read_entity("""{"a": 123}""")
end

end

and then the benchmark

import EntityPkg
str = """{"a": 123}"""
tstart = time(); EntityPkg.read_entity(str); tcall = time() - tstart;
println(tcall)

is faster on 1.8 than on 1.6 (1.3 s on julia-1.6 vs 0.8 s on julia-1.8). This comes from Julia 1.8's much more comprehensive precompilation, and work in progress will let us cache the native code too, which would eliminate this time entirely. I'm tempted to close this. However, it may be worth others checking whether inference and/or codegen has gotten slower in the non-precompiled case.

EDIT: this was with JSON3 v1.9.5.

Another tip: don't use @time for benchmarking compile time as @time has been inconsistent across Julia versions about whether it first compiles the code (until it hits runtime dispatches) before starting the timer.

tstart = time(); JSON3.read(str, Entity); tcall = time() - tstart;

is much safer.

aplavin commented 2 years ago

The most recent JSON3 versions did indeed get faster, but the difference is still there. With this completely self-contained script:

```julia
using Pkg
redirect_stderr(open("/tmp/jl123", "w")) do
    Pkg.activate(temp=true)
    pkg"add JSON3@1.9.5 StructTypes"
end

import JSON3
import StructTypes

str = """{"a": 123}"""
mutable struct Entity
    a::Int
    Entity() = new()
end
StructTypes.StructType(::Type{Entity}) = StructTypes.Mutable()

tstart = time(); JSON3.read(str, Entity); tcall = time() - tstart;
@show VERSION tcall
```

I see a slowdown from 1 to 2.4 seconds between Julia 1.5 and 1.8/1.9:

```
VERSION = v"1.5.4"
tcall = 0.9965651035308838

VERSION = v"1.6.6"
tcall = 1.69846510887146

VERSION = v"1.7.3"
tcall = 1.6435589790344238

VERSION = v"1.8.0-rc1"
tcall = 2.418134927749634

VERSION = v"1.9.0-DEV.852"
tcall = 2.378304958343506
```

With a slightly older JSON3@1.9.0 (which I likely used the last time I benchmarked, on a different laptop), the script shows a slowdown from 2.5 to 4.6 seconds across the same Julia versions.

So, the ~2x regression is still there, for both JSON3 versions.

Precompilation is great, sure, and I feel its improvements over the last few years even without measuring anything. Still, more ad-hoc workflows are also important. Even with packages, efficient precompilation tends to require certain tricks, as your example illustrates.

timholy commented 2 years ago

Even with packages, efficient precompilation tends to require certain tricks, as your example illustrates.

True, but we have some tooling (and can have more) that allows a lot of this to be added quite easily.
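
For example, the EntityPkg trick above can be written more declaratively with PrecompileTools.jl (the successor to SnoopPrecompile). This is a sketch assuming its documented @setup_workload/@compile_workload macros, not the tooling available at the time of this comment:

```julia
module EntityPkg

using PrecompileTools
import JSON3, StructTypes

mutable struct Entity
    a::Int
    Entity() = new()
end
StructTypes.StructType(::Type{Entity}) = StructTypes.Mutable()

read_entity(str) = JSON3.read(str, Entity)

# Replaces the manual `jl_generating_output` check: everything inside
# @compile_workload runs during precompilation, and the compiled code
# is cached with the package.
@setup_workload begin
    str = """{"a": 123}"""
    @compile_workload begin
        read_entity(str)
    end
end

end
```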

I looked at profiles comparing 1.6 and 1.8 (I don't have 1.5 locally) and nothing instantly jumped out: it's as if everything got just a little bit slower across the board. But that's just a high-level view; at the level of details something might jump out. To make that analysis easier, we could use a tool that compares two Profile.print(format=:flat, C=true) outputs and matches calls via fuzzy string matching. I don't think we have that now, and I am not going to interrupt my ongoing work on yet more improvements to precompilation to write it. So I'm going to leave this analysis for others to continue.
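
A rough sketch of what such a comparison tool could look like, pairing entries by exact function name (fuzzy matching could be layered on with, e.g., StringDistances.jl). The function names and the line format assumed by the regex are hypothetical:

```julia
# Parse a `Profile.print(format=:flat, C=true)` text dump into a map from
# function name to total sample count. The assumed line shape is
# " <count> <overhead> <file>:<line>; <function>" and may need adjusting.
function parse_flat(path::AbstractString)
    counts = Dict{String,Int}()
    for line in eachline(path)
        m = match(r"^\s*(\d+)\s+\d+\s+\S+;\s*(.+)$", line)
        m === nothing && continue
        counts[m[2]] = get(counts, m[2], 0) + parse(Int, m[1])
    end
    return counts
end

# Report functions whose sample count grew by more than `factor` between
# the two dumps (e.g. an old-Julia and a new-Julia profile).
function compare_flat(old::AbstractString, new::AbstractString; factor=1.5)
    a, b = parse_flat(old), parse_flat(new)
    for name in sort!(collect(intersect(keys(a), keys(b))))
        b[name] > factor * a[name] && println(a[name], " -> ", b[name], "  ", name)
    end
end
```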

vtjnash commented 1 year ago

I don't see any regressions; it looks quite fast with JSON3 master now:

julia> @time @eval JSON3.read(str, Entity)
  1.609497 seconds (1.47 M allocations: 58.016 MiB, 1.92% gc time, 99.72% compilation time) # v1.8
  1.152540 seconds (1.80 M allocations: 80.174 MiB, 4.05% gc time, 99.52% compilation time) # v1.9
  1.162008 seconds (1.83 M allocations: 79.574 MiB, 99.98% compilation time) # v1.10
  1.312811 seconds (1.83 M allocations: 80.576 MiB, 2.78% gc time, 99.98% compilation time) # v1.11
aplavin commented 1 year ago

I still see a regression similar to before. Running my script above with JSON3#main, I get:

VERSION = v"1.5.4"
tcall = 0.7852849960327148

VERSION = v"1.9.2"
tcall = 1.3266339302062988

VERSION = v"1.10.0-beta2"
tcall = 1.6346149444580078

So about a factor of 2 from Julia 1.5 to now.