JuliaLang / JuliaSyntax.jl

The Julia compiler frontend
Other
274 stars 33 forks source link

Large invalidations from uninferred `AbstractString` #252

Closed jakobnissen closed 1 year ago

jakobnissen commented 1 year ago

I've noticed that recently, I've been getting multi-second latency during normal Julia work simply from loading packages. I've tracked the issue to OhMyREPL which I have in my startup.jl, and which in turn loads JuliaSyntax.jl. The issue ultimately stems from a lot of JuliaSyntax code being poorly inferred, with one or more arguments being AbstractString. When JSON3 then adds a new method Base.:(==)(::JSON3.PointerString, ::AbstractString), world-splitting leads to large-scale invalidations in JuliaSyntax.

Here is a small reproducer: Run it in an environment with JuliaSyntax, SnoopCompile, SnoopCompileCore and JSON3 installed:

julia> using SnoopCompileCore

julia> inv = @snoopr using JuliaSyntax, JSON3;

julia> using SnoopCompile

julia> tree = last(invalidation_trees(inv))
inserting ==(y::JSON3.PointerString, x::String) @ JSON3 ~/.julia/packages/JSON3/CpNms/src/strings.jl:15 invalidated:
   backedges: 1: superseding ==(a::AbstractString, b::AbstractString) @ Base strings/basic.jl:325 with MethodInstance for ==(::AbstractString, ::String) (245 children)
   35 mt_cache

You see that 245 methods got invalidated leading to quite noticable latency. I'm not very experienced with fixing invalidations, but maybe you can use this MWE as a starting point. I suspect that for JuliaSyntax, there is absolutely no need to be generic over AbstractString, and the code can be made type stable with String or SubString{String} or whatever.

KristofferC commented 1 year ago

Ah, that's what is causing the delay from OhMyREPL. I was blaming the fact that I was overwriting stuff but the switch to JuliaSyntax makes sense since I didn't really see it happening before that.

c42f commented 1 year ago

If we ran OhMyREPL in a fixed world age with invoke_in_world we should be able to fix any such issues in the future, and also prevent anyone from breaking the OhMyREPL tooling by interactively pirating stuff from Base.

This is how JuliaSyntax.enable_in_core() works.

Though also having the fix in #254 seems like it should just make things more efficient and is generally a good idea (at a guess?)

jakobnissen commented 1 year ago

Great to see it was so relatively easily fixable! On the other hand, more broadly, I always get a bit discouraged when hunting invalidations. I usually see some reasonable type instability (e.g. ErrorException containing AbstractString, or default stdout being inferred to IO), which is then optimised using world-splitting. Then someone comes along and adds one more method and accidentally invalidates 200 methods.

Somehow, it seems unreasonable to expect all code everywhere to be 100% inferred, especially parsing code and code related to IO.

What I'm getting at is: Do we have a reason to believe that some package other than JSON3 is not going to extend some other string method and re-invalidate those 250 methods in JuliaSyntax? Or does @KristofferC 's PR actually make JuliaSyntax type stable?

KristofferC commented 1 year ago

Or does @KristofferC 's PR actually make JuliaSyntax type stable?

It makes it more type stable at least since the inference failure from the non-specialization is fixed.

pfitzseb commented 1 year ago

Out of curiosity: How'd you end up finding this specific inference failure?

KristofferC commented 1 year ago

Standard SnoopCompile @snoopr + ascend.

(Edit: Not sure if you were asking me)

jakobnissen commented 1 year ago

@pfitzseb I noticed that I experienced latency whenever I loaded JSON3, so I did something like the following:

using SnoopCompile
inv = @snoopr using JSON3
trees = invalidation_trees(inv)
ascend(last(last(trees).backedges))

And then I saw tonnes and tonnes of JuliaSyntax methods being invalidated.

timholy commented 1 year ago

If we ran OhMyREPL in a fixed world age with invoke_in_world we should be able to fix any such issues in the future, and also prevent anyone from breaking the OhMyREPL tooling by interactively pirating stuff from Base.

I have the impression that Jameson doesn't think that actually works. Is there a clear summary of the issues somewhere?

c42f commented 1 year ago

I have the impression that Jameson doesn't think that actually works

I use it all the time and it works fine.

There are some some potential caveats, for example you need to capture the world age at runtime init (not precompile time as you'd like, because precompile world age represents a parallel universe). So if you've

there's a chance for package B to cause invalidations in the fixed world you end up using for A.

However for utility packages like OhMyREPL it's still likely a big improvement to freeze the world at the start of the Julia session. Then you only need to fight against invalidations caused by a small number of utility packages loaded before OhMyREPL is initialized. In comparison to the whole of General that's pretty good.

KristofferC commented 1 year ago

So the pattern is (in pseudo code)

frozen_world::Union{Nothing, UInt} = nothing

function run_thingy()
    if frozen_world == nothing
        frozen_world = current_world()
    end
    invoke_in_world(_run_thingy, frozen_world)
end

function _run_thingy()
    ....
end

?

c42f commented 1 year ago

Sure, that's the general idea :+1: Here's how JuliaSyntax does it:

https://github.com/JuliaLang/JuliaSyntax.jl/blob/694d7c222630b560956918d1fb704c605d904379/src/hooks.jl#L100-L113

https://github.com/JuliaLang/JuliaSyntax.jl/blob/694d7c222630b560956918d1fb704c605d904379/src/hooks.jl#L257-L271