KristofferC / OhMyREPL.jl

Syntax highlighting and other enhancements for the Julia REPL
https://kristofferc.github.io/OhMyREPL.jl/latest/

Since 0.5.17 OhMyREPL is not that snappy (without sysimage compilation) #312

Open roflmaostc opened 1 year ago

roflmaostc commented 1 year ago

I pressed ] in both scenarios immediately:

With 0.5.20 (master): https://user-images.githubusercontent.com/19436526/226926409-63afeeb3-828c-4810-b256-07cf37d76917.mp4

With 0.5.17: https://user-images.githubusercontent.com/19436526/226926422-a9f7c70e-3201-4ddf-b935-30352011a1de.mp4

The newer version takes considerably longer to enter the package manager. There was always a little lag, but now it has become a big one.

With Julia 1.9 it seems to be snappy with 0.5.19 too.

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver3)
  Threads: 12 on 24 virtual cores
Environment:
  JULIA_NUM_THREADS = 12
KristofferC commented 1 year ago

So it seems the issue is that JuliaSyntax has much higher latency than Tokenize on 1.8:

1.8:

julia> @time @eval collect(JuliaSyntax.Tokenize.Lexer("1+1 == 2"));
  1.036372 seconds (158.03 k allocations: 6.791 MiB, 99.68% compilation time)

julia> @time @eval collect(Tokenize.Lexers.Lexer("1+1 == 2"));
  0.002774 seconds (159 allocations: 8.602 KiB, 89.10% compilation time)

1.9:

julia> @time @eval collect(JuliaSyntax.Tokenize.Lexer("1+1 == 2"))
  0.016410 seconds (29.63 k allocations: 1.929 MiB, 76.52% compilation time)

julia> @time @eval collect(Tokenize.Lexers.Lexer("1+1 == 2"))
  0.011211 seconds (17.42 k allocations: 1.157 MiB, 98.11% compilation time)

@timholy, @c42f, do either of you have any ideas for bringing that down on 1.8? I am surprised that Tokenize manages to be so quick even on 1.8...

c42f commented 1 year ago

I can reproduce the 1.8 vs 1.9 differences.

Is there a difference between executing a precompile workload like we do in JuliaSyntax, vs the list of precompile statements in Tokenize's src/_precompile.jl? (Presuming equivalent coverage from both)

KristofferC commented 1 year ago

I did try adding an explicit precompile statement to JuliaSyntax as well, but I couldn't get it to "stick". If I executed that precompile statement in the REPL after loading JuliaSyntax, the first call was fast, but not when it was only put into the package.
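For concreteness, the two approaches under discussion look roughly like this; the module and function names below are hypothetical stand-ins, not JuliaSyntax's actual internals:

```julia
module MyLexerPkg  # hypothetical package

struct Lexer
    str::String
end
tokenize(l::Lexer) = collect(l.str)  # stand-in for the real lexing loop

# Approach 1 (Tokenize-style): an explicit precompile directive.
# This triggers inference (and codegen, though 1.8 does not save
# native code) without executing the function.
precompile(tokenize, (Lexer,))

# Approach 2 (JuliaSyntax-style): execute a small workload at
# precompile time, so everything inference reaches from this call
# gets cached.
tokenize(Lexer("1+1 == 2"))

end
```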

timholy commented 1 year ago

Is there a difference between executing a precompile workload like we do in JuliaSyntax, vs the list of precompile statements in Tokenize's src/_precompile.jl? (Presuming equivalent coverage from both)

Only if there are runtime dispatches. When you execute f(args...) and it has never been inferred, it infers f for those argtypes along with all inferrable callees. But if inference fails, it might terminate the inferred-callgraph and insert one or more runtime dispatches.

What happens next depends on "ownership": if the dispatched callee belongs to the package, no worries, it will also get cached. If it doesn't, then by default it won't. Explicit precompile directives mark their callees as "cache me even if you don't own me" so in that case there can be differences.
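A hypothetical sketch of that ownership rule (MyPkg and its functions are made up for illustration):

```julia
module MyPkg

# Fully inferrable: running this during precompilation caches it and
# all inferrable callees that MyPkg owns.
double(x::Int) = 2x

# The element type here is abstract, so `string(v[1])` becomes a
# runtime dispatch. The dispatched callee (Base.string(::Int)) is
# owned by Base, not MyPkg, so by default it is not cached into
# MyPkg's precompile file.
describe(v::Vector{Any}) = string(v[1])

# An explicit directive marks the foreign callee as
# "cache me even if you don't own me":
precompile(Base.string, (Int,))

end
```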

If I executed that precompile statement in the REPL after loading JuliaSyntax, the first call was fast but not when only put into the package.

Sounds like codegen, since precompile also triggers codegen but on 1.8 we don't save the codegen results.

Presumably JuliaSyntax is "bigger"? Or is that not really true?

Moelf commented 1 year ago

With Julia 1.9 it seems to be snappy with 0.5.19 too.

isn't that just because 1.9 added the ability to cache code from foreign packages?

Now that we've switched to JuliaSyntax, on 1.8 we can't cache JuliaSyntax stuff at all

c42f commented 1 year ago

Presumably JuliaSyntax is "bigger"? Or is that not really true?

It's definitely bigger - it includes all the Tokenize.jl code (heavily modified at this point), as well as the parser. And OhMyREPL now implicitly uses the JuliaSyntax parser, not just the tokenizer, as of https://github.com/KristofferC/OhMyREPL.jl/pull/306

But that in itself might not be the problem; I find it curious that collect(JuliaSyntax.Tokenize.Lexer("1+1 == 2")) is slow - that code is still structurally similar to the original Tokenize code.

What happens next depends on "ownership": if the dispatched callee belongs to the package, no worries, it will also get cached.

I wondered if there was a problem with this somehow. Is there an easy way to get visibility into this part?

timholy commented 1 year ago

@snoopi_deep and ProfileView will make it literally visible. You get one flame per entrance into inference.
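If it helps, the usual SnoopCompile workflow for this looks something like the following (applied here to the lexer call from above):

```julia
using SnoopCompileCore, JuliaSyntax

# Record inference while running the slow first call:
tinf = @snoopi_deep collect(JuliaSyntax.Tokenize.Lexer("1+1 == 2"))

using SnoopCompile, ProfileView

# One flame per entrance into inference; foreign-owned callees that
# re-enter inference show up as separate flames.
ProfileView.view(flamegraph(tinf))
```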

KristofferC commented 1 year ago

Tokenize runs with O1 https://github.com/JuliaLang/Tokenize.jl/blob/8d2aa01f8f814de060ff8f80f164cb8909c899c7/src/Tokenize.jl#L4 but I doubt that would cause such a big change.
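For reference, that O1 setting is the per-module Base.Experimental.@optlevel pragma; a minimal sketch of what the linked line does (MyLexer is a hypothetical module mirroring Tokenize.jl's setup):

```julia
module MyLexer
# Lower the LLVM optimization level for this whole module:
# shorter compilation at the cost of potentially slower generated code.
Base.Experimental.@optlevel 1

lex(s::String) = collect(s)  # stand-in lexer
end
```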

jakobnissen commented 1 year ago

This has been solved in JuliaSyntax 0.3.5 (thanks to Kristoffer's PRs). Unfortunately, #316 is still relevant, due to invalidations in REPL.jl and Tar.jl. Kristoffer has already fixed more of these in Tar.jl (https://github.com/JuliaIO/Tar.jl/pull/152 ,:heart: btw), but these will not land until Julia 1.10 as Tar.jl is a stdlib. There might be more invalidations present when loading ProgressLogging.