**Closed** — ndinsmore closed this 1 year ago
Nice!! I assume you refer to JuliaLang/julia#48568. Does this also help with the older `isascii` in Julia 1.9?
The old `isascii` could only do about 2 bytes per cycle on <1.10; the new version will do 20+ bytes/cycle. The UTF-8 code was slow, so I am assuming the old version should still offer a good benefit.
> The old `isascii` could only do about 2 bytes per cycle on <1.10; the new version will do 20+ bytes/cycle.
Most identifiers are quite short though, and the chunk size in that PR is 1024, so I am not sure you will get the full speed out of it on your average Julia identifier. Would be interesting to benchmark.
I had meant to write "on larger strings"; nonetheless it does still speed things up by ~25%. We could likely craft an `isascii` that is tailored for smaller strings.
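For example, a small-string-oriented variant could OR bytes together in fixed-size chunks and test the high bit once per chunk, avoiding the large-chunk setup cost. This is only a hedged sketch of the general technique, not the implementation from JuliaLang/julia#48568, and `isascii_small` is a hypothetical name:

```julia
# Sketch: ASCII check tuned for short strings. A byte is non-ASCII iff its
# high bit is set, so OR-ing 8 bytes together and testing once per chunk
# cuts the number of branches without needing 1024-byte chunks.
function isascii_small(s::String)
    cu = codeunits(s)
    n = length(cu)
    i = 1
    # Main loop: fold 8 bytes into one accumulator, then test the high bit.
    while i + 7 <= n
        acc = UInt8(0)
        @inbounds for j in i:i+7
            acc |= cu[j]
        end
        acc >= 0x80 && return false
        i += 8
    end
    # Tail: check any remaining bytes individually.
    @inbounds while i <= n
        cu[i] >= 0x80 && return false
        i += 1
    end
    return true
end
```

A real implementation would reinterpret the buffer as `UInt64` words rather than looping over bytes, but the chunked OR-and-test idea is the same.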
I should note that the benchmarking was also done with https://github.com/JuliaLang/julia/pull/48887 applied, but that shouldn't affect the back-to-back comparison.
The thing is that when I profiled this, `utf8proc_decompose` was taking up the majority of the `TreeNode` processing time; now it doesn't even show up in the profile.
I totally agree that this is the right thing to do. I am just not sure the performance win is due to the improvements of `isascii` on master; there might be the same win on 1.9, for example.
I agree this probably works and is beneficial on 1.9. I was just hoping you'd go ahead and do the benchmark for me ;-)
I ask because JuliaSyntax isn't just for Julia 1.10+. It's tested and works all the way back to Julia 1.0, as needed to support use in the VSCode tooling.
Anyway, I checked and it's also a big improvement on 1.9:
```julia
julia> f = read(joinpath(Sys.BINDIR, Base.DATAROOTDIR, "julia", "base", "abstractarray.jl"), String);

# before
julia> @benchmark JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, $f)
BenchmarkTools.Trial: 360 samples with 1 evaluation.
 Range (min … max):  12.632 ms … 22.863 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     12.998 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.906 ms ±  1.894 ms  ┊ GC (mean ± σ):  5.84% ± 9.83%

  ▅██▇▅▂                                                     ▂
  ██████▇█▇███▄▄▁▁▁▄▁▁▁▁▁▁▄▁▁▁▁▁▄▇▇▇▇▇▇▇█▆▆▄▆▇█▆▁▆█▁▁▆▆▄▄▁▁▁▄ ▇
  12.6 ms      Histogram: log(frequency) by time      19.6 ms <

 Memory estimate: 8.03 MiB, allocs estimate: 172102.

# after
julia> @benchmark JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, $f)
BenchmarkTools.Trial: 496 samples with 1 evaluation.
 Range (min … max):   9.157 ms … 15.936 ms  ┊ GC (min … max): 0.00% … 35.99%
 Time  (median):      9.444 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.075 ms ±  1.461 ms  ┊ GC (mean ± σ):  5.73% ± 10.30%

  ▄▇█▇▅▃▁
  ███████▇▇█▅▇▅▅▇▅▄▁▁▄▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆█▅▇██▆▇▇▁▇▆▁▇▇▆▄▆▄▅▄ ▇
  9.16 ms      Histogram: log(frequency) by time      14.4 ms <

 Memory estimate: 6.53 MiB, allocs estimate: 98538.
```
Merging #228 (320738a) into main (56c33b2) will not change coverage. The diff coverage is 100.00%.
```
@@           Coverage Diff           @@
##             main     #228   +/-   ##
=======================================
  Coverage   96.31%   96.31%
=======================================
  Files          15       15
  Lines        3913     3913
=======================================
  Hits         3769     3769
  Misses        144      144
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/literal_parsing.jl | 97.87% <100.00%> (ø) | |
Do you know if the same improvement (automatically) applies in the default Julia parser? Can (and should) it otherwise be added there easily, until yours becomes the default, and maybe even be backported?
Julia's reference parser already uses an equivalent optimization internally (see the C code in `fl_accum_julia_symbol`).
I was trying to use JuliaSyntax in some string benchmarking and noticed a lot of calls to `utf8_normalization`. On master, `isascii` is now blazing fast, so I thought that checking whether a string is ASCII before doing any UTF-8 work would speed things up; Julia code is mostly ASCII. I benchmarked by parsing base/abstractarray.jl, which is the largest file in the codebase.
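The fast path described above can be sketched as follows. The names here are illustrative, not the actual JuliaSyntax internals: a pure-ASCII string is already in NFC form, so the utf8proc-backed normalization can be skipped entirely.

```julia
using Unicode  # stdlib; provides Unicode.normalize

# Hedged sketch of the fast path in this PR. `normalize_identifier` is a
# hypothetical name; JuliaSyntax's real entry point may differ.
function normalize_identifier(s::AbstractString)
    # Fast path: ASCII strings are unchanged by NFC normalization,
    # so the cheap isascii scan lets us skip utf8proc entirely.
    isascii(s) && return s
    # Slow path: full Unicode NFC normalization (utf8proc under the hood).
    return Unicode.normalize(s, :NFC)
end
```

Since most identifiers in real Julia code are ASCII, the cheap scan wins on almost every call, and the slow path is only paid for genuinely non-ASCII identifiers.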
Before PR:
After PR: