Closed tmbb closed 4 years ago
Is the compilation issue caused by repeating the same patterns over and over again? NimbleParsec does give the trade-off of increasing runtime by compiling subpatterns into defparsecs. Would that help?
The "utf8 modified" benchmark is the one with all the tricks to make it faster... Including dividing the enormous character patterns into several parsecs.
I see :( honestly it feels 40 seconds is a bit too much. Since we can assume mostly valid Elixir code, perhaps it makes more sense to parse everything until it is something that is not a dot, parens, space, plus, etc?
I see :( honestly it feels 40 seconds is a bit too much.
It's 95 seconds when I add proper unicode support for atoms. I can't even reuse the same parsecs because they are subtly different (i.e. an atom supports a slightly different set of characters from variables).
Since we can assume mostly valid Elixir code, perhaps it makes more sense to parse everything until it is something that is not a dot, parens, space, plus, etc?
I don't know, we could just as well assume that most identifiers will have ASCII characters and revert to the previous simpler implementation.
Yeah, I would go or with ASCII or use exclude the other valid characters as mentioned above. Excluding is fine because Elixir will reject anything else anyway. I think having to wait 2 minutes for publishing the docs of your first project is definitely too much. Another idea is to implement regex support, as you mentioned in the forum, and use regexes for this particular task.
Ok, it turns out that compilation time is ~35s and not 90s (I have no idea why I got that freak result before). I've benchmarked three implementations:
unicode_set
to generate exactly the set of characters supported by ElixirBenchmarks below (courtesy of my schism library, benchee and the benchee markdown format):
Benchmark run from 2020-09-30 11:07:35.179159Z UTC
Benchmark suite executing on the following system:
Operating System | Linux |
---|---|
CPU Information | Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz |
Number of Available Cores | 8 |
Available Memory | 7.87 GB |
Elixir Version | 1.10.4 |
Erlang Version | 23.0.3 |
Benchmark suite executing with the following configuration:
:time | 5 s |
---|---|
:parallel | 1 |
:warmup | 2 s |
Run Time
Name | IPS | Average | Devitation | Median | 99th % |
---|---|---|---|---|---|
Lexer performance - ascii only | 137.10 | 7.29 ms | ±7.37% | 7.23 ms | 9.17 ms |
Lexer performance - parsec | 132.52 | 7.55 ms | ±8.50% | 7.55 ms | 8.98 ms |
Lexer performance - regex | 0.0717 | 13949.59 ms | ±0.00% | 13949.59 ms | 13949.59 ms |
Comparison
Name | IPS | Slower |
---|---|---|
Lexer performance - ascii only | 137.10 | |
Lexer performance - parsec | 132.52 | 1.03x |
Lexer performance - regex | 0.0717 | 1912.48x |
Run Time
Name | IPS | Average | Devitation | Median | 99th % |
---|---|---|---|---|---|
Lexer compilation time - ascii only | 0.0586 | 17.07 s | ±0.00% | 17.07 s | 17.07 s |
Lexer compilation time - regex | 0.0549 | 18.21 s | ±0.00% | 18.21 s | 18.21 s |
Lexer compilation time - parsec | 0.0271 | 36.90 s | ±0.00% | 36.90 s | 36.90 s |
Comparison
Name | IPS | Slower |
---|---|---|
Lexer compilation time - ascii only | 0.0586 | |
Lexer compilation time - regex | 0.0549 | 1.07x |
Lexer compilation time - parsec | 0.0271 | 2.16x |
The benchamarks use my schism
library to easily recompile the code at runtime according to the version we want.
In the new branch, there are two places where conditional compilation is used:
and this one: https://github.com/elixir-makeup/makeup_elixir/blob/utf8-identifier-benchmarks/lib/makeup/lexers/elixir_lexer.ex#L66-L94
Let's dissect the first of these places:
# Different implementations of the `variable_name` combinator
schism "parsec vs regex" do
dogma "parsec" do
defparsecp(:variable_start_chars, utf8_char(Utf8Utils.variable_start_chars()))
defparsecp(:variable_continue_chars, utf8_char(Utf8Utils.variable_continue_chars()))
variable_name =
parsec(:variable_start_chars)
|> repeat(parsec(:variable_continue_chars))
|> optional(utf8_char(Utf8Utils.variable_ending_chars()))
end
heresy "regex" do
defregexparsec(
:variable_name,
# Not 100% correct, but a pretty good approximation
~R/[_\p{Ll}\p{Lm}\p{Nl}][_p\{L}\p{Mn}\p{Mc}\p{Nd}\p{Pc}]*[\?!]?/
)
variable_name = parsec(:variable_name)
end
heresy "ascii only" do
variable_name =
ascii_string([?a..?z, ?_], 1)
|> optional(ascii_string([?a..?z, ?_, ?0..?9, ?A..?Z], min: 1))
|> optional(ascii_string([??, ?!], 1))
end
end
In the "parsec"
dogma (the default version, that's why it's the dogma), we use the Utf8Utils.variable_start_chars/1
and Utf8Utils.variable_continue_chars/1
to return the exact set of characters supported by Elixir (these functions return a static list of characters; they are generated by a script that queries the unicode_set
package but they don't actually depend on this package at runtime). They are encapsulated in a parsec so that NimbleParsec
doesn't waste any time trying to optimize them. The combinator is simply a literal conversion of the BNF grammar in the Elixir syntax description.
In the "regex"
heresy (a heresy means that it will only be compiled if we set the appropriate configuration option - in schism
parlance we convert/1
to a belief). It uses my custom defregexparsec
macro to generate a NimbleParsec
-compatible function that behaves like a parsec. Is you pay attention you'll see that the character classes are not exactly the same as the one Elixir supports, but it«s close enough for something like Makeup. I believe the defregexparsec
is as optimized as possible without writing my own regex engine, but still, it turns out that the erlang :re
module is quite slow, which is understandable, as it's not optimized for the kind of context-free-parsing with limited lookahead and no lookbehind that NimbleParsec
excels at.
In the "ascii only"
I implement the current version of the lexer, which supports only a very limited set of characters. It's fast in terms of both compilation and lexer performance.
It's quite interesting that although the "parsec"
version takes a long time to compile, it's not actually any slower than the version that only recognizes ASCII characters. This probably means that the BEAM pattern matcher is very smart and very fast, and it takes as long to match a very limited set of patterns as it takes to match a huge number of patterns. It also menas tha NimbleParsec
is very good at leveraging the BEAM's pattern matcher to make code run fast.
I've tried all kinds of tricks to reduce compilation time for the "parsec"
version, but it's not been easy... It looks like the compilation time is proportional to the number of function heads. I'll try to write a custom parsec that defers some of these tests to runtime, but it won't be easy and we might find it has important performance penalties.
The "regex"
parser's performance is unacceptable. Unless I can make it frastically faster it's not usable for makeup.
The "parsec"
parser is fast, but if we decide that 35s for compilation is too much, it's stil not usable and we must keep the current version.
This code takes 12 seconds to compile for me:
defparsecp(:variable_start_chars, utf8_char(Utf8Utils.variable_start_chars()))
defparsecp(:variable_continue_chars, utf8_char(Utf8Utils.variable_continue_chars()))
defparsec :variable,
parsec(:variable_start_chars)
|> repeat(parsec(:variable_continue_chars))
|> optional(utf8_char(Utf8Utils.variable_ending_chars()))
defparsecp(:atom_start_chars, utf8_char(Utf8Utils.atom_continue_chars()))
defparsecp(:atom_continue_chars, utf8_char(Utf8Utils.atom_continue_chars()))
defparsec :atom,
parsec(:atom_start_chars)
|> repeat(parsec(:atom_continue_chars))
|> optional(utf8_char(Utf8Utils.atom_ending_chars()))
I have an idea. What if instead of using parsec
to build these large utf8 combinators, we use repeat_while
to see if we should continue parsing? It would be something like this:
repeat_while(utf8_char([]), :atom_continue)
defp atom_continue(rest, context, _, _) do
if Utf8Utils.atom_continue?(rest), do: {:cont, context}, else: {:halt, context}
end
And atom_continue would compile to a bunch of clauses like this:
def atom_continue?(<<?a::utf8, _::binary>>), do: true
def atom_continue?(<<?b::utf8, _::binary>>), do: true
...
def atom_continue?(_), do: false
Which should be faster to compile. We can use the repeat_while approach for everything except variable_start. Thoughts?
I have also noticed sigils are expensive to compile, partly due to generating a clause for each possible sigil delimiter. Should we instead generate sigils like this:
start = choice([""", "(", ...])
stop = choice([""", ")", ...])
And then use something like post_traverse to actually verify that they match?
have an idea. What if instead of using parsec to build these large utf8 combinators, we use repeat_while to see if we should continue parsing? It would be something like this:
I'm exploring that as well. But I don't get why I'm getting such randomly different compile times in different test runs... Sometimes I get compile times around 17s (in the same ballpark as what you get), and sometimes as high as 80s. What are the specs of your machine?
And then use something like post_traverse to actually verify that they match?
I've thought of that already, but I havent tried it yet.
And atom_continue would compile to a bunch of clauses like this:
def atomcontinue?(<<?a::utf8, ::binary>>), do: true def atomcontinue?(<<?b::utf8, ::binary>>), do: true ... def atomcontinue?(), do: false
Which should be faster to compile. We can use the repeat_while approach for everything except variable_start. Thoughts?
I'm exploring ways of matching the character in "user code" using a binary search tree instead of multiple function heads. It's probably faster to compile and maybe it won't be much slower.
I'm exploring ways of matching the character in "user code" using a binary search tree instead of multiple function heads. It's probably faster to compile and maybe it won't be much slower.
That's what the Erlang VM does and it does it really well. So I would just go with the multiple clauses (or a large case
which will be considerably faster to compile).
EDIT:
def atom_continue?(binary) do
case binary do
<<?a::utf8, _::binary>> -> true
<<?b::utf8, _::binary>> -> true
_ -> false
end
end
Are you working on my new branch? The one that uses Schism?
Could you please run the following command: mix run benchee/schism.exs
and post the console output here? One of the benchmaks recompiles the lexer file and measures it.
So that we are measuring the compilation time the same way. Maybe you're just using a faster machine, or maybe its the fact that I'm running elixir on WLS (windows subsystem for linux).
Benchmark run from 2020-10-01 14:32:36.727146Z UTC
Benchmark suite executing on the following system:
Operating System | macOS |
---|---|
CPU Information | Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz |
Number of Available Cores | 8 |
Available Memory | 16 GB |
Elixir Version | 1.12.0-dev |
Erlang Version | 22.3.4.1 |
Benchmark suite executing with the following configuration:
:time | 5 s |
---|---|
:parallel | 1 |
:warmup | 2 s |
Run Time
Name | IPS | Average | Devitation | Median | 99th % |
---|---|---|---|---|---|
Lexer compilation time - ascii only | 0.0749 | 13.35 s | ±0.00% | 13.35 s | 13.35 s |
Lexer compilation time - regex | 0.0683 | 14.64 s | ±0.00% | 14.64 s | 14.64 s |
Lexer compilation time - parsec | 0.0346 | 28.87 s | ±0.00% | 28.87 s | 28.87 s |
Comparison
Name | IPS | Slower |
---|---|---|
Lexer compilation time - ascii only | 0.0749 | |
Lexer compilation time - regex | 0.0683 | 1.1x |
Lexer compilation time - parsec | 0.0346 | 2.16x |
That's still a lot slower than what 12s and more similar to my times. Did you have many other programs running at the same time? I've found out it makes compilation time slower in my laptop.
@tmbb oh, to be clear, I meant to say that only the code snippet I posted above takes 12 seconds and that is 40% of the total time. So if we can take 7 seconds out of it, then that will be a major improvement altogether. :)
@tmbb so I decided to try my approach out and once I transformed utf8_util into a bunch of conditionals, then it took 10s to compile it while the lexer took about 15 seconds. That's when I realized: we can speed up simply by making the compilation parallel into multiple modules!
I also noticed that your generation script can be written with nimble_parsec and we can bypass the intermediate module altogether by using nimble_parsec.compile
. I pushed all of this work here:
On my machine it takes about 16 seconds (from 27 compared to your unicode branch) and that will hold true if you have 2+ cores. It depends on NimbleParsec master which I pushed in a rush. I have to go back there, add some tests, and do a new release.
If we improve the sigil handling, we may make this go under 10 seconds. Alternatively we can move the sigil handling to a separate module too, using the features from NimbleParsec master.
@tmbb so I decided to try my approach out and once I transformed utf8_util into a bunch of conditionals, then it took 10s to compile it while the lexer took about 15 seconds.
I confirm that it's faster that way.
I also noticed that your generation script can be written with nimble_parsec and we can bypass the intermediate module altogether by using nimble_parsec.compile.
This is very cool. I had totally forgotten about nimble_parsec.compile
. It's probably the way to go.
Alternatively we can move the sigil handling to a separate module too, using the features from NimbleParsec master
Why do you need NimbleParsec master?
EDIT: there's a mistake in the atom unicode patterns which I'll fix. Don't merge your branch right now.
You can only call parsec for functions defined in the same module. Master allows public combinators and remote combinators.
@tmbb please Do any improvements You want and merge my branch at your discretion!
PR sent here: #15. Feel free to do your changes and merge. :)
For these benchmarks I've done the following changes (sorry for mixing them all in the same commit, but I just went along with the changes and when I noticed I had this):
nimble_parsec/master
schism
, which allowed me to compare the runtime and compilation speed of both implementationsBenchmark suite executing on the following system:
Operating System | Linux |
---|---|
CPU Information | Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz |
Number of Available Cores | 8 |
Available Memory | 7.87 GB |
Elixir Version | 1.10.4 |
Erlang Version | 23.0.3 |
Benchmark suite executing with the following configuration:
:time | 5 s |
---|---|
:parallel | 1 |
:warmup | 2 s |
Run Time
Name | IPS | Average | Devitation | Median | 99th % |
---|---|---|---|---|---|
Lexer performance - external | 146.39 | 6.83 ms | ±6.48% | 6.82 ms | 8.26 ms |
Lexer performance - internal | 141.31 | 7.08 ms | ±4.55% | 7.02 ms | 8.39 ms |
Comparison
Name | IPS | Slower |
---|---|---|
Lexer performance - external | 146.39 | |
Lexer performance - internal | 141.31 | 1.04x |
Run Time
Name | IPS | Average | Devitation | Median | 99th % |
---|---|---|---|---|---|
Project compilation time - external | 0.159 | 6.28 s | ±0.00% | 6.28 s | 6.28 s |
Project compilation time - internal | 0.0738 | 13.55 s | ±0.00% | 13.55 s | 13.55 s |
Comparison
Name | IPS | Slower |
---|---|---|
Project compilation time - external | 0.159 | |
Project compilation time - internal | 0.0738 | 2.16x |
In the "internal"
case the sigils are compiled in the ElixirLexer
module. In the "external"
case they are compiled in a different module, which allows the parallel compiler to distribute the compilation work between several cores. This leads to a very large peformance impact during compilation (compilation time is reduced by 50%)
There is no performance penalty for splitting the sigils into another module (I start to think that for complex parsers there isn't such a great performance advantage of using combinators instead of parsecs).
Splitting the sigils comes with some code organization challenges, because inside a sigil you can have pretty much any element from the lexer because of interpolation. To handle that I had to create some public combinators in the ElixirLexer
module which are invoked inside the Sigils
module. At the same time, the ElixirLexer
invokes a combinator from the Sigils
module. Because these dependencies only matter at runtime and not at compile time, this is not such a big deal, but it's inelegant. Despite the lack of elegance, the performance impact is so big that I chose to keep it.
I didn't split the Atom
and Variable
modules further because it didn't result in further compilation speed improvements (the relevant schisms have been deleted, otherwise the source would have become really confusing).
These benchmarks were done with Schism
but for the final release and before merging into master I will remove the dependency on Schism
. Sadly I haven't found a way of making Schism
a :dev
-only dependency.
EDIT: just to give credit where it's due for those that might be following, the big performance improvements all come from @josevalim's idea of splitting the code into several modules to leverage the Elixir parallel compiler.
Awesome @tmbb! After my PR #14 was merged, I don't think it is worth splitting the sigils because they compile very quickly, so I would keep atoms, variables, and the main lexer as the modules.
Given the slowest of them is the variables one, which takes 7 seconds, it will be the one driving the compilation time. So unless we break the variable one apart in two modules, the atom one in two modules, there is no benefit to break the main lexer, as compilation will still take 7 seconds.
So what do you think merging #15 and then you push your improvements on top?
(I start to think that for complex parsers there isn't such a great performance advantage of using combinators instead of parsecs).
There is a benefit, which is less memory allocation, but unlikely to matter unless we are handling dozens of MBs per tokenization, which doesn't matter for makeup.
Awesome @tmbb! After my PR #14 was merged, I don't think it is worth splitting the sigils because they compile very quickly, so I would keep atoms, variables, and the main lexer as the modules.
I don't get it. I'm just showing you a benchmark that compares compilation time between splitting the sigils and keeping them in the main order and I get performance benefits on my machine.
I don't think you can compare the 7s on your machine to the 7s on my machine, right?
If you give me some time I can setup some shchism benchmarks comparing splitting the Variable and Atom modules but I don't see why you think splitting the sigils has no effect
I don't get it. I'm just showing you a benchmark that compares compilation time between splitting the sigils and keeping them in the main order and I get performance benefits on my machine.
I am saying it because I tried this approach in my machine locally and it didn't yield any result. At least for me, the variables module takes the longest to compile, so optimizing any other module will not make the overall compilation faster because everyone ends up waiting on the variables module anyway. :)
In any case, what do you think about merging the current PR, and we do further benchmarks and improvements on top of it? Unless you think the current version is not acceptable and we need to improve it before merge. I will also release a new nimble_parsec. WDYT?
what do you think about merging the current PR, and we do further benchmarks and improvements on top of it?
Are you talking about meeting the version with the split the sigils or the one with sigils in the main leader module?
@tmbb I am talking about PR #15 :)
Btw, do you have a branch with your sigils schism?
Btw, do you have a branch with your sigils schism?
Yes, this one: https://github.com/elixir-makeup/makeup_elixir/tree/origin/jv-speed-up-unicode
Sorry for the weird name, it was a mistake when setting the upstream branch and I didn't want I to mess with the history too much. You can reproduce the benchmarks using that branch.
Cool! Since it is built on top of #15, I will go ahead and merge #15 and we can continue exploring in other branches. :D
Here are some tests that I did.
$ time mix run lib/makeup/lexers/elixir_lexer.ex
takes 4.8 seconds. If I comment out the sigils line, it takes 2.9 seconds. So that's a good improvement but it doesn't speed up compilation because variables.ex take 7 seconds. If I break variables apart, then splitting the sigils out help, because "variable_continue" take 4 seconds, but this will only be visible in machines with 4+ cores.
One thing to note is that you are running on Erlang 23.0 and it had some regressions in compilation times. Those have been fixed on 23.1. In particular, trying out the OTP master branch (with JIT), the variable module compiles in 4.5 seconds and elixir_lexer.ex compiles in 4 seconds and the whole project takes 4.5 seconds.
With all this new data, my honest opinion is to ship as is. I don't think it is necessary to split either variables.ex or sigils apart. But I will be glad to go with whatever you prefer. :) I will release a new nimble_parsec so we can ship a new makeup_elixir release anytime you want.
I find it very strange that splitting the sigils doesn't help to increase performance on your machine. In my machine it's the only change that actually speeds up compilation. Splitting the Variables
and the Atom
modules hasn't actually sped up anything. I've setup a Schism benchmark that tries all 4 posible combinations (sigils in the lexer module or in their own module and identifiers -variables and atoms - split into 2 or 4 modules). I have to add the possibility of testing all possible schism/belief combinations, because the code the code that generates all combinations is not exactly trivial.
I think the performance differences we're seeing are related to the different OTP versions. I can't think of any other probable cause. Clearly in my setup, the ElixirLexer
is the bottleneck (and thus splitting the sigils into their own module makes a big difference) while in your it's the Variable
module (and that's why splitting the sigils doesn't lead to a performance increase in your case). I have to setup Erlang 23.1 and try it for myself.
Meanwhile, I think you can ship as is. The complexity of splitting the sigils is probably not worth it if it doesn't lead to a consistent performance improvement. I'll continue to try to find ways of optimizing it further (and in the meantime update my Schism library with some utilities I've found useful during our exploration)
Benchmarks here:
Benchmark run from 2020-10-02 11:56:33.665866Z UTC
Benchmark suite executing on the following system:
Operating System | Linux |
---|---|
CPU Information | Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz |
Number of Available Cores | 8 |
Available Memory | 7.87 GB |
Elixir Version | 1.10.4 |
Erlang Version | 23.0.3 |
Benchmark suite executing with the following configuration:
:time | 5 s |
---|---|
:parallel | 1 |
:warmup | 2 s |
Run Time
Name | IPS | Average | Devitation | Median | 99th % |
---|---|---|---|---|---|
Project compilation time %{"sigils" => "external", "variables and atoms" => "together"} | 0.149 | 6.69 s | ±0.00% | 6.69 s | 6.69 s |
Project compilation time %{"sigils" => "external", "variables and atoms" => "split"} | 0.147 | 6.80 s | ±0.00% | 6.80 s | 6.80 s |
Project compilation time %{"sigils" => "internal", "variables and atoms" => "together"} | 0.0685 | 14.60 s | ±0.00% | 14.60 s | 14.60 s |
Project compilation time %{"sigils" => "internal", "variables and atoms" => "split"} | 0.0641 | 15.60 s | ±0.00% | 15.60 s | 15.60 s |
Comparison
Name | IPS | Slower |
---|---|---|
Project compilation time %{"sigils" => "external", "variables and atoms" => "together"} | 0.149 | |
Project compilation time %{"sigils" => "external", "variables and atoms" => "split"} | 0.147 | 1.02x |
Project compilation time %{"sigils" => "internal", "variables and atoms" => "together"} | 0.0685 | 2.18x |
Project compilation time %{"sigils" => "internal", "variables and atoms" => "split"} | 0.0641 | 2.33x |
Meanwhile, I think you can ship as is.
Do you want me to ship it? I think you should ship it. :)
FWIW, I just tried 23.0.1 and you are correct that elixir_lexer takes more time there (8s) so the benefits of removing sigils is a bit clearer but variables.ex still takes longer (8.5s). Given everything is much better anyway on next OTP, I would keep it as is. :)
Now that we depend on NimbleParsec 1.1 should we depend on it directly or release a new makeup version that depends on NimbleParsec 1.1 and then depend on the new makeup version? I think we should depend directly on NimbleParsec 1.1 (as we're doing now)
I don't think we need a new makeup
release as it is ok for us to impose a more strong requirement than our parent (i.e. makeup). :)
While adding tests for unicode identifiers I found a bug in makeup. Fortunately, the bug is undone when converting tokens into HTML. The bug leads to an illegal unicode string in tokens that use the lexeme
combinator. This ilegal unicode string is converted into a legal one when outputing HTML. The bug was caught by my tests, though, because it causes an error when I try to convert the token values into strings for better visual inspection.
So, I've released a new version of Makeup with the bug fixed and published this MakeupElixir version o hex.
The naïve way of adding unicode support almost doubles the lexer's compilation time. Depending on the concrete implementation, lexer compilation goes from 20s (which is already a lot) to 32 or 38s.
Is this an acceptable price for a more correct implementation? It's possible that there exists a more efficient way and I've just not found it.