Chevrotain / chevrotain

Parser Building Toolkit for JavaScript
https://chevrotain.io
Apache License 2.0
2.44k stars 199 forks source link

Fixes #1982 - skips building errors / CST during backtracking #1983

Open matthew-dean opened 10 months ago

matthew-dean commented 10 months ago

Related to #1982.

I looked at the benchmark_web section, but I couldn't get that to work, and I don't think any of the examples have backtracking anyway.

I'm going to try this with my local parser for performance improvements.

matthew-dean commented 10 months ago

I wonder actually if this would be considered a breaking change, based on this discussion. That is, if someone has modified / overridden backtracking, and is expecting errors to be built, this might break that?

matthew-dean commented 10 months ago

I will close this as I got a local solution that does the exact same thing just extends the CstParser

bd82 commented 10 months ago

I will close this as I got a local solution that does the exact same thing just extends the CstParser

Why? I have not looked into this yet but from the title it seems like a good generic approach.

I hope to evaluate this on the weekend....

matthew-dean commented 10 months ago

@bd82 Oh! Well, it does work! It basically does the following when backtracking:

  1. Skips cloning (or restoring errors)
  2. Skips saving errors, which involves serialization, so a tiny optimization there.
  3. Skips creating or modifying a CSTNode

To be honest, part of the reason I closed it is because part of the huge slow-down was not just backtracking, but in poor ALT performance of chevrotain-allstar when using recursive ALT blocks.

But, when doing the above 3 steps, then what you're left with is just, essentially, a structured lookahead.

If, though, in addition to this, a successful BACKTRACK skipped a matching ALT (an ALT with the same SUBRULE) when successful (which IMO is somewhat trivial to implement), then you could remove the part in the documentation that strongly recommends against backtracking, because IMO it would have almost no performance hit whatsoever. But you seemed to be against this in this discussion.

(Note: these approaches might need to be somewhat exclusive, because you cannot avoid building a CST if you want to then cache it and return it with a matching SUBRULE.)

So, because you seemed somewhat against making BACKTRACK performant, I figured maybe I should close this. But if the current files in this PR or the above approach seems compelling to you, let me know! I'm happy to make backtracking super-fast and useful, but if you're philosophically against backtracking, I wasn't sure I should invest much time into it.

bd82 commented 10 months ago

but if you're philosophically against backtracking, I wasn't sure I should invest much time into it.

I prefer to avoid backtracking where possible, but if there is an easy win to improve performance we could consider it. If I remember correctly the Java Parser is stuck on an old version of Chevrotain

jtkiesel commented 8 months ago

A certain set of Java files currently takes ~2500ms to be parsed by the Java Parser on Chevrotain 6.5.0 when run on my machine. After upgrading to Chevrotain 11.0.3, those same files took ~4300ms to parse (~72% slower). I made the changes proposed in this PR to the Java Parser directly, and parsing those files now only takes ~2750ms (only ~10% slower than on Chevrotain 6.5.0, and ~56% faster than on Chevrotain 11.0.3 without these changes).

In order to avoid having to copy/paste ~200 lines of code from Chevrotain to the Java Parser, it would be great if these changes could be merged into Chevrotain itself. Let me know if there is anything I can do to help this PR along!

bd82 commented 5 months ago

I wonder:

  1. Could I remove built-in back-tracking support in Chevrotain?
  2. Could back-tracking be implemented as an optional plugin (same as LL-star lookahead).

Basically It was always recommended to avoid back-tracking to avoid back-tracking due to both performance reasons and it did not always align smoothly with other features.

Nowadays it seems that many of the use cases for back-tracking can be filled via: LL-* lookahead.

@jtkiesel What scenarios in the Java-Parser still require backtracking? could you point to one of the simpler ones?

bd82 commented 5 months ago

Backtracking as a Plugin details

It seems many of the changes (at least in the PR) are either:

  1. at the start of a method
  2. at the end of a method
  3. conditionally disable a sub-method call from a method.

It seems to me that most cases would be easy to implement (without copy/pasta) in an extending class

@matthew-dean @jtkiesel WDYT? WDYT.

I have to admit I did not yet have a look at the existing backtracking related code in Chevrotain Only the PR code.