Chevrotain / chevrotain

Parser Building Toolkit for JavaScript
https://chevrotain.io
Apache License 2.0
2.44k stars 200 forks

Custom Token Patterns too inefficient #1783

Closed dhlolo closed 2 years ago

dhlolo commented 2 years ago

Using a RegExp as a token pattern seems to be fast, but things get slow when I use a custom_payload function instead: function matchCustomToken(text, startOffset) { return REG.exec(text.substring(startOffset)); }. It takes about 20 seconds to process 500 lines, and about one and a quarter minutes to process 1000 lines.

bd82 commented 2 years ago

Hello @dhlolo

There are some optimizations that are only performed when no custom tokens are used. However, these should not cause such a large performance difference.

The main thing that could affect performance in this case is that the "starting character optimization" is not applied automatically when a custom token is used. See the documentation below on how to resolve this:

In general, even without these optimizations, the numbers you posted seem really high.
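
For illustration only, here is a minimal sketch of how a custom matcher might avoid the text.substring(startOffset) copy and supply starting characters by hand. It assumes the token is a plain integer. The names INTEGER, matchInteger and integerTokenConfig are hypothetical and do not come from this thread. The config object uses Chevrotain's start_chars_hint option, but createToken is deliberately not called here so the snippet runs without the library. Whether the substring call is the actual cause of the slowdown reported above is not confirmed in this issue.

```javascript
// Sticky ("y") flag: the match must start exactly at lastIndex,
// so the input never has to be sliced and the regex never scans ahead.
const INTEGER = /\d+/y;

// Custom pattern function in the (text, startOffset) shape used in the question.
function matchInteger(text, startOffset) {
  INTEGER.lastIndex = startOffset; // anchor the attempt at the current offset
  return INTEGER.exec(text); // RegExpExecArray on success, null otherwise
}

// Hypothetical token config, shaped as it might be passed to createToken.
// start_chars_hint lists the characters this token can begin with, which a
// lexer cannot infer from a function pattern on its own.
const integerTokenConfig = {
  name: "Integer",
  pattern: matchInteger,
  line_breaks: false,
  start_chars_hint: ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
};
```

Calling matchInteger("ab 123 cd", 3) matches at offset 3 without copying the input, while matchInteger("ab 123 cd", 0) returns null because the sticky regex does not search past the given offset.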

bd82 commented 2 years ago

Switching this to a discussion.