Chevrotain / chevrotain

Parser Building Toolkit for JavaScript
https://chevrotain.io
Apache License 2.0

Feature Request: allow regexp in start_chars_hint #1807

Closed 4silvertooth closed 2 years ago

4silvertooth commented 2 years ago

Suppose we have a custom token matcher, or a regexp pattern that regexp-to-ast does not support, such as a positive lookbehind like (?<=═ )[a-zA-Z0-9]+, which can start with any of a long list of chars after the symbol ═. Instead of providing all those chars in an array to start_chars_hint, like start_chars_hint: [..."abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"], wouldn't it be easier to just provide a regexp, start_chars_hint: /[a-zA-Z0-9]/, which regexp-to-ast could parse for the optimization? It would also work well with pattern fragments.

bd82 commented 2 years ago

While this could be a workaround for issues with regexp-to-ast, I would prefer (if/when I have the time) to fix the underlying limitations of regexp-to-ast instead of implementing a workaround.

I also think that in some scenarios the reason not to deduce the start_chars_hint automatically is the performance cost of creating very large arrays. In that situation it is preferable for the hints to be generated by the end user and optionally cached.

Your idea does have merit, but it may not need to be part of the Chevrotain APIs. A small utility function which, given a regexp, returns the possible start chars can be implemented with regexp-to-ast, regexpp, or any other regexp parser....
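
As a rough illustration of such a user-side utility, here is a minimal sketch. It is not part of Chevrotain. It also does not use regexp-to-ast or regexpp: it assumes the hint regexp matches exactly one UTF-16 code unit (for example a character class) and brute-forces the answer by testing every code unit. The names startCharsOf, hintCache and IdentAfterSymbol are hypothetical, and the config object only mirrors the shape that would be passed to Chevrotain's createToken.

```typescript
// Hypothetical helper, NOT a Chevrotain API.
// Assumption: `charClass` matches exactly one UTF-16 code unit, e.g. /[a-zA-Z0-9]/.
// Results are cached per regexp, since the scan covers 65536 code units.
const hintCache = new Map<string, string[]>();

function startCharsOf(charClass: RegExp): string[] {
  const key = charClass.toString();
  const cached = hintCache.get(key);
  if (cached !== undefined) {
    return cached;
  }

  // Anchor the pattern so only a full single-character match counts, and drop
  // the stateful "g"/"y" flags so repeated test() calls do not move lastIndex.
  const anchored = new RegExp(
    `^(?:${charClass.source})$`,
    charClass.flags.replace(/[gy]/g, "")
  );

  const chars: string[] = [];
  for (let code = 0; code <= 0xffff; code++) {
    const ch = String.fromCharCode(code);
    if (anchored.test(ch)) {
      chars.push(ch);
    }
  }

  hintCache.set(key, chars);
  return chars;
}

// Shape of the token config from the issue. In real code this object would be
// passed to Chevrotain's createToken; it is a plain object here so the sketch
// has no dependencies.
const identAfterSymbol = {
  name: "IdentAfterSymbol",
  pattern: new RegExp("(?<=═ )[a-zA-Z0-9]+"),
  start_chars_hint: startCharsOf(/[a-zA-Z0-9]/),
};
```

A parser-based version (walking the AST produced by regexp-to-ast or regexpp) would avoid the full scan and could handle multi-character patterns, at the cost of depending on what that parser supports.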