Closed julienmalard closed 2 years ago
Hi Julien, I was where not so long ago and made a PR then shortly after discovering it had already been tried and rejected.
Hello @buhl ,
Thanks for letting me know...@erikrose, any chance you might consider this? I see from the other pull request, though, that you prefer to discourage the use of regexes in grammars in the future. Would you happen to have a recommendation as to how the use case suggested above (\w
and p{Mn}
) could be implemented according to best practices?
I had considered manually generating a list of acceptable characters (see here), but was not sure whether that would be considered best practice or not.
Thanks!
This might not be the right place, but @erikrose could you please document the meaning of ilmsuxa
for regexes? Are those the single-letter flags documented by the regex module?
@jenstroeger when in doubt, file a new issue. They're cheap. This time there's no need, however, since I addressed this with 6e52df0
Some more background for the original issue:
According to https://bugs.python.org/issue12731, it looked like regex
was going to replace re
in the standard lib for 3.3, then maybe it was going to be added as an alternative, but then the issue fell of the radar for 9 or 10 years. It seems like this may still happen in the future, and regex
is still actively developed and generally less uncompliant with unicode.
I understand @erikrose has (had?) other ambitions for Parsimonious, but unless a new contributor appears I don't expect to realize the dream of removing regular expressions anytime soon. Given this, pragmatism pushed me in favor of replacing re
with regex
until a suitable champion for the alternative appears.
I sat on it another day and decided to ahead with it. @Oderjunkie helpfully had a PR already up so I rebased that, updated the version, and squashed it. This should be resolved now by c0ffc3eae32b9c31b65f959ae75d493de828e170
I still think it is valuable, long term, to achieve the removal of regexs as Erik describes here: https://github.com/erikrose/parsimonious/pull/149#issuecomment-449662784. This change is a pragmatic compromise to improve Parsimonious today while there are no alternatives to using regular expressions for some common use cases
thanks everyone!
Many thanks!
Hello,
While using parsimonious I came across an already-documented bug in the builtin re module, whereby abugidas with vowel marks (that is, most of South Asia and Oceania) fail to match the
\w
directive (e.g., as inre.match("^[\w\s][\w\s]*", "किशोरी", re.UNICODE)
, as taken from this question here). In my particular case, this bug crashes my system dynamics models when I run them in PySD.It seems that the recommended "solution" is to simply used the
regex
instead ofre
. However, this would mean adding a dependency toparsimonious
, so I thought I would ask whether you would look favourably upon such a suggestion before submitting a pull request. I was thinking of proposing something along the lines of:so that nothing breaks if the dependency is not installed.
Please do let me know. Thanks! -Julien Malard