erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License
1.79k stars 126 forks source link

requires regex package #231

Closed p0las closed 1 year ago

p0las commented 1 year ago

parsimonious states to be: "Parsimonious aims to be the fastest arbitrary-lookahead parser written in pure Python" since it requires regex package that needs to be compiled then it is no longer pure python.

pavel-kirienko commented 1 year ago

Since the new regex package is mostly API-compatible with the standard re module, would it be an acceptable solution to make Parsimonious default to the built-in module if the new one is not available? E.g.:

https://github.com/erikrose/parsimonious/blob/d5636a6ae4d7fe2ddb96f567e289ab3eeb454b49/parsimonious/expressions.py#L11

- import regex as re
+ try:
+     import regex as re
+ except ImportError:
+     import re
lucaswiman commented 1 year ago

Environments which don't have wheels for regex are pretty uncommon. Is the motivation just to maintain "pure Python" for its own sake or is there an underlying use case?

Given that the core team has said they aren't interested in even fixing bugs in the re module, I'd tend more towards updating the docs than removing the dependency. https://lwn.net/Articles/885682/

lucaswiman commented 1 year ago

If there is a way to tell pip "install regex if it is supported on this platform, otherwise don't", the conditional import would make more sense. Is there a way to specify that in the package file?

p0las commented 1 year ago

we use more than a dozen versions of python in production. Anything that needs to be compiled (installed per version) makes maintaining the system more complex and we try to avoid it as much as possible. if we were looking to incorporate parsimonious into our pipeline right now we would have rejected it because of its dependency. Since we already have it, we opted to use an older version, for now, that has no dependency on regex. I like Pavel's suggestion to make it optional if possible. My understanding is that regex is faster than standard re module but that is the only/main difference.

p0las commented 1 year ago

If there is a way to tell pip "install regex if it is supported on this platform, otherwise don't", the conditional import would make more sense. Is there a way to specify that in the package file?

it is not that it is not supported on our platform but rather it is undesired as explained above.

lucaswiman commented 1 year ago

With wheels, it's pretty rare to need to compile anything, and regex's maintainers are diligent about putting out wheels for many platforms. This sounds like an outdated concern unless you're using jython, ironpython, micropython or a version of cpython that parsimonious no longer supports. I'd be amenable to a PR which excludes the dependency on non-cPython platforms and adds a conditional import to use the standard library.

My understanding is that regex is faster than standard re module but that is the only/main difference.

That's not correct. It has new features and bug fixes as well. And as I said, it is actually maintained, which the standard library version is not. I think all libraries which make nontrivial use of regular expressions should switch to using regex, as recommended by core maintainers in the article I linked to above.

pavel-kirienko commented 1 year ago

@lucaswiman would you be open to accepting a pull request that implements conditional import without altering the dependency specifications? This is beneficial for projects that vendor this library directly in its source form. The presence of a non-pure-Python dependency is somewhat at odds with the statement that Parsimonious itself is a pure-Python solution.

lucaswiman commented 1 year ago

would you be open to accepting a pull request that implements conditional import without altering the dependency specifications?

Sure.

The presence of a non-pure-Python dependency is somewhat at odds with the statement that Parsimonious itself is a pure-Python solution.

For my edification, could you explain why that is still important? My inclination would be to remove the claim to being a pure-Python library, since in my experience it has not mattered for years. When parsimonious was created, including C dependencies was a big pain, but wheels have made it mostly painless.

p0las commented 1 year ago

thank you @pavel-kirienko for fixing it. appreciate it mate.

@lucaswiman

For my edification, could you explain why that is still important? My inclination would be to remove the claim to being a pure-Python library, since in my experience it has not mattered for years. When parsimonious was created, including C dependencies was a big pain, but wheels have made it mostly painless.

we use a small pile of applications (maya, houdini, nuke, and so on, the list gets long) that, in most cases, use different versions of python. on top of that we have our facility python version so we can easily be using a dozen different pythons at one time. with the pure python packages we can deploy them once and share. the moment we need something compiled we have to deploy it for every python version we use and redeploy it each time one of the applications gets updated. This means we are very reluctant to use compiled python packages, even if it is easy to install them with pip. I hope this maked it clearer why it was important to us to keep parsimonious as pure python. cheers.