erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

Using standard PEG syntax? [ was: make a new release? ] #191

Open mw66 opened 2 years ago

mw66 commented 2 years ago

The last release is:

https://pypi.org/project/parsimonious/ parsimonious 0.8.1 Released: Jun 20, 2018

That was ~4 years ago. Can we make a new release?

Thanks.

erikrose commented 2 years ago

Well timed! We have a new one coming very shortly!

mw66 commented 2 years ago

BTW, is there a standard PEG syntax? For example, parsimonious uses = to define rules: lhs = rhs.

But elsewhere, I see most people using <-, e.g.

https://en.wikipedia.org/wiki/Parsing_expression_grammar
https://nim-lang.org/docs/pegs.html
https://github.com/PhilippeSigaud/Pegged/blob/master/examples/PEG/src/pegged/examples/PEG.d

etc.

I think the benefit of using a standard syntax is that users can compare different libraries using the same grammar file, without having to change the syntax for each library.

Just wondering if we can add <- as an alternative to = for defining rules (so it's not a breaking change)?
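As a rough illustration of the idea (a sketch only, not how Parsimonious would implement it), the operator could be rewritten in a preprocessing step before the grammar text is handed to the library. The helper name `arrows_to_equals` is hypothetical, and the sketch assumes each rule starts on its own line. It only swaps the operator; it does not translate any other syntax differences between PEG dialects, and it does nothing about precedence.

```python
import re

# Hypothetical helper, not part of Parsimonious: rewrites "name <- expr"
# rule definitions to "name = expr". It only touches an arrow that directly
# follows a rule name at the start of a line, so an arrow inside a quoted
# literal later on the line is left alone.
_RULE_ARROW = re.compile(r"^(\s*[A-Za-z_]\w*\s*)<-", re.MULTILINE)


def arrows_to_equals(grammar_text: str) -> str:
    """Return grammar_text with rule-defining '<-' replaced by '='."""
    return _RULE_ARROW.sub(r"\1=", grammar_text)


example = '''
greeting <- "hi" / "hello"
arrow    <- "<-"
'''
print(arrows_to_equals(example))
```

Rules that already use = pass through unchanged, so the same function could be applied to old and new grammar files alike.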

lonnen commented 2 years ago

Historic note for anyone looking at this in the future:

The syntax for Parsimonious uses = to define rules for two reasons:

  1. When Erik started this library, he was implementing from the original paper. Other implementations had no consensus, and it was unclear that <- was going to catch on as the rule-definition operator.
  2. = was chosen because it felt more ergonomic to people comfortable programming in Python.

(more about these here)

Now that some time has passed and a consensus is forming around <-, the only thing keeping it from being added as an alternate syntax is the time and effort to write the patch.

lonnen commented 2 years ago

Oh, one small reason to put it off: Parsimonious reverses the precedence of AND/OR compared to other PEG libs. It's a silly barrier, but this fix would give the illusion of compatibility that isn't there.

see also:

lucaswiman commented 2 years ago

but this fix would give the illusion of compatibility that isn't there.

Is there some reason it would have to be incompatible? It’s just a different grammar, so there could be one grammar parser for parsimonious classic, and one for exact compatibility with other peg parsers.

lonnen commented 2 years ago

I was unclear. Adding <- as an alternative syntax is absolutely feasible while preserving backwards compat within Parsimonious, but if this is done without also fixing AND/OR precedence, it will invite people to drop in grammar files that work with other parsers but that will behave unexpectedly with Parsimonious.
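To make the precedence concern concrete, here is a toy sketch. It is not Parsimonious code and makes no claim about how Parsimonious actually groups things; the matcher helpers (`lit`, `seq`, `choice`, `full_match`) are made up for illustration. In the original PEG formulation, sequence binds tighter than ordered choice, so `a b / c` is read as `(a b) / c`. The sketch compares that reading with the opposite grouping, `a (b / c)`.

```python
# Toy matchers over single characters; illustrative only, unrelated to
# Parsimonious internals. Each matcher takes (text, pos) and returns the
# new position on success or None on failure.

def lit(ch):
    def match(text, pos):
        return pos + 1 if text.startswith(ch, pos) else None
    return match


def seq(*parts):
    def match(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return match


def choice(*alts):
    def match(text, pos):
        for alt in alts:  # ordered choice: the first alternative that succeeds wins
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return match


def full_match(matcher, text):
    return matcher(text, 0) == len(text)


a, b, c = lit("a"), lit("b"), lit("c")

# Reading 1: sequence binds tighter, so "a b / c" means (a b) / c.
seq_tighter = choice(seq(a, b), c)
# Reading 2: choice binds tighter, so "a b / c" means a (b / c).
choice_tighter = seq(a, choice(b, c))

for text in ("ab", "c", "ac"):
    print(text, full_match(seq_tighter, text), full_match(choice_tighter, text))
```

Both readings accept "ab", but only the first accepts "c" and only the second accepts "ac", so the same grammar text can silently accept different inputs depending on which grouping a library applies.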

mw66 commented 2 years ago

Then how about it provides both:

= : keep the old AND/OR precedence rule.

<- : define the new AND/OR precedence rule.

lonnen commented 2 years ago

The current AND/OR precedence is a bug. If we can resolve that, it should be the only behavior. All of this is to say we should fix it before or alongside adding syntax compatibility with other PEG libraries.

It's also been a tricky bug to fix, even when Erik was actively developing this lib. I don't think it is prudent to maintain both behaviors for the sake of backwards compat with pre-1.0 versions of Parsimonious.

That said, there's still no clear owner for the issue. If someone is interested, though, it would be a high-utility improvement for Parsimonious!

mw66 commented 2 years ago

I don't think it is prudent to maintain both behaviors for the sake of backwards compat with pre-1.0 versions of Parsimonious.

I agree; maybe we need a breaking-change version.

BTW, I found a working Python PEG parser here:

https://github.com/we-like-parsers/pegen/blob/main/data/python.gram

It uses ":".

lucaswiman commented 2 years ago

It's also been a tricky bug to fix, even when Erik was actively developing this lib. I don't think it is prudent to maintain both behaviors for the sake of backwards compat with pre-1.0 versions of Parsimonious.

Totally disagree with this. Unless the upgrade path is extremely easy and foolproof (like a function that converts an old grammar string to an equivalent new one), this would be breaking backwards compatibility for pretty questionable reasons: complying with some other parsers used by other people who aren't already using the library.

As a user of (and contributor to) parsimonious, with many functional grammar files, what other libraries are doing isn't very relevant unless it gives some genuine functionality improvements. "pre-1.0" is a weak argument, since the library is used in a lot of production systems.

The docs are pretty explicit. The README says:

I don't plan on making any backward-incompatible changes to the rule syntax in the future, so you can write grammars with confidence.

The comments on the grammar definition in the code say the same: https://github.com/erikrose/parsimonious/blob/b6a6f5402fc370ffaa94dee2fac81ae4e0ab32e6/parsimonious/grammar.py#L216-L219

It does seem like supporting both syntaxes shouldn't be that bad. Ideally the changes would just be to the grammar, though it looks like https://github.com/erikrose/parsimonious/compare/master...lower-precedence-ors required some changes to the visitor as well. Maybe a transducer from the old syntax to the new syntax would be possible, or at least an interesting exercise.

That said, there's still no clear owner for the issue. If someone is interested, though, it would be a high-utility improvement for Parsimonious!

Now that Python 2 support has been dropped 🙌, I'm personally more interested in working on allowing parsing of bytes objects. However, I'd be very interested in helping to review / test changes to syntax or precedence. I'm very glad that this library is getting more development velocity now that you have commit access!

lonnen commented 2 years ago

@lucaswiman I appreciate the thoughtful comment! Would you mind putting it on #199? I couldn't find an original issue for the bug, so I've made a new one, and I'd like to keep the discussion in that issue.