Engelberg / instaparse

Eclipse Public License 1.0
2.74k stars 149 forks source link

Bounded repetitions for EBNF / Ability to read own representation #187

Closed razum2um closed 5 years ago

razum2um commented 5 years ago

How can ABNF's repetitions 1*2 A can be expressed in EBNF? Except A | A A & given regexp is not an option?

E.g. parsed ABNF is printed very natural:

(instaparse/parser "S = 1*2 A
A = '1'" :input-format :abnf)

S = A{1,2}
A = "1"

but it cannot read this representation

(instaparse/parser "S = A{1,2}; A = '1'")

Evaluation error (RuntimeException) at sun.reflect.GeneratedConstructorAccessor13.newInstance (:-1).
1 occurs on the right-hand side of your grammar, but not on the left

p.s. 1 & 2 reps are ok to expand and a simple string is regexp-able, but the example is simplified ofc

Engelberg commented 5 years ago

You are correct that the printed representation of a bounded repetition expressed in an ABNF grammar cannot be read back in.

This is a consequence of a few compromises/decisions that were made along the way. The reasoning went like this:

  1. ABNF standard has bounded repetition, but EBNF has no such notation.
  2. The usual notations already have another meaning in EBNF, such as curly braces {} and *.
  3. So decision was made to not support bounded repetition in EBNF, at least for now.
  4. But once a grammar is processed and turned into a parser, we don't really track whether it came from an ABNF or EBNF grammar.
  5. So printing has to be grammar agnostic -- we chose EBNF as the printing standard because it's more natural.
  6. So, we picked some reasonable notation for bounded repetition, knowing that we can't actually support reading that notation without it potentially being a breaking change (although the odds of it breaking someone's grammar are admittedly low).
  7. Therefore, bounded repetition printing can't be read back in.

My plan is at the next major version upgrade (where low-risk breaking changes would be acceptable), to actually allow notation like A{1,2} to work in EBNF, and then the notation will be readable. [No promises on when I'll have the opportunity to work on the next major upgrade].

For now, if you want bounded repetition in a grammar that is otherwise EBNF, I recommend making most of your grammar in EBNF, and using the ebnf combinator to turn it into a grammar map. Then run your bounded repetition rule through the abnf combinator to turn that into a grammar map, and merge the maps. The merged map can be processed with parser.

It's not as clean a solution as being able to express bounded repetition directly in EBNF, but it's the best option currently available. Hope that helps explain the reasoning, and helps you accomplish what you need.

razum2um commented 5 years ago

Yeah, I definitely see the reason for printing choice, thanks again for merge reminder 👍