Engelberg / instaparse

Eclipse Public License 1.0

Ordered choice operator, grammar fragments and namespacing #139

Open arrdem opened 8 years ago

arrdem commented 8 years ago

Hey,

I don't have a solution to propose, I just thought I'd mention a point of friction I came across recently in using instaparse to manipulate some internal DSLs.

  1. Because / is currently the ordered choice operator, it isn't possible to take advantage of keyword namespacing when naming grammar terms.
  2. Aside from the :auto-whitespace directive, there doesn't seem to be a good way to split a grammar across several files.

My specific use case was a configuration DSL containing expressions written in two other languages. It would have been awesome if there had been a clear way to break the other languages out into their own grammars and use keyword namespacing to make sure there were no name collisions. With the advent of clojure.spec, I suspect this is likely to become a trend in data design.

Cheers!

Engelberg commented 8 years ago

If a grammar is broken up into multiple files, one strategy would be to process each file with combinators/ebnf, which will build a "grammar map" from each file. These maps can then be merged, and then you can build a parser with core/parser from the grammar map (use :start to identify the starting rule).
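A minimal sketch of that strategy follows. The rule names and grammar strings are made up for illustration, and the inline strings stand in for the contents of separate files, which would normally be read with slurp.

```clojure
(require '[instaparse.core :as insta]
         '[instaparse.combinators :as c])

;; Hypothetical grammar fragments; in practice each string would be
;; the contents of its own file, e.g. (slurp "config.bnf").
(def config-src "config = expr (';' expr)*")
(def expr-src   "expr = #'[a-z]+'")

;; Given a series of rules, c/ebnf returns a grammar map of
;; rule keyword -> combinator. merge combines the per-file maps.
(def grammar-map
  (merge (c/ebnf config-src)
         (c/ebnf expr-src)))

;; A merged map has no inherent first rule, so :start names it.
(def config-parser
  (insta/parser grammar-map :start :config))

(config-parser "foo;bar")
```

Note that merge silently lets a later map win when two fragments define the same rule name, which is exactly the collision problem that namespacing would address.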

I've never thought about namespacing the grammar rule names before, and as you point out, that might be difficult to do while remaining backwards compatible with the current use of /.

One possibility would be to make this an enhancement of the combinators/ebnf function, and allow it to take an optional namespace string. As it builds the grammar map, it could replace all the keywords with namespace-qualified keywords.

That would prevent collisions, but it wouldn't let you easily reference a rule in one file from a rule in another file.
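To make the idea concrete, here is a rough sketch of such a qualification pass written as a standalone function over a grammar map. qualify-grammar is a hypothetical helper, not part of instaparse. It assumes nonterminal references are maps of the form {:tag :nt :keyword ...}, which is what combinators/nt builds, and it uses a hand-written map in place of real combinators/ebnf output. Whether the rest of instaparse would handle qualified rule names correctly is not verified here.

```clojure
(require '[clojure.walk :as walk])

(defn qualify-grammar
  "Hypothetical helper: returns grammar-map with every rule name, and
  every nonterminal reference, qualified with the namespace ns-str."
  [grammar-map ns-str]
  (let [qualify (fn [k] (keyword ns-str (name k)))]
    (into {}
          (map (fn [[rule-name combinator]]
                 [(qualify rule-name)
                  (walk/postwalk
                   (fn [node]
                     ;; Assumption: nonterminal references look like
                     ;; {:tag :nt :keyword :rule}, as built by c/nt.
                     (if (and (map? node) (= :nt (:tag node)))
                       (update node :keyword qualify)
                       node))
                   combinator)]))
          grammar-map)))

;; Hand-written stand-in for the output of combinators/ebnf.
(def expr-grammar
  {:expr {:tag :cat
          :parsers [{:tag :nt :keyword :term}
                    {:tag :string :string "+"}
                    {:tag :nt :keyword :term}]}
   :term {:tag :string :string "1"}})

(qualify-grammar expr-grammar "lang-a")
```

Because every nonterminal reference is rewritten into the same namespace, the sketch also shows the limitation: a rule in this fragment has no way to point at a rule that lives in a different namespace.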

arrdem commented 8 years ago

Re: ordered choice, would it be possible to parameterize the bnf parser on the ordered choice (or other) operators? I'm thinking something like :ordered-choice #"||"

Re: files, okay, cool. It shouldn't be too hard to put together a wrapper around these that loads several files at once, so long as all nonterminals resolve once the last file is processed. It doesn't matter for my use case since this was throwaway code, but it could be nice for someone else in the future.

Engelberg commented 8 years ago

It's possible to parameterize the ordered choice symbol, but also a little risky, in the sense that the existing grammar for instaparse grammars is well-tested, and if someone inadvertently replaces the ordered choice symbol with something that interacts oddly with the other regular expressions that define the various tokens, weird behavior could result that might be hard to pinpoint.

One possibility to support namespaced rule names would be to special-case things of the form ns/name where there is no whitespace before or after the / and ns is not a valid rule name. I can think of several reasons this strategy is not ideal, but it's the best idea I have so far.

I agree that tying the existing behavior into something that just loads multiple files is not a hard problem. The only non-trivial part would be giving meaningful error messages that pinpoint the correct file when a nonterminal doesn't resolve once things are merged together.

aengelberg commented 8 years ago

One idea: in EBNF syntax, allow rules to be namespaced with a dot separator, which get converted to namespaced keywords for hiccup.

myns.S = 'a'

Parse tree:

[:myns/S "a"]

And potentially allow my.ns.S which gets converted to :my.ns/S.
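The conversion itself is simple if the last dot is taken as the separator between namespace and name. A small sketch, where dotted->keyword is a hypothetical helper and not an existing instaparse function:

```clojure
(require '[clojure.string :as str])

(defn dotted->keyword
  "Hypothetical helper: splits a dotted rule name at its last dot and
  returns a namespaced keyword, or a plain keyword if there is no dot."
  [rule-name]
  (let [idx (str/last-index-of rule-name ".")]
    (if idx
      (keyword (subs rule-name 0 idx) (subs rule-name (inc idx)))
      (keyword rule-name))))

(dotted->keyword "myns.S")   ;=> :myns/S
(dotted->keyword "my.ns.S")  ;=> :my.ns/S
(dotted->keyword "S")        ;=> :S
```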

arrdem commented 8 years ago

I agree that making [^\s]/[^\s] mean something other than \s/\s is going to cause problems. I did a little reading and it looks like / is indeed the traditional ordered choice operator, so I'm not sure what to suggest that would be better. While it gives the finest grained control, letting a user change the ordered choice operator is definitely going to bite someone eventually.

Honestly, rather than trying to do something weird with symbol parsing, I think it'd make more sense to just accept an optional namespace parameter when loading a resource. Keywords from that resource get namespaced, and the parameter defaults to nil, which is the namespace of any unqualified keyword anyway.