b-studios opened 9 years ago
There are two aspects of redirecting input.

As a result, the grammar formalism probably needs to be higher order: it should be possible to parametrize rules, and rules should return (partially applied) rules. But then, what is the big difference from parser combinators?
First ad-hoc proposal: building on some variant of OMeta, which itself builds on PEG, we could introduce the primitive `del(p)` that feeds the next character of the input stream to `p` and returns a new parser which reflects `p` in the new state. OMeta allows parametrizing rules (or nonterminals) over other rules. As mentioned above, it is now also necessary that rules can return rules. Using the existing name-binding mechanism, `interleave` could then be implemented as:

```
interleave p q := del(p):p' del(q):q' apply(interleave, p', q')
```
This rule will alternate between applying `p` and `q` token-wise until either one no longer matches or the input is exhausted. As written, it can never return a successful result. To query a parser for its result, we could think of a primitive `done(p)` which fails if `p` has no successful result. With `done` we can implement `interleave` as:
```
interleave p q := del(p):p' del(q):q' apply(interleave, p', q')
                | done(p):pRes done(q):qRes -> ... combine pRes and qRes ...
```
or, with `p` and `q` being mutable variables, as:

```
interleave p q := (del(p):p del(q):q)* done(p):pRes done(q):qRes -> ... combine pRes and qRes ...
```
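To make the intended semantics concrete, here is a minimal Python sketch (all names are hypothetical, not part of OMeta): a parser state matches a literal string one character at a time, `feed` plays the role of `del(p)`, `done` queries the result, and `interleave` is the loop above.

```python
# Hypothetical model: a parser is a pair (tag, remaining),
# matching a fixed literal string character by character.

def lit(s):
    return ("ok", s)

def feed(p, c):
    """del(p): hand the next input character to p, yielding a new parser state."""
    tag, rest = p
    if tag == "ok" and rest and rest[0] == c:
        return ("ok", rest[1:])
    return ("fail", "")

def done(p):
    """done(p): the successful result, or None if p has no successful result."""
    tag, rest = p
    return "matched" if tag == "ok" and rest == "" else None

def interleave(p, q, stream):
    """Alternate feeding characters to p and q, then combine both results."""
    for i, c in enumerate(stream):
        if i % 2 == 0:
            p = feed(p, c)
        else:
            q = feed(q, c)
    pRes, qRes = done(p), done(q)
    if pRes is None or qRes is None:
        return None
    return (pRes, qRes)

assert interleave(lit("ac"), lit("bd"), "abcd") == ("matched", "matched")
assert interleave(lit("ac"), lit("xx"), "abcd") is None
```

The loop mutating `p` and `q` corresponds directly to the variant with mutable variables.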
Given negative lookahead `!`, `delegateUntil` can be implemented as:

```
delegateUntil c p := (!c del(p):p)*
```
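In the same hypothetical model, `delegateUntil` stops delegating as soon as the terminator appears, mirroring the negative lookahead `!c` guarding each step (`lit`/`feed` are the same illustrative stand-ins as before):

```python
def lit(s):
    return ("ok", s)

def feed(p, ch):
    """del(p): hand one character to p, yielding a new parser state."""
    tag, rest = p
    if tag == "ok" and rest and rest[0] == ch:
        return ("ok", rest[1:])
    return ("fail", "")

def delegate_until(c, p, stream):
    """(!c del(p))*: delegate characters to p until the terminator c is seen."""
    consumed = 0
    for ch in stream:
        if ch == c:          # this is where the negative lookahead !c fails
            break
        p = feed(p, ch)      # del(p): hand the character to the nested parser
        consumed += 1
    return p, stream[consumed:]

p, rest = delegate_until("]", lit("xy"), "xy]z")
assert p == ("ok", "") and rest == "]z"
```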
Would `done` correspond to my `kill` combinator?
"... over other rules" => should that be "over other symbols"?
Another thing we might want to support is to pass a transformed token stream to a nested parser. For example, consider how to support Java-style unicode escapes.
> Would `done` correspond to my `kill` combinator?
No, `done` would return the result once, but could not take any further input.
> "... over other rules" => should that be "over other symbols"?
I am not sure what you mean by symbols. In OMeta parlance, rules are nonterminal productions, and higher-order rules are supported.
> Another thing we might want to support is to transform a token stream. For example, consider how to support Java-style unicode escapes.
Yes, I also thought about that one. Probably adding a `feed` primitive could be enough for that.
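For instance, a `feed`-style stream transformation could decode Java-style unicode escapes before the characters ever reach the inner parser. A simplified Python sketch (the function name is illustrative, and real Java also allows repeated `u`s, which this ignores):

```python
import re

def decode_unicode_escapes(chars):
    """Rewrite \\uXXXX escape sequences to the characters they denote."""
    text = "".join(chars)
    return re.sub(r"\\u([0-9a-fA-F]{4})",
                  lambda m: chr(int(m.group(1), 16)), text)

# The inner parser would now see 'abc' instead of the escaped form:
assert decode_unicode_escapes(r"a\u0062c") == "abc"
```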
Not only could we handle unicode escapes this way, but in general we could implement lexers:

```
lexer : Parser Token f a => Parser Char f a
```
A lexer then is a parser that parses a character stream and feeds another parser that parses a stream of tokens.
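A toy version of this layering in Python (all names are made up for illustration): a character-level tokenizer feeds a trivial token-level "parser".

```python
def tokenize(chars):
    """Character-level lexer: yields whitespace-separated word tokens."""
    tok = ""
    for c in chars:
        if c.isspace():
            if tok:
                yield tok
            tok = ""
        else:
            tok += c
    if tok:
        yield tok

def count_tokens(tokens):
    """Token-level 'parser': just counts the tokens it is fed."""
    return sum(1 for _ in tokens)

# The character stream is transformed into a token stream and passed on:
assert count_tokens(tokenize("let x = 1")) == 4
```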
In order to enable a discussion less bound to particular implementation details, it would be good to define a formal language (grammar formalism) and give precise semantics for the language elements involved. Right now it seems that `delegate`/`delegateOnce` is at the core of our framework, since it allows redirecting input to other parsers (which was our original goal). This issue is about collecting ideas and thoughts on a possible language and its semantics.