Toxaris / coalgebraic-parsing

0 stars 0 forks source link

Designing a language that supports `delegate` #7

Open b-studios opened 9 years ago

b-studios commented 9 years ago

In order to enable discussion less bound to certain implementation details it would be good to define a formal language (grammar formalism) and give precise semantics for the language elements involved.

Right now it seems, that delegate / delegateOnce is at the core of our framework since it allows for redirecting input to other parsers (which was our original goal).

This issue is about collecting ideas and thoughts on a possible language and its semantics.

b-studios commented 9 years ago

There are two aspects of redirecting input:

  1. Referencing the parser to redirect to
  2. Binding the resulting (partial) parser

As a result the resulting grammar formalism probably needs to be higher order. It should be possible to parametrize rules, rules should return (partially applied) rules. Now: where is the big difference to parser combinators then?

b-studios commented 9 years ago

First ad-hoc proposal: Building on some variant of OMeta which itself builds on PEG, we could introduce the primitive del(p) that feeds the next character of the input stream to p and returns a new parser which reflects p in the new state. OMeta allows parametrizing rules (or nonterminals) over other rules. As mentioned above, it is now also necessary that rules can return rules. Using the existing name binding mechanism interleave could then be implemented as:

interleave p q := del(p):p'  del(q):q' apply(interleave, p', q')

This rule will alternate in applying p and q token wise until either one does not match any more or the input is exhausted. There is no way a successful result is returned. To query a parser for its result we could think of the primitive done(p) which fails if p has no successful result. With done we can implement interleave as:

interleave p q := del(p):p' del(q):q' apply(interleave, p', q')
                | done(p):pRes done(q):qRes -> ... combine pRes and qRes ...

or with p and q being mutable variables as

interleave p q := (del(p):p del(q):q)* done(p):pRes done(q):qRes -> ... combine pRes and qRes ...
b-studios commented 9 years ago

Given negative lookahead ! delegateUntil can be implemented as:

delegateUntil c p := (!c del(p):p)*
Toxaris commented 9 years ago

Would done correspond to my kill combinator?

Toxaris commented 9 years ago

"... over other rules" => should that be "over other symbols"?

Toxaris commented 9 years ago

Another thing we might want to support is to pass a transformed token stream to a nested parser. For example, consider how to support Java-style unicode escapes.

b-studios commented 9 years ago

Would done correspond to my kill combinator?

No, done would return the result once, but could not take any further input.

"... over other rules" => should that be "over other symbols"?

I am not sure what you mean with symbols. In OMeta speech, rules are nonterminal productions. And higher order rules are supported.

Another thing we might want to support is to transform a token stream. For example, consider how to support Java-style unicode escapes.

Yes, I also thought about that one. Probably adding a feed primitive could be enough for that.

b-studios commented 9 years ago

Not only could we handle unicode escapes, but in general implement lexers this way:

lexer :  Parser Token f a => Parser Char f a

A lexer then is a parser that parses a character stream and feeds another parser that parses a stream of tokens.