Engelberg / instaparse

Eclipse Public License 1.0
2.74k stars 149 forks source link

Is there proper way to do exclusion in a CFG? #213

Closed GlassGhost closed 2 years ago

GlassGhost commented 2 years ago

This isn't really an issue so much as a cry for help, but perhaps something could be said in the readme on exclusion if there is an answer to this question

A short introduction to the Lambda Calculus by Achim Jung listed a less ambiguous grammar for the lambda calculus

Without a strict grammar the term \x.x y could be parsed as With auxiliary brackets, the 2 possible interpretations can be indicated by writing \x.(x y) and (\x.x) y, respectively. According to convention, only the 1st interpretation should be possible.

;lambda.abnf_____________________________________________Lambda Calculus RFC2234
;(<Variable> / "(" <fun> ")") replacing <head−atom>
;and <Variable> replacing x from Achim Jung's grammar
    <term> = <fun> / <app> / <atom>
     <fun> = <LAMBDA>, <Variable>, ".", <term>
     <app> = (<Variable> / "(" <fun> ")"), <atom> / <app>, <atom>
    <atom> = (<Variable> / "(" <fun> ")") / "(" <app> ")"
<Variable> = *((<AllUTF8Chars_Except_EscapableChars>) - <EscChars>)
<EscChars> = "." / "(" / ")" / <LAMBDA> / <WSP>
  <LAMBDA> = %5C     ; \ for lambda should be %x03BB but this is a tutorial
     <WSP> = *( %x20 / %x0A / %x09) ;space, linefeed, or HTAB

I wrote grammar to parse the lambda cal in abnf, except for one thing:

<Variable> = *((<AllUTF8Chars_Except_EscapableChars>) - <EscChars>)

This line uses a exclusion operator which to my knowledge makes the parser no longer a CFG

GlassGhost commented 2 years ago

I found this gem that might be the answer I'm looking for-->Exclusion in a context-free language?

Here is the quote, but I'm having trouble understanding it:

I am learning automata theory, and I am confused about this exercise:

Give context free grammar to create the following language where the input alphabet is ${a,b}$

$L = {w \text{ where }w\text{ is NOT of the form }b^n a^n}$

So, I understand how to create ${b^n a^n, n\in\mathbb{N}}$:

$$S \to bSa \mid \epsilon$$

But I am lost as to how to generate a language that is ${w \text{ where }w\text{ is NOT of the form }b^n a^n}$. Help would be appreciated.

A word in $L$ must either be of the form $b^i a^j$ with $i \neq j$ or it must have $ab$ as a substring. Therefore you can write a context-free grammar $G$ for $L$ as the union of two context-free grammars $G_1$ and $G_2$ for $L_1 = {b^i a^j \mid i \neq j}$ and $L_2 = {w \in {a,b}^* \mid ab \mbox{ is a substring of } w}$.

It is easy to come up with a context-free grammar $G_2$. In fact, it is even easy to come up with a regular expression for $L_2$: $(a \mid b)^ab(a \mid b)^$. Therefore I will only focus building $G_1$.

A word in $w=b^ia^j \in L_1$ can be written as $w = b^k \alpha a^k$, where $k = \min {i,j }$ and $\alpha$ is a non-empty string containing only $a$s or only $b$s. Conversely, all words of the form $b^k \alpha a^k$ belong to $L_1$. Then, this is a valid grammar $G_1$:

$$ \begin{align} S &\to bSa \mid A \mid B \ A & \to aA \mid a \ B & \to bB \mid b \end{align} $$

Engelberg commented 2 years ago

I'm not familiar with the - operator you're showing here, but it sort of looks like this is happening at the token level, trying to define all the characters that are not escape characters, which you can probably do with a regular expression. Negative lookahead might also be relevant, but should be used sparingly.

GlassGhost commented 2 years ago

Ok, I looked it up and according to https://www.cs.unc.edu/~plaisted/comp455/slides/cfl3.5.pdf

You cannot have exclusion in a cfg, and if you do it is no longer a cfg.

Which can make things a lot slower and more complicated or just plain old not work at all, which is why I think it wasn't included in abnf/ebnf

GlassGhost commented 2 years ago

This is correct right?

https://github.com/Engelberg/instaparse/issues/116

is negative lookahead a implementation of exclusion or a not operator?? is it still context free?

Engelberg commented 2 years ago

I am unfamiliar with the term "exclusion operator", so I can't answer that question directly, but yes, the whole point of negative lookahead is to scan ahead and only permit strings that do NOT satisfy some grammar rule starting from that point. This allows you to implement certain things that are not context free.