Open homedirectory opened 8 months ago
I am positively impressed by your work.
I'm happy to hear that.
The limitation comes from the restriction on the class of accepted grammars, which must be LL.
Take a look at the accompanied paper, where we presented a general solution for all deterministic context-free languages or, equivalently, all LR grammars. A few years have passed and there's now much more to be said about our algorithm. Especially, we completely missed the connection between our "tree embedding" and real-time pushdown tree automata a la Guessarian 1984. We chose to implement LL grammars because it's much easier. Yamazaki et al. 2019's embedding also supports LR grammars but implemented for LALR grammars.
I know that CFG to LL translation is possible, so point 1 could be achieved.
That's incorrect. LL grammars can express a strict subset of deterministic CFLs which is in turn a strict subset of CFLs.
Have the generated AST resemble the input grammar as closely as possible.
You get $$ variables because we use a naive namer (NaiveNamer) which doesn't make any smart attempts at guessing intermediate variable names. You could specify them explicitly though:
A := x y | w z
=>
A := B | C
B := x y C := w z
That should work I think.
A better namer (Namer) could possibly infer better names automatically.
@OriRoth, I am familiar with the paper - it is through the paper that I found out about your work. However, I must say that I am yet to grasp the tree encoding you described.
Regarding my statement about the translation of CFG to LL. While it is technically incorrect, I rather meant that it is possible to translate a subset of CFLs that are expressible in an LL grammar, such as the one exemplified (predicates and combinations), not all CFLs of course.
You mention the issue with picking appropriate names for intermediate variables. However, that is not the central issue I was trying to raise, so let me be more clear about it. I consider the existence of AST classes corresponding to intermediate variables itself problematic. In the above example, I could explicitly name intermediate variables in my grammar or, as you suggest, provide a smarter Namer, but that will still lead to an obscure AST being generated. Therefore, the focus should not be on names but on the structure of the AST. That is what I tried to convey by providing an example of the desired AST, which I shall repeat below:
interface Condition {}
record AndCondition(Condition left, Condition right) implements Condition {}
record OrCondition(Condition left, Condition right) implements Condition {}
record CompoundCondition(Condition cond) implements Condition {}
record Predicate(...) implements Condition {}
This AST does not contain any intermediate variables. It directly corresponds to the original CFG (also exemplified in the issue description but not repeated in this comment) which is not LL, but the underlying language can be expressed in LL (as it was shown in the issue description), hence the need for grammar transformation. This poses a challenge for parsing and, ultimately, AST generation. The goal is for the AST to have a structure that closely resembles the original grammar; the LL grammar (a result of grammar transformation) would then remain an implementation detail.
Ok, I think I get it now. Quite interesting. Wouldn't supporting a different class of grammars, such as LR or LALR, alleviate the problem? It's possible in theory. It might also be interesting to see how this issue is approached in classical parsers that support LL grammars, such as ANTLR. If they came up with creative solutions, these may be applicable here as well.
I've spent some time familiarising myself with Fling and I am positively impressed by your work. Naturally, I am starting to hit some limitations as I explore further. So the goal of this issue is to discuss one particular limitation. I'd like to know what you think and whether it is possible to overcome this limitation in Fling.
The limitation comes from the restriction on the class of accepted grammars, which must be LL. Here are some challenges that stem from it:
For example, let's consider a context-free grammar for a language that has predicates and means of combining them:
Notably, this grammar is not LL as it involves left recursion. To provide this grammar to Fling, it has first to be transformed:
Now in Fling:
And the resulting AST:
Clearly, such an AST is difficult to reason about because this is the level of LL grammars.
Ideally, an AST corresponding to the original grammar would have the following form:
And the original grammar would be specified as follows:
Overall, it would be great to be able to:
As I already mentioned, I know that CFG to LL translation is possible, so point 1 could be achieved. The requirement outlined in point 2, however, seems vastly more difficult to achieve. It would probably also require changes to the parsing algorithm. A mapping between the original and transformed grammars would need to be maintained and used to construct the desirable AST. I can only guess at possible approaches, so I strongly wish to hear what the authors of Fling think.