Closed fekad closed 4 years ago
As I recall from my work on the grammar, Grammatica needs it to be LL-parsable. That means that when there are two possibilities (e.g. A | B), the next upcoming token must determine which branch the parser should take. That makes the production rules look a bit roundabout awkward compared to many example grammars seen online. On the other hand, it means we keep the grammar in one of the easiest classes to parse.
Have you checked that these changes remain LL-parsable? I.e., have you run them through Grammatica?
@rartino Thank for the comment! I'm not a grammar expert but seems to work (https://github.com/fekad/OPTiMaDe/tree/opt_comp_expr) and (https://travis-ci.com/fekad/OPTiMaDe/builds/135855010). Of course, I had to update most of the tests but they look reasonable.
@rartino @sauliusg Is there any chance that you had some time to look at this because in that case, I can create a new PR for this.
Since we are anyway doing changes to the grammar, I'm fine with the change to use repetition rather than recursion if it works in Grammatica. So, feel free to write it up as a PR.
But, I don't like the 'ExpressionOr', 'ExpressionAnd', 'ExpressonNot' naming because:
Hence, I prefer keeping the Expression, ExpressionClause, ExpressionPhrase names.
To achieve a better performance, repetition instead of recursion can be used for the rule of the comparison expressions in the grammar:
Expression = ExpressionClause, [ OR, Expression ] ; ExpressionClause = ExpressionPhrase, [ AND, ExpressionClause ] ;
VS
Expression = ExpressionClause, { OR, ExpressionClause } ; ExpressionClause = ExpressionPhrase, { AND, ExpressionPhrase } ;
In this case, the parser may group the same comparisons together, so a node can have multiple children in the parsed tree, instead of having a deep tree.
I would say no, let's not change grammar "to achieve a better performance". The provided grammar is not an implementation, it is a specification. This means that it should be readable by someone who has enough mathematical background, expressed in a standard, implementation neutral machine readable form (EBNF), and usable for automated checks/processing.
The current specification, IMHO, fulfils all these requirements, and I am very much against changing it just because some implementation might be more efficient if written one way or another.
This is because different parser generators have different optimality requirements. For example, yacc/bison parser generators produce more efficient parsers when rules are left-recursive like in list: element | list element
, but recursive decent parsers might be better off first processing the element
, and then recursing to the list
(which might allow a tail-recursion optimisation by a compiler). Still, both would produce the same language element element*
. Thus, it is wrong to worry about "efficiency" in the specification, it will not be efficient for all implementations.
I also agree with @rartino that the current terminology is more widely used and more accurate that the suggested new one; there are usually more than one operation on each level of precedence, so it would be counterproductive to name rules after just the currently used operators (otherwise names get misleading if you introduce, say 'XOR' some time later).
This is not to say that your implementation must use the same names or the same grammar. You can use names that are cleared for you in your code, and you can (and probably should) transform the filter grammar into an equivalent one which is more efficient for the parser generator of your choice.
Thanks @rartino @sauliusg for the comments. I agree that the actual implementation may need a slightly different structure of grammar, so I just close this issue without any modification of the specification.
To achieve a better performance, repetition instead of recursion can be used for the rule of the comparison expressions in the grammar:
VS
In this case, the parser may group the same comparisons together, so a node can have multiple children in the parsed tree, instead of having a deep tree.
These modifications could break the existing implementations, so it's worth to discuss here before creating any PR. Of course, alternatively, the parsed query could be transformed/compiled to into the efficient query without modifying the grammar but it needs more coding.
PS: In this case, "ExpressionOr", "ExpressionAnd" and "ExpressionNot" may be a more descriptive name for the rule respectively.