Closed kaby76 closed 1 year ago
@kaby76 could you please use <details>
tag for long text? It's not very convenient to scroll the page for observing thread comments.
The latest version of the "find-useless" Bash/Trash script for finding useless parentheses is now in a repo here.
Hi! I have the following thoughts:
BTW, interesting, the text within <details>
tag is not formatted and that why it's unreadable.
UPDATE: got it. @kaby76 please use double line break after opening <details>
tag (I fixed your messages):
<details>
\```
// long code
\```
- Useless parentheses is minor problem. It doesn't look they affect performance (at least they should not). I'd fix them only with other problems but I don't insist and I accept such pull requests.
@KvanTTT It is a minor issue, but it adds to the readability of the grammar. I doubt--as well--that it would affect the performance in any way. But, I haven't compared the generated parser/lexer code to see if there is a difference.
I would prefer to create a commit based on a functional basis ("one PR for fixing all useless parentheses in all grammars") rather than a per-grammar basis ("PR for abb, PR for abnf, ..., PR for z). It's a lot of busy work all around. And, the changes can easily be done in a few minutes with Trash, and verified via one build of a few hours.
The alternative is to add the check to the build and output warnings. So, whenever the grammar is next changed, the check is run automatically.
- I don't think string-based algorithm is suitable for searching useless parentheses. It's better to use algorithms on AST but I understand this approach might be more laborious.
First, this is not at all a string-based algorithm! If anything, you could say this is a parse tree-based approach. Grammars are parsed, a parse tree is produced, and XPath expressions are evaluated to find useless parentheses.
The fact is that operating against the PT works. But, only if the patterns are correct. The patterns are declarative and at the level of the grammar. This is exactly the right way to write these kinds of algorithms.
The main problem with the check is writing XPath expressions to find the useless parentheses. But, this is because the grammar is poorly written! Your observation that this should be as an AST is in the right direction, but unnecessary. (And, after studying compiler construction for 40+ years, I would boldly state that ASTs in compiler implementation have been one of the greatest failures and impediments toward progress. Another data representation that disassociates syntax and semantics adds more complexity, more errors, etc. At the very least, ASTs should always have an inverse mapping back to the PT.)
The problem with the current grammar is that the precedence and associativity of regular expressions in implemented manually using a chain of rules rather than in "Antlr style". If it were implemented in "Antlr style", then the pattern would be far simpler: "If the precedence of the child is higher than the precedence of the parent of the 'block' containing parentheses, then remove the 'block' node of the tree".
As an example of what I mean, consider the arithmetic expression "(1*2)+3". The tree for an "Antlr style" grammar implementation would be:
We know that the parentheses are unnecessary because the precedence of "*" is greater than "+".
That all said, in other systems like Rascal and ASF+SDF, patterns are strings that are converted to parse trees, which makes the pattern easier to write. For now, I've only implemented XPath expressions.
There are useless parentheses spread across many grammars in the repo, which I've been trying to correct. Here is a list as produced by a Trash script (see the following comments).
This is the script used to find useless parentheses. I could not have come up with the various patterns without the valuable help from and keen eyes of @KvanTTT and @msagca.
There is probably a simple pattern based on operator precedence to know when it is safe to remove parentheses. For now, what I wrote will do.
(Updated 5/14/23 8AM EST.)