congo-cc / congo-parser-generator

The CongoCC Parser Generator, the Next Generation of JavaCC 21, which in turn was the next generation of JavaCC
https://discuss.congocc.org/
Other
36 stars 11 forks source link

Some fixes to previous merge edge cases; converted some actions in example grammars; identified some issues #70

Closed adMartem closed 10 months ago

adMartem commented 10 months ago

This includes some design decisions that I would like feedback on before committing to the main branch. I decided that the string value assignment (=$ operator) should only include parsed token values and should be normalized to null if the RHS was not parsed, the empty string if it was accepted but was not present, and the trimmed string value of the parsed tokens if it included one or more node children. In testing this, I found that there is an edge case where, when the LHS was immediately preceded by an unparsed token such as a comment, it was included in the toString() value of the nonterminal and was not translated to the native language form. My fix was to change the BaseNode toString() to include only parsed tokens. I'm not sure that is ok for all cases even though all the tests run correctly. I also discovered that there does not seem to be a polyglot-compatible way to access an @Property in an action CodeBlock. Is that the way it should be? Maybe it's just another thing in the category of "maybe we should have a simple polyglot Java-like language that provides a way to do 99% of the things you would want to do in a CodeBlock." In any case, I bypassed the problem in the test cases for python and/or CSharp when it was encountered. Here are the changes that fix the known actual bugs, and hopefully highlight the above issues for consideration.

Also, I'm not really sure we should leave in the abiltiy to use the += operator with node properties on the LHS. Right now it adds the RHS node as a child of the LHS node specified, but then also adds it as a child of the parent node being built as is normal. That leads to it appearing as the child of two potentially different nodes, but with a parent that is the node it is actually included in in both cases. That seems to me to be a magnet for error-prone visitor code, so any thoughts on this would be welcome. The named-child LHS form (/name/ += ...)seems to be less problematic.

So I'm coming to the conclusion that the /name/ form of LHS is very useful, along with the =, =? and =$ form of the node LHS assignment, but that the += form should only apply to the /name/ LHS and that the postfix /name/ and /[name]/ form should be removed, as they are redundant given the uniform LHS forms. Another alternative that is most straightforward would be to simply infer a List<? extends Node> implemented as an ArrayList<>() if @name += ... is used.

adMartem commented 10 months ago

I'm closing this so my commits to this branch don't trigger unnecessary workflow jobs.

vsajip commented 10 months ago

What happens if you use a branch name which doesn't contain the substring "main"?