Add experimental option(s) for JTB-compatible tree

adMartem commented 1 year ago

Many changes to ParserProductions template to implement JTB option, make BNFProduction an Expansion, and add "dynamic" injection to add JTB fields to generated nodes.

This is not all that is needed to "finish" this capability, but it is the hard part, I hope. As it stands, I believe it does not affect any current behavior when the option(s) is off. Note there are 2 "X_..." options that both have to be set to achieve the JTB behavior. I suspect this will become one option at some point, but I have a reason to keep them separate at the moment. Also at the moment, the JTB tree requires a set of Productions/INJECTIONs in the (user's) parser.ccc to create the syntactic nodes and provide the equivalent JTB nodes for them to extend, but the end-game is probably to generate them without that requirement.

At this point I'm in the process of validating that the tree produced by this is functionally identical to the JTB/Javacc tree for all of the sources I have at my disposal. It is turning out to be some work because, while I thought I had not made many changed that would affect the tree structure while getting the congo-based grammar correct, it turns out I made more than I remembered. For all of those I have to run my checker which will identify the first node conflict it encounters, and then try and change the Javacc version of the grammar to have the same structure as the new congo version, without running into javacc lookahead failures due to the changed structure. It is labor-intensive.

I did this PR so that you could have a chance to tell me if there is a better way you want me to do something. I have not paid any attention to Python and CSharp (yet?), as is obvious. While they would have no need to the JTB option per se, I am assuming the major changes to the ParserProductions template, for example, should be aligned for future maintenance.

adMartem commented 1 year ago

Well, clearly I have some work to do! I'll have to study the test results and see what I need to do. In the meantime, this is inspection, not merging.

adMartem commented 1 year ago

I think I will withdraw this pull request and resubmit one with 3 commits: 1) structural changes necessary but not sufficient for no. 2 and/or 3, 2) changes to implement an experimental capability that does not produce JTB-compatible trees, but could be generally useful to the Congolese user, and 3) changes to achieve full JTB node functional "conguence". Also, maybe I will be able to do some of the python and csharp testing first on my mac.

revusky commented 1 year ago

Okay, well, I was looking at the PR and was going to write a couple of comments but then I see that you closed this one.

First of all, I don't think that you need the SyntacticElement marker interface. I understand the problem, which is that some things extend Expansion that are not really syntactic elements such as CodeBlock for example. Actually, that a java code block is considered an expansion is something that dates back to legacy JavaCC, and actually, even though Congo is a complete rewrite, there are things like this that are the way they are because they were a result of continuous refactoring forward from the legacy codebase. So, actually, other things also subclass Expansion, such as Assertion and Failure and those things were never even in legacy JavaCC. But really, I think that what instanceof SyntacticElement would be checking, you could perfectly well just use Expansion.getMaximumSize() > 0. As far as I can see (though I could be missing something) something can be considered a syntactic element if it potentially consumes input. Or, in other words, if its maximum size (i.e. the number of tokens is potentially consumes) is zero, then it's really a pseudo-expansion, no?

Now, as for BNFProduction extending Expansion instead of BaseNode, that is probably a good idea. (Though I can't say I've thought about it that much. But it makes sense...) So, again, it may well be that BNFProduction does not extend Expansion simply because that was they way it was in the legacy code. I certainly have no recollection of making any conscious decision on that, and it seems like that having BNFProduction extend Expansion is probably an improvement. Well, the bottom line is that I'm okay on that as long as everything continues to work!

There was another thing I wanted to say (and really, don't take it as a reproach or anything) which is that I think we need to have the policy that if you add features like this, you are incurring the obligation to document them. Though, actually, I'd really like to see some documentation on the whole tree-building machinery -- you know, the fact that there is a node stack and, if you don't specify anything, there is a basic tree-building strategy in place. You can "munge" the node stack with things like pushNode and popNode and peekNode. By default, if there is more than one node on the node stack when you exit a production, those nodes will be rolled up into a single node and in turn placed on the stack.... Well, you know what I mean -- a clear newbie-friendly exposition of how this works. (That's been on my own TODO list forever, but I was just thinking, hey, if I can get you to do it... LOL!)

But finally, is it unreasonable of me to request that you write up documentation on the tree-building machinery as part of this? I don't actually think that a specific document on the tree-building even needs to be very long. (It would eventually be a chapter in the manual, of course.) I've written various things, but, as you surely know, it's scattered all over the place.

But anyway, surely you understand that this pattern of adding more and more features and not really having them documented, this really has to end! It's not about you specifically. I have to be more disciplined and purposeful about this myself!

The situation with the CSharp and Python templates is really not so great and really, I'm not sure what we'll have to do to get the situation under control. The whole polyglot problem is quite challenging. It's far too easy to break the templates and then it's far too hard to figure out where they are broken!

adMartem commented 1 year ago

I wrote a wordy response, but lost it all when I changed tabs on my browser, somehow. So this will be more concise. Maybe.

"Or, in other words, if its maximum size (i.e. the number of tokens is potentially consumes) is zero, then it's really a pseudo-expansion, no?" No, the reason for the marker interface and count method is emulate JTB's logic that generates a NodeSequence node when an expansion is expressly contained in parentheses (easy to recognize as ExpansionWithParentheses) and otherwise when an expansion can match a sequence of more than 1 syntax element. Syntax elements in this context are contained within Congo's ZeroOrOne, ZeroOrMore, OneOrMore, and ExpansionChoice expansions. I too first thought of using the getMaximumSize() method, but then realized that a syntax element (as described) can contain more than one token. So I needed a way to count the preceding items, hence the marker interface and count method.

Regarding documentation, I assumed I was obliged to document any feature I added. I will also attempt to pull together/write an exposé of the node machinery.

congo-cc / congo-parser-generator

Add experimental option(s) for JTB-compatible tree #39