congo-cc / congo-parser-generator

The CongoCC Parser Generator, the Next Generation of JavaCC 21, which in turn was the next generation of JavaCC
https://discuss.congocc.org/
Other
33 stars 9 forks source link

Relating to fault-tolerant mode: add optional code block when re-sync occurs and some other things I think need to be done #178

Closed adMartem closed 2 months ago

adMartem commented 4 months ago

More specifically, (Temporarily) fix infinite loop in fault-tolerant recovery and add some comments to highlight need to revisit this area; allow "!" or "!-> {...}" for re-sync points, and fix some fault-tolerant bugs. Pay special attention to line 232 in src/templates/java/ParserProductions.java.ftl. I don't think this is the best fix, but I'm not sure. I have this feeling there is something missing that we really need to recover to the best place in the source, rather than punt here. It's like there is a follow set that should be skipped to somehow in this case.

adMartem commented 4 months ago

This is a grammar that will fall into the infinite loop if the throw on line 232 is not present. When the input is section-name SECTION. sen-tence. paragraph-name. sentence. EOF., the infinite loop occurs if these commits are not applied (namely the one at line 232 to re-throw).

adMartem commented 4 months ago

Funny story, when I did this PR I noticed in the changes that line 219 (${expansion.customErrorRecoveryBlock!}) was one replaced by my new code that allowed recovery code blocks. I thought I had somehow erroneously added that line in an earlier commit and corrected it in this one, but when was it originally added? Well, it turns out it has been there for a long time and pre-dates any of my changes. I guess it was anticipated that a recovery code block was needed in this case, and that was put in the templates and Expansion, but never used as far as I can tell. I think it can go away, since the recoveryBlock is now present. I never noticed it during the ->{...} changes since non-terminal re-sync was the last thing I looked at, and when I did, I thought it was a typo made by me.

adMartem commented 4 months ago

Perhaps the seemingly unused RECOVER TO production syntax is related to this? I think the solution is something in the area of "skip tokens until you get to one that could legally follow this production if it were correctly formed".

adMartem commented 4 months ago

I've found that there are some unexpected (by me) side-effects of fault tolerant parsing in some areas of the COBOL parser where no re-syncing is involved. I haven't investigated these cases yet, but superficially they seem to involve skipping (invalidating) a token which is invalid in the current choice, but is valid (and meaningful) in a different choice. It is possible that the fault-tolerant token handling is just revealing a lookahead deficiency in the grammar, but I don't see how it compiles correctly when not in fault-tolerant mode in that case.

vsajip commented 4 months ago

The failures seem unrelated - I would try rebasing and see if that resolves the test failures.

adMartem commented 3 months ago

745de06 is unrelated to fault tolerence. For some reason I have not figured out it breaks my COBOL parser for about 0.5% of it's tests. My first impression is that the lookahead and actual scanning are somehow out of sync, but I'm still working on isolating what is happening. That is hampered by the fact that I cannot debug it in situ (the parser is too big), so I have to move generated code around so that what I'm interested in is below line 65536. Kind if tedious.

adMartem commented 2 months ago

Closing this draft until I have time to work on it.