Apparently the internal (JDT) compiler for Eclipse cannot handle some CongoCC-generated parsers that javac can.

adMartem commented 8 months ago

This is not a critical problem, because the real javac compiler seems to handle static initializers better than the bespoke Eclipse compiler does. Unfortunately, it is not really possible to replace the incremental JDT compiler with javac in Eclipse, hence this note.

Experimentally, I found that simply moving the first- and follow-set EnumSets into their own class files circumvents the problem. I don't know whether it affects performance, but I would be surprised and disappointed if it did. I suspect that newer versions of javac either de-dup the static initializers or simply special-case the ones for empty sets, and that the Eclipse compiler does neither.
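A minimal sketch of that workaround, assuming a hypothetical `TokenType` enum and hypothetical set names (not CongoCC's actual generated identifiers). The JVM caps each method's bytecode at 65535 bytes, and every static field initializer in a class is compiled into that class's single `<clinit>` method; moving the sets into a separate holder class gives them a `<clinit>` of their own. Because these fields are not compile-time constants, the holder class is only initialized on first access.

```java
import java.util.EnumSet;

// Hypothetical stand-in for the generated token-type enum.
enum TokenType { LBRACE, RBRACE, IDENTIFIER, SEMICOLON, EOF }

// Hypothetical holder class for first/follow sets. These initializers compile
// into FirstFollowSets.<clinit>, not into the parser's <clinit>, so they no
// longer count against the parser class's per-method bytecode limit.
final class FirstFollowSets {
    private FirstFollowSets() {}

    // Treat these as read-only: EnumSet itself is mutable.
    static final EnumSet<TokenType> BLOCK_FIRST = EnumSet.of(TokenType.LBRACE);
    static final EnumSet<TokenType> STATEMENT_FIRST =
            EnumSet.of(TokenType.IDENTIFIER, TokenType.LBRACE);
    static final EnumSet<TokenType> STATEMENT_FOLLOW =
            EnumSet.of(TokenType.SEMICOLON, TokenType.RBRACE, TokenType.EOF);
}

// Hypothetical parser fragment: lookahead checks reference the holder class.
final class SketchParser {
    static boolean startsStatement(TokenType next) {
        return FirstFollowSets.STATEMENT_FIRST.contains(next);
    }

    static boolean canFollowStatement(TokenType next) {
        return FirstFollowSets.STATEMENT_FOLLOW.contains(next);
    }
}
```

Since a class is initialized only once, the extra runtime cost is plausibly just one additional class load.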

Anyhow, if we ever feel a need or someone besides me discovers this limitation, there seems to be an easy solution.

BTW, do we use final sets anymore (ever)? I couldn't find any references in the project that seemed to anticipate them other than the macro that generates the code for them.

revusky commented 7 months ago

BTW, do we use final sets anymore (ever)? I couldn't find any references in the project that seemed to anticipate them other than the macro that generates the code for them.

Sorry to be so slow answering. I was just looking over the issue here and realized that this was still unanswered. Well.... if you can't find any usage of this, then I guess there aren't any! The "final set" concept is analogous to "first set": it is the set of tokens that an expansion can end with, as opposed to begin with. So I guess that concept was part of an earlier pass at fault-tolerant parsing. If our construct is {....}, then the final set would be just the closing brace }. So one possible error-recovery hack is to scan forward until we (hopefully) find a token in that final set. Something like that... I guess I went for a different approach later, so it is not used, but it is still sitting there.
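A minimal sketch of the final-set recovery hack described above, assuming a hypothetical `TokenType` enum and a plain token list in place of CongoCC's actual token stream: skip tokens until one in the final set is found, consume it, and resume, stopping at EOF without consuming it.

```java
import java.util.EnumSet;
import java.util.List;

// Hypothetical stand-in for the generated token-type enum.
enum TokenType { LBRACE, RBRACE, IDENTIFIER, SEMICOLON, EOF }

final class FinalSetRecovery {
    private FinalSetRecovery() {}

    /**
     * Scans forward from start for the first token in finalSet. Returns the
     * index just past that token (the token is consumed), or the index of EOF
     * (or the end of the list) if no final-set token is found.
     */
    static int skipToFinalSet(List<TokenType> tokens, int start, EnumSet<TokenType> finalSet) {
        int i = start;
        while (i < tokens.size()) {
            TokenType t = tokens.get(i);
            if (finalSet.contains(t)) {
                return i + 1; // consume the closing token, e.g. '}'
            }
            if (t == TokenType.EOF) {
                return i; // never consume EOF while recovering
            }
            i++; // discard an invalid token
        }
        return i;
    }
}
```

For a construct like `{....}`, the final set passed in would be `EnumSet.of(TokenType.RBRACE)`.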

As for the question of Eclipse's internal compiler, well... I dunno... It's tempting to say it's not our problem. After all, our internal Java parser is pretty faithful to the Java language spec, and since any code we generate has to be parsed by it, that code is all probably kosher, as further evidenced by the fact that javac can always compile it. So the Eclipse compiler's inability to compile code we generate is probably their bug. That said, if a small tweak would make our code compilable by the Eclipse compiler, then by all means, I guess...

adMartem commented 7 months ago

Hi. Thanks for taking a look at this. I don't think we should ditch the final set too hastily. There is a recovery mode that is, in effect, what I currently approximate with ATTEMPT...RECOVER (it was try/catch in the JTB version of the grammar). As I experiment with fault-tolerant mode in CongoCC, I have run into cases that I think would yield better results (for my use case) if I could just eat invalid tokens until a token in the final set is found, but I'm not yet certain that the re-scan recovery already present couldn't achieve the same thing.

I've also found a couple of what I would consider fault-tolerant mode bugs, but I need to spend more time on them before I know for sure that they are not a misconception on my part. In particular, you can easily get into an infinite loop if the expansion you are looping on starts with the same token as a descendant expansion that is ignored due to a ! marker. In that case, no token is consumed, so the loop at the higher level keeps seeing the same first-set token. I'm still trying to simplify my test case, however, before drawing any conclusions. My intuition is that at least one invalid token should always be produced (consumed) under any condition that produces a ! action.
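The intuition that a `!` recovery should always consume at least one token can be sketched as a progress guard on the enclosing loop. This is a hypothetical illustration, not CongoCC's generated code: `Body`, `loop`, and the token list are assumptions, and the body stands in for an expansion that may bail out on a `!` without consuming anything.

```java
import java.util.List;

// Hypothetical stand-in for the generated token-type enum.
enum TokenType { LBRACE, RBRACE, IDENTIFIER, SEMICOLON, EOF }

final class ProgressGuard {
    private ProgressGuard() {}

    /** Hypothetical loop body: parses from pos and returns the new position. */
    interface Body {
        int parse(List<TokenType> tokens, int pos);
    }

    /**
     * Runs a (...)* style loop while the next token is `first`. If the body
     * returns without advancing, one token is force-consumed as invalid, so the
     * loop cannot spin forever on the same first-set token.
     */
    static int loop(List<TokenType> tokens, int pos, TokenType first,
                    Body body, List<TokenType> invalid) {
        while (pos < tokens.size() && tokens.get(pos) == first) {
            int next = body.parse(tokens, pos);
            if (next <= pos) {
                invalid.add(tokens.get(pos)); // record the skipped token as invalid
                next = pos + 1;
            }
            pos = next;
        }
        return pos;
    }
}
```

With a body that never advances, the guard turns what would be an infinite loop into one invalid token per iteration.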

Regarding this ticket, it isn't that the Eclipse compiler won't parse our output; it's that, at least for my rather large parser, our output puts too much bytecode into the Congo-produced parser's static initializer method when the many duplicate EnumSet initializations are not eliminated (I suspect this is the cause, but haven't verified it). In any case, it is not a big problem for me, since I can't easily use the Eclipse debugger anyway due to the file size, and, except when debugging something that depends on fault tolerance, I can always just turn off fault tolerance and the error goes away. [The following is just whinging and can be ignored.] The only real solution to the file-size problem would be to put the scan and check methods in separate class files, and that would still leave the productions themselves to be moved around for debugging so that the ones you need are at the front of the main parser class. Right now I can do that, but I have to move scans and checks iteratively as I discover during debugging which ones I need to step into.
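One way to eliminate the duplicate EnumSet initializations at generation time is to intern each distinct set and emit one field per set. The sketch below is hypothetical (not CongoCC's generator code): `SetDeduplicator`, `TokenType`, and the `TOKEN_SET_n` naming are assumptions. It relies on `EnumSet` inheriting value-based `equals`/`hashCode` from `AbstractSet`, and it special-cases empty sets as `EnumSet.noneOf`.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for the generated token-type enum.
enum TokenType { LBRACE, RBRACE, IDENTIFIER, SEMICOLON, EOF }

final class SetDeduplicator {
    // Distinct sets in first-seen order, mapped to their generated field names.
    private final Map<EnumSet<TokenType>, String> names = new LinkedHashMap<>();

    /** Returns a field name for the set, reusing the name of an equal set seen earlier. */
    String fieldFor(EnumSet<TokenType> set) {
        String existing = names.get(set);
        if (existing != null) {
            return existing;
        }
        String name = "TOKEN_SET_" + names.size();
        names.put(EnumSet.copyOf(set), name); // defensive copy: map keys must not change
        return name;
    }

    int distinctCount() {
        return names.size();
    }

    /** Emits one Java field declaration per distinct set. */
    String declarations() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<EnumSet<TokenType>, String> e : names.entrySet()) {
            sb.append("static final EnumSet<TokenType> ").append(e.getValue()).append(" = ");
            if (e.getKey().isEmpty()) {
                sb.append("EnumSet.noneOf(TokenType.class);\n");
            } else {
                StringJoiner items = new StringJoiner(", ", "EnumSet.of(", ");\n");
                for (TokenType t : e.getKey()) {
                    items.add("TokenType." + t.name());
                }
                sb.append(items);
            }
        }
        return sb.toString();
    }
}
```

Every lookahead check that needs a given set would then reference the shared field, so the static initializer grows with the number of distinct sets rather than with the number of use sites.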

congo-cc / congo-parser-generator #170