congo-cc / congo-parser-generator

The CongoCC Parser Generator, the Next Generation of JavaCC 21, which in turn was the next generation of JavaCC
https://discuss.congocc.org/

Update the scope of polyglot tests to include the FreeMarker grammar. #173

Closed vsajip closed 4 months ago

revusky commented 4 months ago

I have to say that I don't really understand the need to add things like JavaIdentifier19.ccc to this part of the directory tree (or elsewhere really). Can't we just reuse the version of this file that is in examples/java? And actually, these various JavaIdentifierXX.ccc files are in the congocc.jar also. And it's not like these particular files ever really change...

I think I should also say that my intention (though I'm often slow to get around to these things) is to mothball this freemarker grammar. I think that we should replace it with the one actually being used for FreeMarker 3, which is here. Actually, if you would like to replace it, by all means go ahead. Or, come to think of it, we could put it in examples/freemarker3. This FreeMarker grammar here is actually a FreeMarker grammar I wrote back in 2008 and I anticipated that it would eventually replace the existing one. Instead, I forward evolved the one that was being used in the SVN trunk and that is the one in freemarker3. But that's the only one that supports all the new syntax I introduced in the last year, like terse directives and so on. It's funny because the FreeMarker grammar in "Apache FreeMarker" was also written by me, but mostly in the 2002-2004 time range. So, yeah, writing FreeMarker grammars has been a thing for me. But the latest and best one is the freemarker3 one definitely and that actually is kind of a showcase for advanced use of CongoCC.

vsajip commented 4 months ago

Well, I looked at the FreeMarker3 grammar (of course, I looked at that first) and it's a lot more work to include that than FTL.ccc, because of how much the grammar files reference stuff that is not generated. You'd have to move a large part of the FreeMarker3 sources into our examples to get things to even compile, unless major surgery is done on the grammars. I view the FTL grammar as "just another grammar you can test against", and a few niggles in the templates were surfaced because of this testing, so it serves a useful purpose even if it isn't real FreeMarker3 (it fails on some of the files in src/templates, but I didn't look into that).

By all means, we can rationalise the JavaIdentifierDef.ccc stuff in a later PR, but I mainly wanted to check that everything works cross-platform and this was the quickest way of doing that. Currently, each example directory is completely self-contained from the POV of the polyglot tests.

revusky commented 4 months ago

Well, I looked at the FreeMarker3 grammar (of course, I looked at that first) and it's a lot more work to include that than FTL.ccc, because of how much the grammar files reference stuff that is not generated.

Actually, that's true and is maybe why I didn't replace it yet. I guess I had forgotten about this problem. Hmm...

Well, it occurs to me that another possibility is just to assume that the freemarker3 project is checked out in a parallel directory. (It is on my box!) Of course, if it's not, we could just output a message saying that it was not found so we're not running this test...

Is that such a crazy idea?
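A minimal sketch of that "parallel directory" idea, in Python purely for illustration. The function names, the default directory name `freemarker3`, and the wording of the skip message are all assumptions made for this example; nothing here is taken from the actual Congo test scripts.

```python
import sys
from pathlib import Path


def find_sibling_checkout(base_dir, name="freemarker3"):
    """Return the path of a checkout sitting next to base_dir, or None.

    Hypothetical helper: 'freemarker3' as the directory name is an assumption.
    """
    candidate = Path(base_dir).resolve().parent / name
    return candidate if candidate.is_dir() else None


def run_optional_test(base_dir, run_test, out=sys.stdout):
    """Run run_test(checkout) if the sibling checkout exists.

    If it does not exist, print a message and skip rather than fail.
    Returns True if the test was run, False if it was skipped.
    """
    checkout = find_sibling_checkout(base_dir)
    if checkout is None:
        print("freemarker3 checkout not found; skipping this test", file=out)
        return False
    run_test(checkout)
    return True
```

The point of returning a flag instead of raising is that a missing checkout is reported but never counts as a test failure, which is the behaviour suggested above.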

Then again, it does occur to me that the various FreeMarker classes are in the congocc.jar (granted, possibly a somewhat older version than what we would be testing against in the former case) so we could just leverage that fact.

And then again, another idea (maybe better) could be to adjust the FreeMarker3 grammar so that it is more useful standalone. That would mean using the preprocessor to condition whether various code injections are to be ignored. That would have the result of making the grammar more messy to look at, but I don't know offhand by how much. And it would also stress the preprocessor functionality all the more, which is, in principle, a good thing.

By all means, we can rationalise the JavaIdentifierDef.ccc stuff in a later PR,

Well, I think that would be better.

vsajip commented 4 months ago

That would mean using the preprocessor to condition whether various code injections are to be ignored

Well, I had to do that for FTL.ccc too, but it was a lot easier than for FM.ccc. I went quite some distance with it before deciding it was too much work just to give the machinery an extra grammar to test against.

As far as I remember, ANTLR uses an approach where the grammar is quite bare and everything semantic is done using visitors on the tree (I might be misremembering), and perhaps that causes its performance issues. However, with CongoCC it's easy to write grammars that are really quite hard to make work across multiple generation languages, and I'd say the current FM3 is in that spot. However, it may be the best compromise given that the resulting solution needs to be performant.

vsajip commented 4 months ago

Anyway, adding the preprocessor and FreeMarker grammars has flushed out some bugs, and I expect each new grammar added will flush out some more until the whole thing gets a bit more mature.

revusky commented 4 months ago

Anyway, adding the preprocessor and FreeMarker grammars has flushed out some bugs, and I expect each new grammar added will flush out some more until the whole thing gets a bit more mature.

Yes, definitely. I think the best testing we can have is full functional tests, like we have with Java, C# and Python. FreeMarker also. Lua... And I think the fact that we use the Java grammar (and soon others) internally and so on is bound to make things even more robust. Well, also the CongoCC grammar itself, since we use the tool itself to build itself, so there is the rebootstrap test, i.e. can we build/test again using the jarfile that we just built? Well, of course, bugs are stubborn critters and will still creep in sometimes, but, you know, I honestly don't think our testing situation is so bad at all. It's better for Java than for the other two languages, but that will gradually change thanks to your efforts. And, in any case, the whole project is so undermanned that we really do have to kill multiple birds with one stone. So, something like the Java grammar stands on its own as an example, it also provides testing, and is also used internally... And this should be the same for C# and Python soon. (I think so!)

As for the FreeMarker3 grammar specifically, there will be continued work on it and even if it's not specifically part of the Congo test suite, it's there and if it were to stop working, that would be another tripwire that would come to my attention at least. And, of course, that you can build the latest freemarker.jar with the latest congocc.jar, then drop it in and everything continues to work, including rebootstrapping, well... that's also a significant stress test of the system.

I think that, when I get around to it, I will add a note to the examples/freemarker that this grammar is something of an historical curiosity and we're just continuing to use it as a test case. If somebody wants to muck with a freemarker grammar, they should use the most advanced one, which is the one in the freemarker3 repository.