congo-cc / congo-parser-generator

The CongoCC Parser Generator, the Next Generation of JavaCC 21, which in turn was the next generation of JavaCC
https://discuss.congocc.org/

Activate/deactivate tokens by subclass #52

Open adMartem opened 1 year ago

adMartem commented 1 year ago

Food for thought: ACTIVATE_TOKENS #WholeBunchOfTokens ... or even ACTIVATE_TOKENS #HalfBunch, #OtherHalf ....

adMartem commented 1 year ago

And even matching by token subclass would be nice for some use cases (like preprocessing, for instance), both in lexing and parsing. I.e., <#WholeBunchOfTokens> == ( | ... | ). But (orderless) set-based matching would probably have significant semantic repercussions. Perhaps too many. Something to think about on a rainy day.

This all popped into my mind as I was considering whether I could combine the COBOL preprocessor with the subsequent parser in order to effectively eliminate the double lexing that occurs. The hand-coded lexing that is currently used for the preprocessor is much slower than the DFA-based lexing of Congo, to the point that it now takes about as long as the whole parsing process that follows, so a speedup of up to 2x for the total process could possibly be achieved. But I digress.
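To make the proposal concrete, here is a minimal sketch in plain Java of what "activate by subclass" could mean: every token type whose token class is a subtype of a named group class gets added to the active set. None of these names come from CongoCC itself. `PreprocessorToken` (standing in for #WholeBunchOfTokens), the `TokenType` constants, and the `typesIn` helper are all hypothetical, and the sketch says nothing about how the generator would actually implement it.

```java
import java.util.EnumSet;

// Hypothetical token hierarchy, for illustration only (not CongoCC's generated code).
class Token {}
class PreprocessorToken extends Token {}          // plays the role of #WholeBunchOfTokens
class IfDirective extends PreprocessorToken {}
class EndifDirective extends PreprocessorToken {}
class Identifier extends Token {}

// Hypothetical token types, each tied to the class used for its tokens.
enum TokenType {
    IF_DIRECTIVE(IfDirective.class),
    ENDIF_DIRECTIVE(EndifDirective.class),
    IDENTIFIER(Identifier.class);

    final Class<? extends Token> tokenClass;

    TokenType(Class<? extends Token> tokenClass) {
        this.tokenClass = tokenClass;
    }
}

class SubclassActivation {
    // Collect every token type whose token class is a subtype of at least one group class.
    @SafeVarargs
    static EnumSet<TokenType> typesIn(Class<? extends Token>... groups) {
        EnumSet<TokenType> result = EnumSet.noneOf(TokenType.class);
        for (TokenType type : TokenType.values()) {
            for (Class<? extends Token> group : groups) {
                if (group.isAssignableFrom(type.tokenClass)) {
                    result.add(type);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Start with only IDENTIFIER active, then do the equivalent of
        // "ACTIVATE_TOKENS #PreprocessorToken".
        EnumSet<TokenType> active = EnumSet.of(TokenType.IDENTIFIER);
        active.addAll(typesIn(PreprocessorToken.class));
        System.out.println(active);
    }
}
```

Because the result is just a set of token types, the same helper could in principle also back a set-based match such as <#WholeBunchOfTokens>. The semantic questions raised above (ordering, ambiguity) are not addressed by this sketch.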

revusky commented 1 year ago

> And even matching by token subclass would be nice for some use cases (like preprocessing, for instance), both in lexing and parsing. I.e., <#WholeBunchOfTokens> == ( | ... | ).

Well, yeah, I've actually thought along similar lines. In fact, I've even had very radical thoughts about whether this whole TokenType thing is really necessary, and whether we couldn't just operate based on the token's class. I mean, all this Token.TokenType evolved from Token.kind, which was an integer. And that must have come from some implementation in C, like YACC and the lot. Maybe it would be more modern or OOP or whatever to just match based on the class of the token. I don't honestly think that things like instanceof or tok.getClass() == someClass are terribly expensive on a modern JVM. Of course, I haven't investigated how expensive the equivalent sorts of things might be in other languages, namely Python.
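For readers who want to see the two checks mentioned above side by side, here is a small self-contained Java sketch. The class names (`Tok`, `Keyword`, `IfKeyword`, `Name`) and the two helper methods are made up for illustration and are not part of CongoCC. It only shows the difference in semantics between an exact class comparison and an instanceof-style check. It makes no claim about their relative cost.

```java
// Hypothetical token hierarchy, for illustration only.
abstract class Tok {
    final String image;

    Tok(String image) {
        this.image = image;
    }
}

class Keyword extends Tok {
    Keyword(String image) {
        super(image);
    }
}

class IfKeyword extends Keyword {
    IfKeyword() {
        super("if");
    }
}

class Name extends Tok {
    Name(String image) {
        super(image);
    }
}

class ClassBasedMatching {
    // Exact match: the rough analogue of comparing against a single token type.
    static boolean isExactly(Tok tok, Class<? extends Tok> cls) {
        return tok.getClass() == cls;
    }

    // Group match: true for cls itself and for every subclass of cls,
    // i.e. the dynamic form of "tok instanceof Keyword".
    static boolean isKindOf(Tok tok, Class<? extends Tok> cls) {
        return cls.isInstance(tok);
    }

    public static void main(String[] args) {
        Tok tok = new IfKeyword();
        System.out.println(isExactly(tok, IfKeyword.class)); // exact class
        System.out.println(isExactly(tok, Keyword.class));   // superclass, not exact
        System.out.println(isKindOf(tok, Keyword.class));    // superclass, group match
        System.out.println(isKindOf(new Name("x"), Keyword.class));
    }
}
```

The group form is what would make matching by token subclass fall out naturally, since a check against `Keyword.class` covers every keyword token without listing them.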

Greetings from Busan.