adMartem opened this issue 1 year ago
This all popped into my mind as I was considering whether I could combine the COBOL preprocessor with the subsequent parser to eliminate the double lexing that currently occurs. The hand-coded lexer that the preprocessor currently uses is much slower than Congo's DFA-based lexing, to the point that it now takes about as long as the entire parsing process that follows. Combining the two could therefore potentially yield a 2x speedup of the whole process. But I digress.
And even matching by token subclass would be nice for some use cases (like preprocessing, for instance), both in lexing and parsing. I.e., <#WholeBunchOfTokens> == ( | ... | ).
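To make the subclass-matching idea concrete, here is a minimal Java sketch. It assumes a hypothetical hierarchy in which every token in the group extends a common abstract class. The names `WholeBunchOfTokens`, `CopyKeyword`, `ReplaceKeyword`, `Identifier` and `SubclassMatch` are illustrative only, not Congo's generated API. A single `instanceof` check then does the work of the whole alternation.

```java
import java.util.List;

// Hypothetical token hierarchy, not Congo's generated classes.
abstract class Token {
    final String image;
    Token(String image) { this.image = image; }
}

// Hypothetical grouping superclass standing in for <#WholeBunchOfTokens>.
abstract class WholeBunchOfTokens extends Token {
    WholeBunchOfTokens(String image) { super(image); }
}

// Two example members of the group (COBOL preprocessor keywords).
final class CopyKeyword extends WholeBunchOfTokens {
    CopyKeyword() { super("COPY"); }
}

final class ReplaceKeyword extends WholeBunchOfTokens {
    ReplaceKeyword() { super("REPLACE"); }
}

// A token outside the group.
final class Identifier extends Token {
    Identifier(String image) { super(image); }
}

final class SubclassMatch {
    // One instanceof check replaces an alternation over every member token.
    static boolean matchesWholeBunch(Token t) {
        return t instanceof WholeBunchOfTokens;
    }

    // The same check with the group passed in as a Class object.
    static boolean matchesGroup(Token t, Class<? extends Token> group) {
        return group.isInstance(t);
    }

    // Count how many tokens in a list belong to the group.
    static long countGroup(List<Token> tokens, Class<? extends Token> group) {
        return tokens.stream().filter(group::isInstance).count();
    }
}
```

Adding a new member to the group is then just a matter of extending `WholeBunchOfTokens`; no grammar alternation has to be kept in sync.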
Well, yeah, I've actually thought along similar lines. In fact, I've even had some very radical thoughts about whether this whole `TokenType` thing is really necessary, or whether we could just operate based on the token's class. I mean, all this `Token.TokenType` stuff evolved from `Token.kind`, which was an integer. And that must have come from some implementation in C, like YACC and the lot. Maybe it would be more modern, or more OOP or whatever, to just match based on the class of the token. I honestly don't think that things like `instanceof` or `tok.getClass() == someClass` are terribly expensive on a modern JVM. Of course, I haven't investigated how expensive the equivalent sorts of things might be in other languages, namely Python.
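A rough Java sketch of the two matching styles side by side, assuming a hypothetical `TokenType` enum and hypothetical token classes (`BaseToken`, `LiteralToken`, `NumberToken`, and so on) rather than Congo's actual generated code. Note that `getClass() == someClass` is an exact match, while `instanceof` (or `Class.isInstance`) also accepts subclasses, which is what makes matching on a group of tokens possible.

```java
// Hypothetical token kinds; the enum is the descendant of an int "kind" field.
enum TokenType { IDENTIFIER, NUMBER, STRING }

// Hypothetical token hierarchy used for class-based matching.
abstract class BaseToken {
    abstract TokenType getType();
}

// Hypothetical grouping superclass: every literal token extends this.
abstract class LiteralToken extends BaseToken {}

final class IdentifierToken extends BaseToken {
    TokenType getType() { return TokenType.IDENTIFIER; }
}

final class NumberToken extends LiteralToken {
    TokenType getType() { return TokenType.NUMBER; }
}

final class StringToken extends LiteralToken {
    TokenType getType() { return TokenType.STRING; }
}

final class Matching {
    // Traditional style: compare the TokenType tag.
    static boolean byType(BaseToken t, TokenType expected) {
        return t.getType() == expected;
    }

    // Exact class match, as in tok.getClass() == someClass; subclasses do not match.
    static boolean byExactClass(BaseToken t, Class<? extends BaseToken> cls) {
        return t.getClass() == cls;
    }

    // Subtype-aware match, equivalent to instanceof with the class as a parameter.
    static boolean bySubtype(BaseToken t, Class<? extends BaseToken> cls) {
        return cls.isInstance(t);
    }
}
```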
Greetings from Busan.
Food for thought: `ACTIVATE_TOKENS #WholeBunchOfTokens ...` or even `ACTIVATE_TOKENS #HalfBunch, #OtherHalf ...`.
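One way a generator might expand group names given to `ACTIVATE_TOKENS` is sketched below in Java. It assumes the lexer tracks its active token types in an `EnumSet`. The group table, `TokenGroups.expand` and `withActivated` are hypothetical helpers, not CongoCC's actual implementation.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical token types; in a real grammar these would be generated.
enum TokenType { COPY, REPLACE, SUPPRESS, IDENTIFIER, PERIOD }

final class TokenGroups {
    // Hypothetical named groups: #HalfBunch plus #OtherHalf equals #WholeBunchOfTokens.
    static final Map<String, EnumSet<TokenType>> GROUPS = Map.of(
            "HalfBunch", EnumSet.of(TokenType.COPY, TokenType.REPLACE),
            "OtherHalf", EnumSet.of(TokenType.SUPPRESS),
            "WholeBunchOfTokens", EnumSet.of(TokenType.COPY, TokenType.REPLACE, TokenType.SUPPRESS));

    // Expand names such as "#HalfBunch", "#OtherHalf" into one set of token types.
    static EnumSet<TokenType> expand(String... names) {
        EnumSet<TokenType> result = EnumSet.noneOf(TokenType.class);
        for (String name : names) {
            String key = name.startsWith("#") ? name.substring(1) : name;
            EnumSet<TokenType> group = GROUPS.get(key);
            if (group == null) {
                throw new IllegalArgumentException("unknown token group: " + name);
            }
            result.addAll(group); // copies into result; the shared table is not mutated
        }
        return result;
    }

    // Activate the expanded types while body runs, then restore the previous set.
    static <T> T withActivated(EnumSet<TokenType> active, EnumSet<TokenType> toActivate, Supplier<T> body) {
        EnumSet<TokenType> saved = EnumSet.copyOf(active);
        active.addAll(toActivate);
        try {
            return body.get();
        } finally {
            active.clear();
            active.addAll(saved);
        }
    }
}
```

Restoring in a `finally` block keeps the active set correct even if the body throws, for example on a failed lookahead.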