congo-cc / congo-parser-generator

The CongoCC Parser Generator, the Next Generation of JavaCC 21, which in turn was the next generation of JavaCC
https://discuss.congocc.org/
Other
36 stars 11 forks source link

Potential npe on ´matchedType.ordinal()` #23

Closed stbischof closed 1 year ago

stbischof commented 1 year ago

where.

https://github.com/congo-cc/congo-parser-generator/blob/08708399b324504961bb1a55fcee58f31e5a1d2c/src/templates/java/Lexer.java.ftl#L213

why: matchedType is not set

revusky commented 1 year ago

Well, actually, there is no potential NPE there. But I still changed the code a little as a reaction to this, but first I'll tell you why the warning is spurious.

In this condition: (position - start > matchLength || returnedType.ordinal() < matchedType.ordinal())

there is an LHS and an RHS. If the LHS is true, then there is no need to evaluate the RHS, right? Or, in other words, we will only potentially hit the NPE if the LHS is false, i.e. position - start is not greater than matchedLength.

BUT... for matchedType to be null, matchedLength must be zero. (You can look carefully at the code and convince yourself of this. In fact, matchedType being null and matchedLength being zero amount to the same thing!) So, if matchedLength is zero, then the LHS condition must be true, because position - start is at least 1 at this point (it's the number of characters we have read in to match the returnedType candidate. Or, in other words, to go to the RHS, matchedLength must be greater than 0, which in turn means that matchedToken is necessarily non-null.

So, there is no problem with the code, as it is (orwas) but it is understandable why the checker "thinks" there is (or might be...). But finally, I decided to get rid of the warning by making it more clear that matchedToken cannot be null. So, I changed line 184 from:

       TokenType matchedToken = null;

to:

       TokenType matchedToken = TokenType.INVALID;

This, by the way, required a change on line 256, from:

      if (matchedToken == null) {

to:

     if (matchedToken == TokenType.INVALID) {

It actually is all the same really, but the intent of the code is clearer. I would point out, by the way, that the code line you are bringing up is actually very core logic in terms of the tokenization algorithm. If you run through the active NFA states and you have a newer TokenType candidate, that replaces the one you already matched, matchedType, if it matched more characters than the existing candidate, i.e. position - start > matchedLength OR its "ordinal" is lower, i.e. it is the same length but has higher priority. Got it?

So, you see, it's not really so hard to understand finally, right?

This code, on the other hand, is pretty hard to understand! I did come to some understanding of it at some point, but that was so you wouldn't have to! (You should thank me for that... LOL)