Kotlin / kotlin-spec

Kotlin Language Specification:
https://kotlinlang.org/spec
Apache License 2.0

NOT_IS token never generated #78

Closed: yperess closed this issue 3 years ago

yperess commented 3 years ago

I believe the NOT_IS token is never generated because EXCL_WS and EXCL_NO_WS are declared above it.

Here is a simple test, written in Kotlin:

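import com.google.common.truth.Truth.assertThat // assuming Google Truth; AssertJ's assertThat also fits
import org.antlr.v4.runtime.CharStreams
import org.antlr.v4.runtime.CommonTokenStream
import org.antlr.v4.runtime.Token

val lexer = KotlinLexer(CharStreams.fromString("!is"))
val tokenStream = CommonTokenStream(lexer).apply { fill() } // read all tokens up to EOF
val tokens = tokenStream.tokens
assertThat(tokens).hasSize(2) // This fails
assertThat(tokens[0].type).isEqualTo(KotlinLexer.NOT_IS)
assertThat(tokens[1].type).isEqualTo(Token.EOF)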

What I actually get is three tokens: !, is, and <EOF>.

belyaev-mikhail commented 3 years ago

Good catch! I'll recheck this against our test base.

belyaev-mikhail commented 3 years ago

OK, let's clarify this. If you look at the definition of NOT_IS, you'll find that it actually requires a space or newline after it:

NOT_IS: '!is' (Hidden | NL);

So your example is invalid: there is no space or newline after !is, so the lexer falls back to the other valid sequence, ! followed by is. The trailing hidden symbol is required so that NOT_IS is not generated for sequences like val x = !isTrue. If we added EOF as another option alongside Hidden and NL, your example would work as intended, but the grammar would gain nothing from it: a Kotlin file can never correctly end with !is.
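A minimal sketch of this behavior, assuming a hand-written toy lexer rather than the ANTLR-generated KotlinLexer. The token names mirror the grammar, but Rule, rulesWith, specRules, naiveRules, and tokenize are hypothetical, and (Hidden | NL) is simplified to a single space, tab, or newline character. It shows why !is at the end of input lexes as ! and is, and why dropping the trailing requirement would split !isTrue into NOT_IS and True.

```kotlin
// Hypothetical toy model of a few rules from the Kotlin lexer grammar.
// NOT the generated KotlinLexer: "(Hidden | NL)" is simplified here to
// one whitespace character that becomes part of the NOT_IS token.
data class Rule(val name: String, val pattern: Regex, val skip: Boolean = false)

fun rulesWith(notIs: Regex) = listOf(
    Rule("NOT_IS", notIs),
    Rule("EXCL", Regex("!")),
    Rule("IS", Regex("is")),
    Rule("IDENT", Regex("[A-Za-z_][A-Za-z0-9_]*")),
    Rule("WS", Regex("[ \\t\\r\\n]+"), skip = true),
)

// Spec-style rule: '!is' must be followed by a whitespace/newline character.
val specRules = rulesWith(Regex("!is[ \\t\\r\\n]"))
// Naive variant without the trailing requirement, for comparison.
val naiveRules = rulesWith(Regex("!is"))

fun tokenize(input: String, rules: List<Rule>): List<String> {
    val tokens = mutableListOf<String>()
    var pos = 0
    while (pos < input.length) {
        val rest = input.substring(pos)
        var bestRule: Rule? = null
        var bestLen = 0
        for (rule in rules) {
            // Only accept a match that starts exactly at the current position.
            val m = rule.pattern.find(rest) ?: continue
            if (m.range.first != 0) continue
            val len = m.value.length
            // Longest match wins; on a tie the earlier rule keeps its place.
            if (len > bestLen) {
                bestRule = rule
                bestLen = len
            }
        }
        val rule = bestRule ?: error("no rule matches at offset $pos")
        if (!rule.skip) tokens += rule.name
        pos += bestLen
    }
    return tokens + "EOF"
}

fun main() {
    println(tokenize("!is", specRules))      // [EXCL, IS, EOF]
    println(tokenize("x !is y", specRules))  // [IDENT, NOT_IS, IDENT, EOF]
    println(tokenize("!isTrue", specRules))  // [EXCL, IDENT, EOF]
    println(tokenize("!isTrue", naiveRules)) // [NOT_IS, IDENT, EOF]
}
```

The last line is the case the trailing requirement guards against: without it, the longest match at ! is the three-character !is, leaving True as a separate identifier.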

belyaev-mikhail commented 3 years ago

The ordering of the token rules does not matter in this particular example. Order only matters when two rules match the same input with the same length; otherwise lexing is greedy, as it should be, and produces the longest possible token, which in this case will always be !is.
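To illustrate when rule order does and does not matter, here is a hypothetical toy lexer (lex, Rules, and the rule values are illustrative names, not part of ANTLR or the spec repository). It is a simplified model of the behavior described above: at each position it takes the longest match and, only on a length tie, the rule listed first. As before, (Hidden | NL) is reduced to one whitespace character.

```kotlin
// Hypothetical toy maximal-munch lexer (not ANTLR, not the real KotlinLexer).
typealias Rules = List<Pair<String, Regex>>

val ws = "WS" to Regex("[ \\t\\r\\n]+")
val excl = "EXCL" to Regex("!")
val notIs = "NOT_IS" to Regex("!is[ \\t\\r\\n]") // simplified '!is' (Hidden | NL)
val isKw = "IS" to Regex("is")
val ident = "IDENT" to Regex("[A-Za-z_][A-Za-z0-9_]*")

fun lex(input: String, rules: Rules): List<String> {
    val out = mutableListOf<String>()
    var pos = 0
    while (pos < input.length) {
        val rest = input.substring(pos)
        // Every rule that matches starting exactly at pos, with its match length.
        val candidates = rules.mapNotNull { (name, re) ->
            re.find(rest)?.takeIf { it.range.first == 0 }?.let { name to it.value.length }
        }
        // maxByOrNull returns the first maximal element, so ties go to the earlier rule.
        val (name, len) = candidates.maxByOrNull { it.second } ?: error("no match at $pos")
        if (name != "WS") out += name
        pos += len
    }
    return out
}

fun main() {
    // Different lengths: order is irrelevant, the longest match wins.
    println(lex("!is x", listOf(excl, notIs, isKw, ident, ws))) // [NOT_IS, IDENT]
    println(lex("!is x", listOf(notIs, excl, isKw, ident, ws))) // [NOT_IS, IDENT]
    // Same length ("is" matches both IS and IDENT): the earlier rule wins.
    println(lex("is", listOf(isKw, ident))) // [IS]
    println(lex("is", listOf(ident, isKw))) // [IDENT]
}
```

Moving EXCL above or below NOT_IS changes nothing for "!is x", whereas swapping IS and IDENT flips the result for the equal-length input "is".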