dhruvmanila closed this 3 months ago
Would it help if the lexer remembered the previous token kind? It could then check whether the previous kind was a non-logical newline and only then do the re-lexing.
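A minimal sketch of that idea, assuming a lexer that records the kind of every token it emits. `PrevKindLexer`, `bump`, and `should_re_lex` are hypothetical names used only for illustration, not ruff's actual API, and `TokenKind` is cut down to the few variants needed here.

```rust
/// A tiny subset of token kinds, just enough for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Lpar,
    Name,
    Comment,
    NonLogicalNewline,
}

/// Hypothetical lexer state that remembers only the kind of the
/// most recently emitted token.
#[derive(Debug, Default)]
pub struct PrevKindLexer {
    prev_kind: Option<TokenKind>,
}

impl PrevKindLexer {
    /// Record that the lexer just emitted a token of `kind`.
    pub fn bump(&mut self, kind: TokenKind) {
        self.prev_kind = Some(kind);
    }

    /// Guard for re-lexing: only proceed when the previous token was a
    /// NonLogicalNewline (for example, a newline inside parentheses).
    pub fn should_re_lex(&self) -> bool {
        self.prev_kind == Some(TokenKind::NonLogicalNewline)
    }
}
```

The guard only looks one token back, so it knows nothing about what came earlier on the line.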
As you mentioned in your comment on the PR, I don't think that would be sufficient.
Yeah. I'd like to avoid having to re-implement the entire lexing logic. Ideally, we could just look at the previous tokens and use that information to make a decision.
One solution would be to update TokenSource to pass the range of the last comment token to the lexer's re-lexing method, which could then use it to check whether this line continuation is part of the comment.
Do you think this is a better idea? I'm unsure about the split logic.
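A rough sketch of the check the lexer could perform with that range, assuming TokenSource hands over the byte range of the last comment token. `is_real_line_continuation` is a hypothetical helper, and a plain `std::ops::Range<usize>` stands in for whatever range type the lexer actually uses. Because a comment runs to the end of its line, a backslash inside the comment's range is comment text and does not escape the newline.

```rust
use std::ops::Range;

/// Hypothetical check: given the byte offset of a `\` that directly
/// precedes a newline, and the range of the last comment token (if any),
/// decide whether the backslash really escapes the newline.
pub fn is_real_line_continuation(backslash_offset: usize, last_comment: Option<Range<usize>>) -> bool {
    match last_comment {
        // A backslash inside the comment is just part of the comment text.
        Some(comment) => !comment.contains(&backslash_offset),
        // No comment on record: treat the backslash as a line continuation.
        None => true,
    }
}
```

For `(a # note \` the backslash falls inside the comment's range, so the function returns `false`; for `(a + \` with no comment it returns `true`.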
I'm open to passing information from the TokenSource to the Lexer. I'm unsure whether it should be the comment, because that still requires duplicating the line-continuation handling and related logic. Maybe we could just pass the position of the last NonLogicalNewline, where "last" means the last one before any non-trivia token? But I would need to take a closer look at the implementation again to fully understand what information we need.
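A sketch of how the TokenSource side could compute that position, assuming it keeps the tokens emitted so far together with their start offsets. `Token`, `TokenKind`, and `last_non_logical_newline_start` are hypothetical stand-ins, not ruff's types. The function walks backwards over trailing trivia (comments and non-logical newlines) and stops at the first non-trivia token; because every NonLogicalNewline it passes overwrites the result, it ends up with the one closest to that non-trivia token.

```rust
/// A tiny subset of token kinds, just enough for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Lpar,
    Name,
    Comment,
    NonLogicalNewline,
}

/// A token reduced to its kind and start byte offset.
#[derive(Debug, Clone, Copy)]
pub struct Token {
    pub kind: TokenKind,
    pub start: usize,
}

/// Hypothetical helper: start offset of the last NonLogicalNewline before
/// any non-trivia token, scanning the emitted tokens from the end.
pub fn last_non_logical_newline_start(tokens: &[Token]) -> Option<usize> {
    let mut start = None;
    for token in tokens.iter().rev() {
        match token.kind {
            // Keep overwriting so the earliest newline in the trivia run wins.
            TokenKind::NonLogicalNewline => start = Some(token.start),
            // Comments are trivia: skip over them.
            TokenKind::Comment => {}
            // Any other token ends the trailing trivia run.
            _ => break,
        }
    }
    start
}
```

For a hypothetical token stream for `(a # c` followed by a comment-only line `# d` (Lpar, Name, Comment, NonLogicalNewline at offset 6, Comment, NonLogicalNewline at offset 10), it returns `Some(6)`: the end of the line holding `a`, which is presumably where the lexer would move back to.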
> Maybe we could just pass the position of the last NonLogicalNewline, where "last" means the last one before any non-trivia token?
I literally woke up in the morning thinking about this lol. I think this could work and should simplify the implementation. Let me try it.
Even with the fix in https://github.com/astral-sh/ruff/pull/12035, we still need to consider that the line continuation character could be part of a comment, in which case the newline character is not escaped.
This won't cause a panic, but the lexer won't be moved back and will keep the NonLogicalNewline token instead of changing it to a Newline token. For example:
This will emit the following tokens:
> One solution would be to update TokenSource to pass in the last comment token range to the re-lexing method on the lexer, which can be used to check whether this line continuation is part of the comment or not.

This is backwards lexing all over again :)
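To illustrate why this tends to turn into lexing again: deciding from the raw text alone whether a trailing backslash is a line continuation or part of a comment means rescanning the line from its start, because `#` only starts a comment outside string literals. The function below is a hypothetical, simplified sketch, not ruff's lexer. It assumes every string literal is a single- or double-quoted string that closes on the same line, and it ignores triple-quoted strings, f-strings, and string prefixes.

```rust
/// Hypothetical check: does `line` (without its trailing newline) end in a
/// backslash that escapes the newline, rather than one inside a comment?
pub fn ends_with_line_continuation(line: &str) -> bool {
    // Quote character of the string literal we are currently inside, if any.
    let mut quote: Option<char> = None;
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        let at_end = chars.peek().is_none();
        match quote {
            Some(q) => {
                if c == '\\' {
                    // Skip the escaped character (e.g. an escaped quote).
                    chars.next();
                } else if c == q {
                    quote = None;
                }
            }
            None => match c {
                // Outside a string, `#` starts a comment: a trailing backslash
                // after it is comment text and does not escape the newline.
                '#' => return false,
                '\'' | '"' => quote = Some(c),
                '\\' if at_end => return true,
                _ => {}
            },
        }
    }
    false
}
```

Even this toy version has to track string state and escapes, which amounts to a small re-implementation of the lexer; reusing information from the already-emitted tokens avoids that.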