RustPython / Parser


Always emit non-logical newlines for 'empty' lines #27

Closed charliermarsh closed 1 year ago

charliermarsh commented 1 year ago

Summary

Right now, if you have a comment like:

# foo

The lexer emits a comment token but no newline token. It turns out that when the lexer encounters an "empty" line, it skips newline emission, and a comment-only line counts as "empty" (see: eat_indentation, which eats both indentation and comments).

This PR modifies the lexer to emit a NonLogicalNewline in such cases. As a result, we'll now always have either a newline or non-logical newline token at the end of a line (excepting continuations). I believe this is more consistent with CPython. For example, given this snippet:

# Some comment

def foo():
    return 99

CPython outputs:

TokenInfo(type=62 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=60 (COMMENT), string='# Some comment', start=(1, 0), end=(1, 14), line='# Some comment\n')
TokenInfo(type=61 (NL), string='\n', start=(1, 14), end=(1, 15), line='# Some comment\n')
TokenInfo(type=61 (NL), string='\n', start=(2, 0), end=(2, 1), line='\n')
TokenInfo(type=1 (NAME), string='def', start=(3, 0), end=(3, 3), line='def foo():\n')
TokenInfo(type=1 (NAME), string='foo', start=(3, 4), end=(3, 7), line='def foo():\n')
TokenInfo(type=54 (OP), string='(', start=(3, 7), end=(3, 8), line='def foo():\n')
TokenInfo(type=54 (OP), string=')', start=(3, 8), end=(3, 9), line='def foo():\n')
TokenInfo(type=54 (OP), string=':', start=(3, 9), end=(3, 10), line='def foo():\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 10), end=(3, 11), line='def foo():\n')
TokenInfo(type=5 (INDENT), string='    ', start=(4, 0), end=(4, 4), line='    return 99\n')
TokenInfo(type=1 (NAME), string='return', start=(4, 4), end=(4, 10), line='    return 99\n')
TokenInfo(type=2 (NUMBER), string='99', start=(4, 11), end=(4, 13), line='    return 99\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(4, 13), end=(4, 14), line='    return 99\n')
TokenInfo(type=61 (NL), string='\n', start=(5, 0), end=(5, 1), line='\n')
TokenInfo(type=6 (DEDENT), string='', start=(6, 0), end=(6, 0), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(6, 0), end=(6, 0), line='')

Note the NL tokens after the comment, and for the empty line, along with the NL token at the end prior to the dedent.
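To reproduce the CPython listing locally, something like the following sketch should work. It uses only the standard-library tokenize module. The helper names (token_names, line_terminators) are made up for illustration and are not part of this PR or of RustPython. The numeric token type IDs differ between Python versions, so the sketch compares type names instead.

```python
import io
import tokenize

# The snippet from the description, including the trailing blank line.
SOURCE = "# Some comment\n\ndef foo():\n    return 99\n\n"

# Tokens that do not belong to the text of a physical line.
STRUCTURAL = {"ENCODING", "INDENT", "DEDENT", "ENDMARKER"}


def token_names(source):
    """Return the token type names CPython emits for `source`, in order."""
    readline = io.BytesIO(source.encode("utf-8")).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.tokenize(readline)]


def line_terminators(source):
    """Map each physical line number to the type name of its last token.

    Structural tokens (ENCODING, INDENT, DEDENT, ENDMARKER) are skipped, so
    the value is whatever token actually closes the line.
    """
    readline = io.BytesIO(source.encode("utf-8")).readline
    last = {}
    for tok in tokenize.tokenize(readline):
        name = tokenize.tok_name[tok.type]
        if name in STRUCTURAL:
            continue
        # Later tokens on the same row overwrite earlier ones.
        last[tok.start[0]] = name
    return last


if __name__ == "__main__":
    print(token_names(SOURCE))
    print(line_terminators(SOURCE))
```

For this snippet, every physical line should end in either NEWLINE or NL, which is the invariant this PR aims for on the RustPython side (excepting continuations).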

charliermarsh commented 1 year ago

cc @MichaReiser

youknowone commented 1 year ago

Ruff PR changed: https://github.com/charliermarsh/ruff/pull/4438