W391: spurious warnings with python 3.12 beta

hannob commented 1 year ago

It appears there's an incompatibility with pycodestyle and python 3.12.0b1 (latest beta version).

You can check by running this in a docker container, this creates a minimal example showing this warning:

docker run --rm -ti python:3.12-rc /bin/bash
echo -en "import sys\n\nif 1:\n    pass\n" > test.py
pycodestyle test.py

Output is:

test.py:4:1: W391 blank line at end of file

Same with the git version of pycodestyle. This does not happen with python 3.11.

asottile commented 1 year ago

yeah there's a quite annoying regression with the tokenizer in the latests version -- cc @pablogsal

this is going to impact every user of pycodestyle, flake8, etc.

pablogsal commented 1 year ago

yeah there's a quite annoying regression with the tokenizer in the latests version -- cc @pablogsal

this is going to impact every user of pycodestyle, flake8, etc.

If this is related to the fact that the last dedent now is reported on existing lines, this is now documented behaviour so this needs to be fixed in tools. Check https://docs.python.org/3.12/whatsnew/3.12.html#changes-in-the-python-api:

Some final DEDENT tokens are now emitted within the bounds of the input. This means that for a file containing 3 lines, the old version of the tokenizer returned a DEDENT token in line 4 whilst the new version returns the token in line 3.

asottile commented 1 year ago

@pablogsal "needs to be fixed in tools" -- it would be nice if this didn't change without a good reason. the impact to tools is pretty widespread and this is going to be a pain in my ass for years of people not upgrading and misreporting this. same thing happened the last time cpython changed the end-of-file reporting in 3.7 and I still get pings about it

the beta period is a good time to report regressions and ideally they'd be taken seriously

pablogsal commented 1 year ago

Sorry Anthony, I am very sorry to hear that you are upset about this. Let me answer your points.

didn't change without a good reasom

There is a good reason: we are using the C tokenizer now under the hood (the Python tokenizer based on regular expressions is gone now) to support PEP 701. Changing this in the C tokenizer is very very tricky because it is also used by the regular compilation pipeline and we certainly cannot alter that or the parser. We also think this is more correct behavior (but that's not the real reason we changed it). In any case, I don't want to enter a discussion around this here, I am just telling you that this is not a random decision.

the beta period is a good time to report regressions and ideally they'd be taken seriously

Yes, but this is not an unexpected regression: this is documented behaviour now in the "porting to Python 3.12". Please, don't assume we are not taking things seriously because we don't agree.

So unfortunately this needs to be fixed in tools as I mentioned. I am sorry that this is causing you a lot of trouble.

pablogsal commented 1 year ago

Nevertheless, I will discuss this with the team in case we can devise an easy solution, just in case.

pablogsal commented 1 year ago

CC: @lysnikolaou thoughts?

pablogsal commented 1 year ago

@asottile opened https://github.com/python/cpython/issues/104976 to track this

pablogsal commented 1 year ago

Ok we fixed this upstream. Can someone confirm that this solves the problem?

AlexWaygood commented 1 year ago

@pablogsal, the issue with W391 doesn't appear to be fixed :/

Running the flake8-pyi test suite with a fresh build of CPython 3.13 (https://github.com/python/cpython/commit/949f0f5bb07d94f8882135a1d58d82c0a2b289a9) currently results in every test failing due to spurious W391 warnings being emitted from pycodestyle on all of our test data.

(We have instructions on how to run our tests here if you want to try to reproduce -- it's a pretty simple setup.)

pablogsal commented 1 year ago

Then I don't know what's going on because after my fix the tokenizer emits exactly the same output as in 3.11 except for PEP701 related changes. If this is something in our side we would need a reproducer that doesn't use any 3rd party code just using the tokenize module.

pablogsal commented 1 year ago

Check this out:

❯ python --version
Python 3.13.0a0

❯ python -m flake8 ./tests/unused_things.pyi
...
./tests/unused_things.pyi:1:1: Y049 TypedDict "_ConditionallyDefinedUnusedClassBasedTypedDict" is not used
./tests/unused_things.pyi:1:54: F821 undefined name 'TypedDict'
./tests/unused_things.pyi:5:1: Y046 Protocol "_ConditionallyDefinedUnusedProtocol" is not used
./tests/unused_things.pyi:5:43: F821 undefined name 'Protocol'
./tests/unused_things.pyi:7:1: W391 blank line at end of file

So we get the W391 blank line at end of file but now check this out:

❯ /home/pablogsal/.pyenv/shims/python3.11 --version
Python 3.11.1

❯ ../python --version
Python 3.13.0a0

❯ diff <(../python -m tokenize < ./tests/unused_things.pyi ) <(/home/pablogsal/.pyenv/shims/python3.11 -m tokenize < ./tests/unused_things.pyi )

So the output of the tokenizer is the same in that file for 3.11 and 3.12 so not sure where is the problem then.

AlexWaygood commented 1 year ago

@pablogsal I'm seeing pretty different results running python -m tokenize on e.g. del.pyi (our smallest test data file).

On Python 3.11:

``` 0,0-0,0: ENCODING 'utf-8' 1,0-1,29: COMMENT '# flags: --extend-ignore=Y037' 1,29-1,31: NL '\r\n' 2,0-2,4: NAME 'from' 2,5-2,11: NAME 'typing' 2,12-2,18: NAME 'import' 2,19-2,28: NAME 'TypeAlias' 2,28-2,29: OP ',' 2,30-2,35: NAME 'Union' 2,35-2,37: NEWLINE '\r\n' 3,0-3,2: NL '\r\n' 4,0-4,7: NAME 'ManyStr' 4,7-4,8: OP ':' 4,9-4,18: NAME 'TypeAlias' 4,19-4,20: OP '=' 4,21-4,25: NAME 'list' 4,25-4,26: OP '[' 4,26-4,35: NAME 'EitherStr' 4,35-4,36: OP ']' 4,36-4,38: NEWLINE '\r\n' 5,0-5,9: NAME 'EitherStr' 5,9-5,10: OP ':' 5,11-5,20: NAME 'TypeAlias' 5,21-5,22: OP '=' 5,23-5,28: NAME 'Union' 5,28-5,29: OP '[' 5,29-5,32: NAME 'str' 5,32-5,33: OP ',' 5,34-5,39: NAME 'bytes' 5,39-5,40: OP ']' 5,40-5,42: NEWLINE '\r\n' 6,0-6,2: NL '\r\n' 7,0-7,3: NAME 'def' 7,4-7,12: NAME 'function' 7,12-7,13: OP '(' 7,13-7,20: NAME 'accepts' 7,20-7,21: OP ':' 7,22-7,31: NAME 'EitherStr' 7,31-7,32: OP ')' 7,33-7,35: OP '->' 7,36-7,40: NAME 'None' 7,40-7,41: OP ':' 7,42-7,45: OP '...' 7,45-7,47: NEWLINE '\r\n' 8,0-8,3: NAME 'del' 8,4-8,13: NAME 'EitherStr' 8,15-8,43: COMMENT '# private name, not exported' 8,43-8,45: NEWLINE '\r\n' 9,0-9,0: ENDMARKER '' ```

On Python 3.13a0:

``` 0,0-0,0: ENCODING 'utf-8' 1,0-1,29: COMMENT '# flags: --extend-ignore=Y037' 1,29-1,30: NL '\n' 2,0-2,4: NAME 'from' 2,5-2,11: NAME 'typing' 2,12-2,18: NAME 'import' 2,19-2,28: NAME 'TypeAlias' 2,28-2,29: OP ',' 2,30-2,35: NAME 'Union' 2,35-2,36: NEWLINE '\n' 3,0-3,1: NL '\n' 4,0-4,7: NAME 'ManyStr' 4,7-4,8: OP ':' 4,9-4,18: NAME 'TypeAlias' 4,19-4,20: OP '=' 4,21-4,25: NAME 'list' 4,25-4,26: OP '[' 4,26-4,35: NAME 'EitherStr' 4,35-4,36: OP ']' 4,36-4,37: NEWLINE '\n' 5,0-5,9: NAME 'EitherStr' 5,9-5,10: OP ':' 5,11-5,20: NAME 'TypeAlias' 5,21-5,22: OP '=' 5,23-5,28: NAME 'Union' 5,28-5,29: OP '[' 5,29-5,32: NAME 'str' 5,32-5,33: OP ',' 5,34-5,39: NAME 'bytes' 5,39-5,40: OP ']' 5,40-5,41: NEWLINE '\n' 6,0-6,1: NL '\n' 7,0-7,3: NAME 'def' 7,4-7,12: NAME 'function' 7,12-7,13: OP '(' 7,13-7,20: NAME 'accepts' 7,20-7,21: OP ':' 7,22-7,31: NAME 'EitherStr' 7,31-7,32: OP ')' 7,33-7,35: OP '->' 7,36-7,40: NAME 'None' 7,40-7,41: OP ':' 7,42-7,45: OP '...' 7,45-7,46: NEWLINE '\n' 8,0-8,3: NAME 'del' 8,4-8,13: NAME 'EitherStr' 8,15-8,43: COMMENT '# private name, not exported' 8,43-8,44: NEWLINE '\n' 9,0-9,1: NL '\n' 10,0-10,0: ENDMARKER '' ```

So that's a lot of NEWLINE tokens where the value was \r\n on Python 3.11 but is now \n on Python 3.12/3.13.

Possibly relevant: I'm running my tests on a Windows machine!

pablogsal commented 1 year ago

Possibly relevant: I'm running my tests on a Windows machine!

Oh, that may be making all the difference!

pablogsal commented 1 year ago

Well, then if there is anything wrong here in our side (which still is unclear) this may be something harder to fix as this is deep into the C tokenizer and certainly, we probably cannot change it because everything else relies on this :(

I may try to give this a go, but meanwhile maybe @lysnikolaou has some spare cycles.

pablogsal commented 1 year ago

This is what I get in Linux for 3.11:

/home/pablogsal/.pyenv/shims/python3.11 -m tokenize < ./tests/del.pyi
1,0-1,29:           COMMENT        '# flags: --extend-ignore=Y037'
1,29-1,30:          NL             '\n'
2,0-2,4:            NAME           'from'
2,5-2,11:           NAME           'typing'
2,12-2,18:          NAME           'import'
2,19-2,28:          NAME           'TypeAlias'
2,28-2,29:          OP             ','
2,30-2,35:          NAME           'Union'
2,35-2,36:          NEWLINE        '\n'
3,0-3,1:            NL             '\n'
4,0-4,7:            NAME           'ManyStr'
4,7-4,8:            OP             ':'
4,9-4,18:           NAME           'TypeAlias'
4,19-4,20:          OP             '='
4,21-4,25:          NAME           'list'
4,25-4,26:          OP             '['
4,26-4,35:          NAME           'EitherStr'
4,35-4,36:          OP             ']'
4,36-4,37:          NEWLINE        '\n'
5,0-5,9:            NAME           'EitherStr'
5,9-5,10:           OP             ':'
5,11-5,20:          NAME           'TypeAlias'
5,21-5,22:          OP             '='
5,23-5,28:          NAME           'Union'
5,28-5,29:          OP             '['
5,29-5,32:          NAME           'str'
5,32-5,33:          OP             ','
5,34-5,39:          NAME           'bytes'
5,39-5,40:          OP             ']'
5,40-5,41:          NEWLINE        '\n'
6,0-6,1:            NL             '\n'
7,0-7,3:            NAME           'def'
7,4-7,12:           NAME           'function'
7,12-7,13:          OP             '('
7,13-7,20:          NAME           'accepts'
7,20-7,21:          OP             ':'
7,22-7,31:          NAME           'EitherStr'
7,31-7,32:          OP             ')'
7,33-7,35:          OP             '->'
7,36-7,40:          NAME           'None'
7,40-7,41:          OP             ':'
7,42-7,45:          OP             '...'
7,45-7,46:          NEWLINE        '\n'
8,0-8,3:            NAME           'del'
8,4-8,13:           NAME           'EitherStr'
8,15-8,43:          COMMENT        '# private name, not exported'
8,43-8,44:          NEWLINE        '\n'
9,0-9,0:            ENDMARKER      ''

pablogsal commented 1 year ago

Which is I think the same as you are getting for 3.12 in Windows. I would argue that the old windows version may be wrong because is emitting:

8,15-8,43:          COMMENT        '# private name, not exported'
8,43-8,44:          NEWLINE        '\n'
9,0-9,1:            NL             '\n'
10,0-10,0:          ENDMARKER      ''

but this file has 8 lines! So that NEWLINE + NL is incorrect.

AlexWaygood commented 1 year ago

(FYI I just edited my comment above -- I managed to get the 3.11 and 3.13 outputs switched around 🤦‍♂️.)

To clarify: on 3.11, the NEWLINE tokesn have \r\n values; on 3.13, they all have \n values.

So yes, looks like the output on Windows py312 now matches the output on Linux py311. Which is different to what the output was on Windows py311.

pablogsal commented 1 year ago

ooks like the output on Windows py312 now matches the output on Linux py311

Not really, no? We get some weird end NL:

Well, I bet the problem is this:

8,15-8,43:          COMMENT        '# private name, not exported'
8,43-8,44:          NEWLINE        '\n'
9,0-9,1:            NL             '\n'
10,0-10,0:          ENDMARKER      ''

while in Linux we get only

8,15-8,43:          COMMENT        '# private name, not exported'
8,43-8,44:          NEWLINE        '\n'
9,0-9,0:            ENDMARKER      ''

pablogsal commented 1 year ago

Can you open a bug against CPython with this?

AlexWaygood commented 1 year ago

Not really, no? We get some weird end NL:

Right, sorry, I should have checked more thoroughly; I just glanced at it. Thanks for the correction.

Can you open a bug against CPython with this?

Sure.

(By the way, if you hadn't noticed by now, I know next to nothing about the tokenizer! flake8-pyi works entirely by analyzing the AST rather than using the tokenize module. Thanks for your patience here.)

AlexWaygood commented 1 year ago

I've opened https://github.com/python/cpython/issues/105017.

jayaddison commented 1 year ago

If this is related to the DEDENT tokens (and some of the code and comments within pycodestyle suggest to me that it could be), then the cause seems likely to be more-or-less the same as found in sphinx-doc/sphinx#11436. I was a bit verbose and messy there (typical) but did eventually figure out what was going on and left some notes.

In that case, a Python-version-agnostic fix was possible (see sphinx-doc/sphinx#11440), although that has been closed following the change-of-plan to retain the previous end-of-file dedent line numbering behaviour.

pablogsal commented 1 year ago

The thing is that now we emit exactly the same tokens as in 3.11 in exactly the same situations when valid code is provided so I don't understand where the problem may be happening now as I can also see the failures on Linux

jayaddison commented 1 year ago

Ok, yep, that's a puzzler. Does the tokenize output re-arrange the order of the tokens before they're emitted to stdout?

The reason I ask: the smallest standalone test case I've found so far is this tests/comparisons.pyi file from flake8-pyi.

While narrowing in on that, I found that running the pycodestyle module directly on it not only emits W391 using a py3.12-dev (3.12.0-beta.1), but also evaluates the rules in different order:

py311 pycodestyle tests/comparisons.pyi

tests/comparisons.pyi:3:34: E701 multiple statements on one line (colon)
tests/comparisons.pyi:3:80: E501 line too long (123 > 79 characters)
tests/comparisons.pyi:4:27: E701 multiple statements on one line (colon)
tests/comparisons.pyi:4:80: E501 line too long (116 > 79 characters)
tests/comparisons.pyi:5:26: E701 multiple statements on one line (colon)
tests/comparisons.pyi:5:80: E501 line too long (115 > 79 characters)
tests/comparisons.pyi:6:21: E701 multiple statements on one line (colon)
tests/comparisons.pyi:6:80: E501 line too long (110 > 79 characters)

py312 pycodestyle tests/comparisons.pyi

tests/comparisons.pyi:3:34: E701 multiple statements on one line (colon)
tests/comparisons.pyi:4:27: E701 multiple statements on one line (colon)
tests/comparisons.pyi:5:26: E701 multiple statements on one line (colon)
tests/comparisons.pyi:6:1: W391 blank line at end of file
tests/comparisons.pyi:6:21: E701 multiple statements on one line (colon)
tests/comparisons.pyi:6:80: E501 line too long (110 > 79 characters)
tests/comparisons.pyi:6:80: E501 line too long (115 > 79 characters)
tests/comparisons.pyi:6:80: E501 line too long (116 > 79 characters)
tests/comparisons.pyi:6:80: E501 line too long (123 > 79 characters)

(in particular it seems odd that the E501 and E701 entries are no longer interleaved but instead all grouped together when running with py312)

Certainly not proof of anything yet, but it could be a clue; identical output from python -m tokenize doesn't guarantee that pycodestyle follows the same path.

pablogsal commented 1 year ago

identical output from python -m tokenize doesn't guarantee that pycodestyle follows the same path.

But that is basically what we give out so if the output is identical then the bug cannot be in the tokenizer.

jayaddison commented 1 year ago

identical output from python -m tokenize doesn't guarantee that pycodestyle follows the same path.

But that is basically what we give out so if the output is identical then the bug cannot be in the tokenizer.

Possibly? I see two (in fact, three) scenarios:

python -m tokenize displays slightly adjusted information to the output of tokenize.generate_tokens (as called from here) -- so there could be a bug in the tokenize module

OR

pycodestyle is overly-reliant on not only the position and content of the tokens, but also the order in which it reads them (in particular, I would look around here where prev_physical is referenced)

OR

Bonus third option: both of the above could be true

Edit: update after Pablo found the true cause: these theories were off-base, although the second theory might seem similar to the true reason. Basically the tokens and their order in the stream was unchanged, but in Py3.11, the stream was produced from a chunked input, one line at a time, whereas in Py3.12, the stream was produced across the entire input. This caused some line-related logic in the pycodestyle parser -- particularly around empty-final-line -- to be invalidated.

AlexWaygood commented 1 year ago

(The new W391 errors definitely bisect to https://github.com/python/cpython/commit/6715f91edcf6f379f666e18f57b8a0dcb724bf79, so we know something changed in that commit to cause the new errors.)

pablogsal commented 1 year ago

(The new W391 errors definitely bisect to https://github.com/python/cpython/commit/6715f91edcf6f379f666e18f57b8a0dcb724bf79, so we know something changed in that commit to cause the new errors.)

Yeah that I can believe but I don't understand what can it be if the tokens themselves are the same and come out in the same order. I mean these files where we get the errors are quite simple and we can analyse them easily so is not like there is unknown constructs that cause failures.

AlexWaygood commented 1 year ago

I'm currently seeing how much of pycodestyle I can delete locally while still demonstrating a behaviour change between py311 and CPython main

jayaddison commented 1 year ago

(The new W391 errors definitely bisect to python/cpython@6715f91, so we know something changed in that commit to cause the new errors.)

Yeah that I can believe but I don't understand what can it be if the tokens themselves are the same and come out in the same order. I mean these files where we get the errors are quite simple and we can analyse them easily so is not like there is unknown constructs that cause failures.

Although the python -m tokenize output is the same, I think that there could be some difference in the ordering. Here's a .pdbrc file that shows a different result when calling tokenize.generate_tokens during python -m pdb -m pycodestyle tests/comparisons.pyi:

break tokenize.generate_tokens
continue
return
print(list(__return__))
exit

The results of that are different between Py3.11 and Py3.12-dev. (for each Python version, they're the same on both Linux and Windows).

5113c06c012bd3cbdadf3894e3f3bea1c55e14773e2becba2738ac907fbed4bf  py311-lin
30b4e357a70d952c2cf204a70a227f07f292b889eb83df32d579e622ca7d82b4  py312-lin
5113c06c012bd3cbdadf3894e3f3bea1c55e14773e2becba2738ac907fbed4bf  py311-win
30b4e357a70d952c2cf204a70a227f07f292b889eb83df32d579e622ca7d82b4  py312-win

pablogsal commented 1 year ago

Although the python -m tokenize output is the same, I think that there could be some difference in the ordering.

That's not possible or at least I cannot see how that can be possible. The output of python -m tokenize is shown in the order the tokens are emitted and I have been comparing the output with 3.11 using diff

jayaddison commented 1 year ago

Nope, I should have checked more carefully: the ordering seems the same, but the OP token type value has changed (from 54 to 55), the COMMENT token type has changed (from 61 to 64) and the NL token type has changed (from 62 to 65).

jayaddison commented 1 year ago

The token-value changes seems like they should be safe since pycodestyle.py only performs comparisons against label names imported from the tokenize module (tokenize.COMMENT, for example). Even so, I'm trying to think of any way that those could have caused a change in pycodestyle's behaviour.

pablogsal commented 1 year ago

the ordering seems the same, but the OP token type value has changed (from 54 to 55), the COMMENT token type has changed (from 61 to 64) and the NL token type has changed (from 62 to 65).

That's expected: you should use the constants in the tokenize module, not the numerical values themselves

pablogsal commented 1 year ago

Ok I know what's going on after investigating the situation. The problem is that we now consume the whole buffer before calling the C tokenizer but pycodestylerelies on the incorrect assumption that when a token is emitted, only the buffer so far has been consumed:

https://github.com/PyCQA/pycodestyle/blob/f6500e2170406b664a721b5e3a0f5d66bcb757e1/pycodestyle.py#L1968

This incorrectly changes self.line_number to the end of the line so this check is triggered:

https://github.com/PyCQA/pycodestyle/blob/f6500e2170406b664a721b5e3a0f5d66bcb757e1/pycodestyle.py#L250

for the first NEWLINE token. This is very very unlikely to be fixed upstream because there is no guarantees over how we are going to consume the buffer and also we would need to change deeply the C tokenizer APIs, so I fear is something that needs to be changed in pycodestyle.

AlexWaygood commented 1 year ago

@pablogsal thank you so much for taking the time to look into this!

jayaddison commented 1 year ago

@pablogsal nice find - that also explains why the grouping of the warnings output by pycodestyle changed (processing one line at a time and emitting warnings based on those tokens, compared to processing an entire file and emitting warnings across the entire tokenstream).

This might be pedantic/annoying to mention, but I think it's better that I do and then am corrected than not mention it: the Py3.12-dev documentation does currently indicate that the readline argument for both tokenize.tokenize and tokenize_generate_tokens should be a callable that emits one line at a time. So if that contract is changing, there's a documentation change required there (let me know if I should file an issue tracker item for that).

pablogsal commented 1 year ago

So if that contract is changing

Is not changing because it doesn't say how they are consumed. The contract is: you need to pass down a callable that emits one line at a time (at a time meaning every time is called) but note that it doesn't say that we will consume it one line at a time every time we emit a new token. There is absolutely no mention over how or when that callable will be called internally because that's an implementation detail. The only specification is that every time is called you need to give us a new line of input. So the contract is not changing so we don't need to change the documentation

jayaddison commented 1 year ago

Ok, thank you - I had a sense that I might have misunderstood, and that makes things clearer :+1:

(my understanding: the file-producer-side must emit a single line of code per read-call, but cannot assume that the tokenizer has completed emitting tokens from previous lines)

pablogsal commented 1 year ago

(my understanding: the file-producer-side must emit a single line of code per read-call, but cannot assume that the tokenizer has completed emitting tokens from previous lines)

Correct. I understand that the assumption is very natural because it makes it more efficient but note that that was never in the contract.

pablogsal commented 1 year ago

Can someone try PycodeStyle with https://github.com/python/cpython/pull/105070 ?

AlexWaygood commented 1 year ago

Can someone try PycodeStyle with python/cpython#105070 ?

Very happy to confirm that the flake8-pyi test fully suite passes with https://github.com/python/cpython/pull/105070! (Aka, all the pycodestyle errors I was seeing in our test suite are gone!)

SylvainDe commented 1 year ago

As I was about to report the issue, I'm happy to see that it is already reported, analyzed and fixed. Congratulations to everyone involved!

What I do not quite understand though is why this did not get caught by the CI earlier. Would it be worth adding a:

  schedule:
    - cron: '0 0 * * *'  # every day at midnight

in .github/workflows/main.yml ?

asottile commented 1 year ago

we run a weekly build which is already enough CO2

SylvainDe commented 1 year ago

Thanks for the information @asottile , I did not find this in the yml file.

pablogsal commented 1 year ago

Ok, I think we can close this now after I merged https://github.com/python/cpython/pull/105070

Someone somewhere owes me a metaphorical beer 😉

PyCQA / pycodestyle

W391: spurious warnings with python 3.12 beta #1142