Closed hannob closed 1 year ago
yeah there's a quite annoying regression with the tokenizer in the latest version -- cc @pablogsal
this is going to impact every user of pycodestyle, flake8, etc.
> yeah there's a quite annoying regression with the tokenizer in the latest version -- cc @pablogsal
> this is going to impact every user of pycodestyle, flake8, etc.
If this is related to the fact that the last DEDENT is now reported on existing lines, this is now documented behaviour, so it needs to be fixed in tools. Check https://docs.python.org/3.12/whatsnew/3.12.html#changes-in-the-python-api:

> Some final DEDENT tokens are now emitted within the bounds of the input. This means that for a file containing 3 lines, the old version of the tokenizer returned a DEDENT token in line 4 whilst the new version returns the token in line 3.
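The documented change can be observed directly with the stdlib tokenize module. A minimal sketch (the two-line source here is my own illustration, not from the issue):

```python
import io
import tokenize

# Locate the final DEDENT for a two-line file. Python 3.12+ reports it
# on line 2 (within the input); 3.11 and earlier reported it on line 3.
source = "if True:\n    pass\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
dedents = [tok for tok in tokens if tok.type == tokenize.DEDENT]
for tok in dedents:
    print(tok.start)  # (row, col) where the DEDENT is reported
```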
@pablogsal "needs to be fixed in tools" -- it would be nice if this didn't change without a good reason. the impact to tools is pretty widespread and this is going to be a pain in my ass for years of people not upgrading and misreporting this. same thing happened the last time cpython changed the end-of-file reporting in 3.7 and I still get pings about it
the beta period is a good time to report regressions and ideally they'd be taken seriously
Sorry Anthony, I am very sorry to hear that you are upset about this. Let me answer your points.
> didn't change without a good reason
There is a good reason: we are using the C tokenizer now under the hood (the Python tokenizer based on regular expressions is gone now) to support PEP 701. Changing this in the C tokenizer is very very tricky because it is also used by the regular compilation pipeline and we certainly cannot alter that or the parser. We also think this is more correct behavior (but that's not the real reason we changed it). In any case, I don't want to enter a discussion around this here, I am just telling you that this is not a random decision.
> the beta period is a good time to report regressions and ideally they'd be taken seriously
Yes, but this is not an unexpected regression: this is documented behaviour now in the "porting to Python 3.12". Please, don't assume we are not taking things seriously because we don't agree.
So unfortunately this needs to be fixed in tools as I mentioned. I am sorry that this is causing you a lot of trouble.
Nevertheless, I will discuss this with the team in case we can devise an easy solution, just in case.
CC: @lysnikolaou thoughts?
@asottile opened https://github.com/python/cpython/issues/104976 to track this
Ok we fixed this upstream. Can someone confirm that this solves the problem?
@pablogsal, the issue with W391 doesn't appear to be fixed :/
Running the flake8-pyi test suite with a fresh build of CPython 3.13 (https://github.com/python/cpython/commit/949f0f5bb07d94f8882135a1d58d82c0a2b289a9) currently results in every test failing due to spurious W391 warnings being emitted from pycodestyle on all of our test data.
(We have instructions on how to run our tests here if you want to try to reproduce -- it's a pretty simple setup.)
Then I don't know what's going on, because after my fix the tokenizer emits exactly the same output as in 3.11, except for PEP 701-related changes. If this is something on our side, we would need a reproducer that doesn't use any 3rd-party code, just the tokenize module.
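As a starting point for such a reproducer, a stdlib-only harness might look like the following sketch (the `dump` helper is my own name, not anything from the thread):

```python
import io
import sys
import tokenize

def dump(source):
    """Render tokens roughly the way `python -m tokenize` does, so the
    output of different interpreter versions can be diffed directly."""
    lines = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        lines.append(
            f"{tok.start[0]},{tok.start[1]}-{tok.end[0]},{tok.end[1]}: "
            f"{tokenize.tok_name[tok.type]} {tok.string!r}"
        )
    return lines

if __name__ == "__main__":
    print("\n".join(dump(sys.stdin.read())))
```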
Check this out:
❯ python --version
Python 3.13.0a0
❯ python -m flake8 ./tests/unused_things.pyi
...
./tests/unused_things.pyi:1:1: Y049 TypedDict "_ConditionallyDefinedUnusedClassBasedTypedDict" is not used
./tests/unused_things.pyi:1:54: F821 undefined name 'TypedDict'
./tests/unused_things.pyi:5:1: Y046 Protocol "_ConditionallyDefinedUnusedProtocol" is not used
./tests/unused_things.pyi:5:43: F821 undefined name 'Protocol'
./tests/unused_things.pyi:7:1: W391 blank line at end of file
So we get the W391 blank line at end of file warning, but now check this out:
❯ /home/pablogsal/.pyenv/shims/python3.11 --version
Python 3.11.1
❯ ../python --version
Python 3.13.0a0
❯ diff <(../python -m tokenize < ./tests/unused_things.pyi ) <(/home/pablogsal/.pyenv/shims/python3.11 -m tokenize < ./tests/unused_things.pyi )
So the output of the tokenizer is the same in that file for 3.11 and 3.12, so I'm not sure where the problem is, then.
@pablogsal I'm seeing pretty different results running python -m tokenize on e.g. del.pyi (our smallest test data file).
So that's a lot of NEWLINE tokens whose value was \r\n on Python 3.11 but is now \n on Python 3.12/3.13.
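One way to see this is to feed the tokenizer \r\n line endings untranslated; a sketch (my own example, not from the thread, and the output is version-dependent):

```python
import io
import tokenize

# newline='' disables universal-newline translation on the stream, so
# the \r\n sequences reach the tokenizer as-is.
src = io.StringIO("x = 1\r\ny = 2\r\n", newline="")
toks = list(tokenize.generate_tokens(src.readline))
newline_values = [t.string for t in toks if t.type == tokenize.NEWLINE]
# 3.11 reported these strings as '\r\n'; 3.12's C-based tokenizer
# reports '\n' -- exactly the difference described above.
print(newline_values)
```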
Possibly relevant: I'm running my tests on a Windows machine!
> Possibly relevant: I'm running my tests on a Windows machine!
Oh, that may be making all the difference!
Well, then if there is anything wrong here on our side (which is still unclear), this may be harder to fix, as this is deep in the C tokenizer and we probably cannot change it because everything else relies on it :(
I may try to give this a go, but meanwhile maybe @lysnikolaou has some spare cycles.
This is what I get in Linux for 3.11:
/home/pablogsal/.pyenv/shims/python3.11 -m tokenize < ./tests/del.pyi
1,0-1,29: COMMENT '# flags: --extend-ignore=Y037'
1,29-1,30: NL '\n'
2,0-2,4: NAME 'from'
2,5-2,11: NAME 'typing'
2,12-2,18: NAME 'import'
2,19-2,28: NAME 'TypeAlias'
2,28-2,29: OP ','
2,30-2,35: NAME 'Union'
2,35-2,36: NEWLINE '\n'
3,0-3,1: NL '\n'
4,0-4,7: NAME 'ManyStr'
4,7-4,8: OP ':'
4,9-4,18: NAME 'TypeAlias'
4,19-4,20: OP '='
4,21-4,25: NAME 'list'
4,25-4,26: OP '['
4,26-4,35: NAME 'EitherStr'
4,35-4,36: OP ']'
4,36-4,37: NEWLINE '\n'
5,0-5,9: NAME 'EitherStr'
5,9-5,10: OP ':'
5,11-5,20: NAME 'TypeAlias'
5,21-5,22: OP '='
5,23-5,28: NAME 'Union'
5,28-5,29: OP '['
5,29-5,32: NAME 'str'
5,32-5,33: OP ','
5,34-5,39: NAME 'bytes'
5,39-5,40: OP ']'
5,40-5,41: NEWLINE '\n'
6,0-6,1: NL '\n'
7,0-7,3: NAME 'def'
7,4-7,12: NAME 'function'
7,12-7,13: OP '('
7,13-7,20: NAME 'accepts'
7,20-7,21: OP ':'
7,22-7,31: NAME 'EitherStr'
7,31-7,32: OP ')'
7,33-7,35: OP '->'
7,36-7,40: NAME 'None'
7,40-7,41: OP ':'
7,42-7,45: OP '...'
7,45-7,46: NEWLINE '\n'
8,0-8,3: NAME 'del'
8,4-8,13: NAME 'EitherStr'
8,15-8,43: COMMENT '# private name, not exported'
8,43-8,44: NEWLINE '\n'
9,0-9,0: ENDMARKER ''
Which is, I think, the same as you are getting for 3.12 on Windows. I would argue that the old Windows version may be wrong, because it is emitting:
8,15-8,43: COMMENT '# private name, not exported'
8,43-8,44: NEWLINE '\n'
9,0-9,1: NL '\n'
10,0-10,0: ENDMARKER ''
but this file has 8 lines! So that NEWLINE + NL is incorrect.
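The expected tail of the token stream can be checked with a one-line file; a sketch assuming a tokenizer without the Windows quirk above:

```python
import io
import tokenize

# For a file whose last line is real code ending in '\n', the stream
# should end NEWLINE -> ENDMARKER with no stray NL in between, and
# ENDMARKER sits on the line just past the last physical line.
src = "del x  # private name, not exported\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
tail = [tokenize.tok_name[t.type] for t in toks[-2:]]
print(tail)  # ['NEWLINE', 'ENDMARKER']
```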
(FYI I just edited my comment above -- I managed to get the 3.11 and 3.13 outputs switched around 🤦♂️.)
To clarify: on 3.11, the NEWLINE tokens have \r\n values; on 3.13, they all have \n values.
So yes, looks like the output on Windows py312 now matches the output on Linux py311. Which is different to what the output was on Windows py311.
> looks like the output on Windows py312 now matches the output on Linux py311
Not really, no? We get some weird NL at the end:
Well, I bet the problem is this:
8,15-8,43: COMMENT '# private name, not exported'
8,43-8,44: NEWLINE '\n'
9,0-9,1: NL '\n'
10,0-10,0: ENDMARKER ''
while in Linux we get only
8,15-8,43: COMMENT '# private name, not exported'
8,43-8,44: NEWLINE '\n'
9,0-9,0: ENDMARKER ''
Can you open a bug against CPython with this?
> Not really, no? We get some weird NL at the end:
Right, sorry, I should have checked more thoroughly; I just glanced at it. Thanks for the correction.
> Can you open a bug against CPython with this?
Sure.
(By the way, if you hadn't noticed by now, I know next to nothing about the tokenizer! flake8-pyi works entirely by analyzing the AST rather than using the tokenize module. Thanks for your patience here.)
I've opened https://github.com/python/cpython/issues/105017.
If this is related to the DEDENT tokens (and some of the code and comments within pycodestyle suggest to me that it could be), then the cause seems likely to be more-or-less the same as found in sphinx-doc/sphinx#11436. I was a bit verbose and messy there (typical) but did eventually figure out what was going on and left some notes.
In that case, a Python-version-agnostic fix was possible (see sphinx-doc/sphinx#11440), although that has been closed following the change-of-plan to retain the previous end-of-file dedent line numbering behaviour.
The thing is that we now emit exactly the same tokens as 3.11 in exactly the same situations when valid code is provided, so I don't understand where the problem may be happening, as I can also see the failures on Linux.
Ok, yep, that's a puzzler. Does the tokenize output re-arrange the order of the tokens before they're emitted to stdout? The reason I ask: the smallest standalone test case I've found so far is this tests/comparisons.pyi file from flake8-pyi.
While narrowing in on that, I found that running the pycodestyle module directly on it not only emits W391 using py3.12-dev (3.12.0-beta.1), but also evaluates the rules in a different order:
py311 pycodestyle tests/comparisons.pyi
tests/comparisons.pyi:3:34: E701 multiple statements on one line (colon)
tests/comparisons.pyi:3:80: E501 line too long (123 > 79 characters)
tests/comparisons.pyi:4:27: E701 multiple statements on one line (colon)
tests/comparisons.pyi:4:80: E501 line too long (116 > 79 characters)
tests/comparisons.pyi:5:26: E701 multiple statements on one line (colon)
tests/comparisons.pyi:5:80: E501 line too long (115 > 79 characters)
tests/comparisons.pyi:6:21: E701 multiple statements on one line (colon)
tests/comparisons.pyi:6:80: E501 line too long (110 > 79 characters)
py312 pycodestyle tests/comparisons.pyi
tests/comparisons.pyi:3:34: E701 multiple statements on one line (colon)
tests/comparisons.pyi:4:27: E701 multiple statements on one line (colon)
tests/comparisons.pyi:5:26: E701 multiple statements on one line (colon)
tests/comparisons.pyi:6:1: W391 blank line at end of file
tests/comparisons.pyi:6:21: E701 multiple statements on one line (colon)
tests/comparisons.pyi:6:80: E501 line too long (110 > 79 characters)
tests/comparisons.pyi:6:80: E501 line too long (115 > 79 characters)
tests/comparisons.pyi:6:80: E501 line too long (116 > 79 characters)
tests/comparisons.pyi:6:80: E501 line too long (123 > 79 characters)
(in particular it seems odd that the E501 and E701 entries are no longer interleaved but instead all grouped together when running with py312)
Certainly not proof of anything yet, but it could be a clue; identical output from python -m tokenize doesn't guarantee that pycodestyle follows the same path.
> identical output from python -m tokenize doesn't guarantee that pycodestyle follows the same path.
But that is basically what we give out, so if the output is identical then the bug cannot be in the tokenizer.
> identical output from python -m tokenize doesn't guarantee that pycodestyle follows the same path.
>
> But that is basically what we give out so if the output is identical then the bug cannot be in the tokenizer.
Possibly? I see two (in fact, three) scenarios:

1. python -m tokenize displays slightly adjusted information compared to the output of tokenize.generate_tokens (as called from here) -- so there could be a bug in the tokenize module; OR
2. pycodestyle is overly reliant not only on the position and content of the tokens, but also on the order in which it reads them (in particular, I would look around here where prev_physical is referenced); OR
3. Edit: update after Pablo found the true cause: these theories were off-base, although the second theory might seem similar to the true reason. Basically the tokens and their order in the stream were unchanged, but in Py3.11 the stream was produced from a chunked input, one line at a time, whereas in Py3.12 the stream was produced across the entire input. This caused some line-related logic in the pycodestyle parser -- particularly around the empty final line -- to be invalidated.
(The new W391 errors definitely bisect to https://github.com/python/cpython/commit/6715f91edcf6f379f666e18f57b8a0dcb724bf79, so we know something changed in that commit to cause the new errors.)
> (The new W391 errors definitely bisect to https://github.com/python/cpython/commit/6715f91edcf6f379f666e18f57b8a0dcb724bf79, so we know something changed in that commit to cause the new errors.)
Yeah, that I can believe, but I don't understand what it can be if the tokens themselves are the same and come out in the same order. I mean, these files where we get the errors are quite simple and we can analyse them easily, so it's not like there are unknown constructs that cause failures.
I'm currently seeing how much of pycodestyle I can delete locally while still demonstrating a behaviour change between py311 and CPython main.
> (The new W391 errors definitely bisect to python/cpython@6715f91, so we know something changed in that commit to cause the new errors.)
>
> Yeah that I can believe but I don't understand what can it be if the tokens themselves are the same and come out in the same order.
Although the python -m tokenize output is the same, I think that there could be some difference in the ordering. Here's a .pdbrc file that shows a different result when calling tokenize.generate_tokens during python -m pdb -m pycodestyle tests/comparisons.pyi:
break tokenize.generate_tokens
continue
return
print(list(__return__))
exit
The results of that are different between Py3.11 and Py3.12-dev (for each Python version, they're the same on both Linux and Windows).
5113c06c012bd3cbdadf3894e3f3bea1c55e14773e2becba2738ac907fbed4bf py311-lin
30b4e357a70d952c2cf204a70a227f07f292b889eb83df32d579e622ca7d82b4 py312-lin
5113c06c012bd3cbdadf3894e3f3bea1c55e14773e2becba2738ac907fbed4bf py311-win
30b4e357a70d952c2cf204a70a227f07f292b889eb83df32d579e622ca7d82b4 py312-win
> Although the python -m tokenize output is the same, I think that there could be some difference in the ordering.
That's not possible, or at least I cannot see how it can be possible. The output of python -m tokenize is shown in the order the tokens are emitted, and I have been comparing the output with 3.11 using diff.
Nope, I should have checked more carefully: the ordering seems the same, but the OP token type value has changed (from 54 to 55), the COMMENT token type has changed (from 61 to 64) and the NL token type has changed (from 62 to 65).
The token-value changes seem like they should be safe, since pycodestyle.py only performs comparisons against label names imported from the tokenize module (tokenize.COMMENT, for example). Even so, I'm trying to think of any way that those could have caused a change in pycodestyle's behaviour.
> the ordering seems the same, but the OP token type value has changed (from 54 to 55), the COMMENT token type has changed (from 61 to 64) and the NL token type has changed (from 62 to 65).
That's expected: you should use the constants in the tokenize module, not the numerical values themselves.
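For tool authors, the safe pattern looks like this (a sketch; the one-line source is my own example):

```python
import io
import tokenize

def is_comment(tok):
    # Compare against the named constant, never the raw integer:
    # COMMENT was 61 on 3.11 but 64 on 3.12, as noted above.
    return tok.type == tokenize.COMMENT

toks = tokenize.generate_tokens(io.StringIO("x = 1  # note\n").readline)
comments = [t.string for t in toks if is_comment(t)]
print(comments)  # ['# note']
```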
Ok, I know what's going on after investigating the situation. The problem is that we now consume the whole buffer before calling the C tokenizer, but pycodestyle relies on the incorrect assumption that when a token is emitted, only the buffer so far has been consumed:

This incorrectly changes self.line_number to the end of the line, so this check is triggered:

for the first NEWLINE token. This is very, very unlikely to be fixed upstream, because there are no guarantees over how we are going to consume the buffer, and we would also need to deeply change the C tokenizer APIs, so I fear this is something that needs to be changed in pycodestyle.
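The broken assumption can be made visible by instrumenting the readline callable; a sketch (all names here are mine, and this is not pycodestyle's actual code):

```python
import io
import tokenize

# Record how much input the tokenizer has pulled by the time a token
# is emitted.
buf = io.StringIO("a = 1\nb = 2\nc = 3\n")
consumed = []

def readline():
    line = buf.readline()
    consumed.append(line)
    return line

gen = tokenize.generate_tokens(readline)
first = next(gen)
# pycodestyle assumed roughly one line had been read at this point; on
# 3.12's C tokenizer the whole buffer may already have been consumed.
lines_read_at_first_token = len(consumed)
remaining = list(gen)  # drain the rest of the stream
```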
@pablogsal thank you so much for taking the time to look into this!
@pablogsal nice find - that also explains why the grouping of the warnings output by pycodestyle changed (processing one line at a time and emitting warnings based on those tokens, compared to processing an entire file and emitting warnings across the entire token stream).
This might be pedantic/annoying to mention, but I think it's better that I do and then am corrected than not mention it: the Py3.12-dev documentation does currently indicate that the readline argument for both tokenize.tokenize and tokenize.generate_tokens should be a callable that emits one line at a time. So if that contract is changing, there's a documentation change required there (let me know if I should file an issue tracker item for that).
> So if that contract is changing
It's not changing, because the documentation doesn't say how the lines are consumed. The contract is: you need to pass a callable that emits one line at a time ("at a time" meaning every time it is called), but note that it doesn't say we will consume one line every time we emit a new token. There is absolutely no mention of how or when that callable will be called internally, because that's an implementation detail. The only specification is that every time it is called, you need to give us a new line of input. So the contract is not changing, and we don't need to change the documentation.
Ok, thank you - I had a sense that I might have misunderstood, and that makes things clearer :+1:
(my understanding: the file-producer-side must emit a single line of code per read-call, but cannot assume that the tokenizer has completed emitting tokens from previous lines)
> (my understanding: the file-producer-side must emit a single line of code per read-call, but cannot assume that the tokenizer has completed emitting tokens from previous lines)
Correct. I understand that the assumption is very natural because it makes things more efficient, but note that it was never in the contract.
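In other words, a producer only has to honour the per-call contract; a minimal sketch:

```python
import io
import tokenize

# A conforming producer: each call returns the next source line ('' at
# EOF); nothing is promised about *when* the tokenizer will call it.
code = "a = 1\nb = 2\n"
readline = io.StringIO(code).readline

names = [
    tok.string
    for tok in tokenize.generate_tokens(readline)
    if tok.type == tokenize.NAME
]
print(names)  # ['a', 'b']
```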
Can someone try pycodestyle with https://github.com/python/cpython/pull/105070?
> Can someone try pycodestyle with python/cpython#105070?
Very happy to confirm that the flake8-pyi test suite fully passes with https://github.com/python/cpython/pull/105070! (Aka, all the pycodestyle errors I was seeing in our test suite are gone!)
As I was about to report the issue, I'm happy to see that it is already reported, analyzed and fixed. Congratulations to everyone involved!
What I do not quite understand though is why this did not get caught by the CI earlier. Would it be worth adding a:
schedule:
- cron: '0 0 * * *' # every day at midnight
in .github/workflows/main.yml ?
we run a weekly build which is already enough CO2
Thanks for the information @asottile , I did not find this in the yml file.
Ok, I think we can close this now after I merged https://github.com/python/cpython/pull/105070.
Someone somewhere owes me a metaphorical beer 😉
It appears there's an incompatibility with pycodestyle and python 3.12.0b1 (latest beta version).
You can check by running this in a Docker container; this creates a minimal example showing this warning:
Output is:
Same with the git version of pycodestyle. This does not happen with python 3.11.