Zac-HD / shed

`shed` canonicalises Python code. Shed your legacy, stop bikeshedding, and move on. Black++
https://pypi.org/project/shed/
GNU Affero General Public License v3.0
343 stars 23 forks source link

shed --refactor breaks on (incorrect) code involving generator expression #90

Closed DRMacIver closed 1 year ago

DRMacIver commented 1 year ago

I'm reporting this as a bug because shed asked me to, but I think the only actual bug is that shed is giving the wrong error message in thinking that it's a bug:

If I run shed --refactor on the following code (simplified from real example):

def test_group():
    groups = [(-1, [1, 2, 3])]
    assert sorted(numbers) == sorted(sum(numbers for _, numbers in groups, []))

I get the following error:

Internal error formatting 'shedbug.py': ParserSyntaxError: Syntax Error @ 3:1.
parser error: error at 3:75: expected one of !=, %, &, (, ), *, **, +, -, ., /, //, <, <<, <=, ==, >, >=, >>, @, ASYNC, [, ^, and, for, if, in, is, not, or, |

    assert sorted(numbers) == sorted(sum(numbers for _, numbers in groups, []))
^
    Please report this to https://github.com/Zac-HD/shed/issues

But in fact this code shouldn't have parsed in the first place (this is possibly libcst being more lenient than the standard parser?), because the generator in the sum should have been wrapped in parentheses, and without that it is invalid Python code.

This is tested on shed 2023.4.1

Zac-HD commented 1 year ago

Hmm, "wrong error message" on invalid syntax is definitely a shed bug, but I suspect there's more to this. Is there a case where libcst outputs this code, so we could report it as an upstream bug too?

DRMacIver commented 1 year ago

Yeah, libcst will happily parse this code, take minor modifications, and spit it back out. e.g. this works as you'd expect:

import libcst
import libcst.matchers as m

mod = libcst.parse_module("""
def test_group():
    groups = [(-1, [1, 2, 3])]
    assert sorted(numbers) == sorted(sum(numbers for _, numbers in groups, []))
""")

mod = m.replace(mod, m.Name(value='test_group'), libcst.Name(value='some_test'))

print(mod.code)

This will as you'd expect give:

def some_test():
    groups = [(-1, [1, 2, 3])]
    assert sorted(numbers) == sorted(sum(numbers for _, numbers in groups, []))

And yeah, I'd agree this is probably a libcst bug, although in general libcst doesn't seem to make any guarantees that the code it produces will be syntactically valid if you screwed something up. e.g. shed pretty regularly ends up in situations where libcst produces bad code because shed had a bug where it did a refactor wrong.

Zac-HD commented 1 year ago

Per https://github.com/Instagram/LibCST/issues/287#issuecomment-618311246 they do want invalid code to be unrepresentable, but they're not exactly keeping up with my bug reports so I tend to file an upstream issue and then add a workaround here 🤷‍♂️

Zac-HD commented 1 year ago

Fixed in 23.5.1 / https://github.com/Zac-HD/shed/commit/baf7705712f87b3c8856bb912204d8a64898e590 by... skipping the libcst logic when it (correctly) rejects invalid code. Potentially a bit confusing if you expected refactorings to happen, but all the other options I could think of seemed worse 😅

See also #93.