Zac-HD / shed

`shed` canonicalises Python code. Shed your legacy, stop bikeshedding, and move on. Black++
https://pypi.org/project/shed/
GNU Affero General Public License v3.0
343 stars 23 forks source link

Proposal: shed should use ast.parse to validate code on errors #93

Closed DRMacIver closed 1 year ago

DRMacIver commented 1 year ago

Very often shed will claim to have a bug when given invalid Python that libcst fails to reject. This is at best confusing to the end user.

libcst would ideally never do this, but that is not the world we live in, so I propose the following fix: When shed would show the "please report this as a bug" error, it should instead first parse the original source code with ast.parse and check if it's actually valid Python, and show a different error message if it's not.

Zac-HD commented 1 year ago

We already check that our input is syntactically valid:

https://github.com/Zac-HD/shed/blob/71520a9434eeba77065ab17f954d75f69a09642d/src/shed/__init__.py#L81-L93

...but per #92, the problem is actually libcst failing to parse valid code; we can and should check for that and give a more specific error message in such cases.

(I've also occasionally thought about adding minimization, via e.g. https://pypi.org/project/pysource-minimize/ or at worst https://github.com/Zac-HD/mwe)

DRMacIver commented 1 year ago

If so something is going wrong with this check, because here's another example I ran into:

from collections import defaultdict

VALUES = defaultdict(
    int, (i, i) for i in range(10)
)

This isn't valid Python:

  File "/Users/drmaciver/scratch/shedbug.py", line 4
    int, (i, i) for i in range(10)
         ^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: Generator expression must be parenthesized

But the following is the error I get when I call shed --refactor on it:

Internal error formatting 'shedbug.py': ParserSyntaxError: Syntax Error @ 3:1.
parser error: error at 3:37: expected one of !=, %, &, (, *, **, +, ,, -, ., /, //, <, <<, <=, ==, >, >=, >>, @, [, ^, and, if, in, is, not, or, |

VALUES = defaultdict(int, (i, i) for i in range(10))
^
    Please report this to https://github.com/Zac-HD/shed/issues
DRMacIver commented 1 year ago

Right, I did a bit of investigation. The problem scenario is when we have input code which satisfies the following criteria:

  1. It is not syntactically correct Python
  2. lib2to3 happily parses it
  3. So does libcst with its default parser (actually I'm not sure this one matters! But it is happening, and was misleading me about the underlying problem)
  4. libcst with the native parser correctly rejects it.

The reason why this is firing with shed --refactor turns out to have nothing to do with any changes shed is making to the code, but is because when running codemods shed forces the use of the native parser: https://github.com/Zac-HD/shed/blob/master/src/shed/_codemods.py#L62

So all that happens is that the code gets to the point of trying to parse the code with libcst's native parser, correctly gets a syntax error, and reports a confusing error message.

Zac-HD commented 1 year ago

Fixed in 23.5.1 / https://github.com/Zac-HD/shed/commit/baf7705712f87b3c8856bb912204d8a64898e590 by... skipping the libcst logic when it (correctly) rejects invalid code. Potentially a bit confusing if you expected refactorings to happen, but all the other options I could think of seemed worse 😅