facebook / usort

Safe, minimal import sorting for Python projects.
https://usort.readthedocs.io
MIT License
183 stars 22 forks source link

Examples comparing to isort's heuristics in the documentation seem outdated #186

Open Jackenmen opened 2 years ago

Jackenmen commented 2 years ago

I've tried the examples of isort breaking code that are given here and it doesn't seem like any of those are broken by isort: https://usort.readthedocs.io/en/latest/why.html#comparison-to-isort

Based on my testing, they haven't been broken since isort 5 (released July 2020) for sure so I'm a bit surprised to see those listed in the documentation that was added just 6 months ago. I can understand the benefit of usort being designed with safety as a priority but the way it's compared to isort is currently unfair.

isort doesn't really seem to advertise safety as a priority (though I'm pretty sure it's nonetheless considered important to it since isort 5) so that is still a good selling point (I find the mention of the strictly-parsed tree and how it causes bugs in isort to be particularly convincing), it would be good to not show code examples that isort doesn't actually break though :)

thatch commented 2 years ago

That's fair! The examples on that page don't misbehave on isort 5.x anymore. That page (including some of the bugs) was from our experience running 4.x in a megarepo, and what drove us to start this project rather than trying to piecemeal "fix" isort.

There are still some major differences between our approaches. I'll post some ideas in followup comments, but didn't want to delay confirmation, thanks for reporting.

thatch commented 2 years ago

I did some manual fuzzing of isort to try to come up with better thoughts on how we're different. In addition to our "first, do no harm" principle, I think using more words:

An import sorter should never take valid code and make it invalid

We get most of this safety by using an actual parser (LibCST) that means it's unlikely we will output mangled lines, and we put care into ensuring that there aren't a lot of branches that are hard to test.

# horizontal whitespace is allowed around dots
import a . b . c

# backslash continuations should count as horizontal whitespace too
import a.b\
.c.\
d

# any name of cimport is silently dropped
import cimport

# but aliases of cimport get mangled
import a as cimport

# formfeed is a valid horizontal whitespace, and this gets mangled and the "from a" ends up on its own line
from z import x
from a\x0cimport y

# nbsp is a valid horizontal whitespace (as is full-width space, U+3000)
from mcimportface\xa0import y

# backslash ending a comment isn't a continuation, but this gets mangled
import a # comment \
import b

An import sorter should never take invalid code and make it valid

This is mainly about syntactically-invalid code, and less about import-time side effects which we consider an acceptable risk and I wish didn't exist.

# turns into two lines, "import a" and "import b"
import a from b

# no commas get "corrected"
import foo bar baz

# parens anywhere
from a import )()()(b
from ((a import b))
from a import (((((b)))))
from a import b)))))

# mixed tabs count as 1 space, not 8
if True:
        import z
\t\t\t\t\t\t\t\timport a

# import line gets dropped
from foo import \ # comment
int

# backslashes get removed
from \a\ \import\ \b\\

# some spaces get removed
from a as c import y

# turns into two imports of a
import a as b as from c

# import line gets dropped
from \
a \
import \

int

I think the only case where we silently "fix" invalid code in usort is moving __future__ imports to the top within a block.

An import sorter should make understandable decisions, even if you disagree with them

...and ideally be able to explain them.

# the "c" isn't considered an import, thus counts as a barrier, I think?
import z
import\
c
import a

# these aren't considered an import either?
from.import(b)
from.a import b

Configuration should be minimal, and behave the same universally

It shouldn't matter what version of Python you're using, or what you have on your sys.path, or what OS you're on. This causes us to make some compromises (e.g. what names are considered stdlib, or case-insensitivity) that I imagine spark disagreement though.

The one exception to the environment we have is inference of first-party name under the assumption you're running it from a vcs repo.

We assume the union of all py3 stdlibs currently.

It's okay to leave a file alone, or crash as long as you don't overwrite it, but do so as gracefully as you can.

# infinite loop
import (
<EOF>

# pathlib error
import //
# but this "works"
import //a

# IndexError
import as as as

# IndexError
import a#\
<EOF>

We had one bug this week that raised KeyError, but generally we raise something like SyntaxError pointing to a relevant part of the source.

Conclusion

I recommend we:

  1. Remove the "Unlike isort" paragraph entirely (it doesn't apply to isort 5, and hopefully nobody is using isort 4 anymore)
  2. Move the "µsort operates on a strictly-parsed syntax tree" paragraph up, as it does a pretty good job explaining the structural difference (although "grammar object" feels awkward, maybe rewrite a little)
  3. Make it clearer that we have less configuration, and consider that a good thing (lead-in to the case-insensitive paragraph)
  4. Make it clearer that the four issues listed are fixed, but there are many more like them that still exist (maybe linking to this comment)
  5. Mention our stdlibs are better https://gist.github.com/thatch/c75844447263df10b8f1c94811ad3525 and have better provenance/local reproducibility. (among others, peg_parser is missing on 3.10, and sre is incorrectly force-included on 3.10)
  6. Mention something about cython as being a non-goal (it's not strictly Python, and I doubt LibCST wants to parse it)
ofek commented 1 year ago

Is this expected?

--- a/backend/src/hatchling/builders/sdist.py
+++ b/backend/src/hatchling/builders/sdist.py
@@ -8,7 +8,7 @@
 from copy import copy
 from io import BytesIO
 from time import time as get_current_timestamp
-from typing import TYPE_CHECKING, Any, Callable
+from typing import Any, Callable, TYPE_CHECKING

 from hatchling.builders.config import BuilderConfig
 from hatchling.builders.plugin.interface import BuilderInterface
@@ -19,7 +19,10 @@
     normalize_relative_path,
     replace_file,
 )
-from hatchling.metadata.spec import DEFAULT_METADATA_VERSION, get_core_metadata_constructors
+from hatchling.metadata.spec import (
+    DEFAULT_METADATA_VERSION,
+    get_core_metadata_constructors,
+)
 from hatchling.utils.constants import DEFAULT_BUILD_SCRIPT, DEFAULT_CONFIG_FILE

 if TYPE_CHECKING:
amyreese commented 1 year ago

@ofek yes.

The first change is due to µsort using case-insensitive, lexicographical sorting — the reasoning for this is the second section of the Comparison to isort section of the "Why µsort" page. We believe this is more consistent overall, and more predictable as someone looking for a specific name in the import.

The second change is because µsort attempts to reflow long lines to match black. If you don't use black, but would like to configure the target line length for µsort, you can just set black's line length in your pyproject.toml and µsort will obey that: https://usort.readthedocs.io/en/stable/guide.html#tool-black

ofek commented 1 year ago

Thanks! I do use black so I wonder why the line length isn't read

amyreese commented 1 year ago

I think this is because µsort is only looking for the closest pyproject.toml, which in this case is backend/pyproject.toml, and that does not have a black config. Adding a copy of the [tool.black] section to that file should get the behavior you expect from µsort. I'll open a task to consider how to handle nested config files like that.

ofek commented 1 year ago

Ohhh I see, thank you 🙏