bwhmather / ssort

Tool for automatically sorting python statements within a module
MIT License
360 stars 10 forks source link

Ignore unresolved requirements #96

Open bwhmather opened 1 year ago

bwhmather commented 1 year ago

Early versions of ssort were very aggressive about regrouping dependencies. This meant that a typo that resulted in dependency resolution failing could cause quite dramatic reordering. To protect against this we added a check to make ssort exit with an error if any requirements could not be resolved.

Since then, we have reduced the scope of ssort to only do a simple topological sort. Typos are no longer likely to result in significant shuffling. In spite of this, we have kept the logic for bailing out when requirements can't be resolved.

The errors this logic results in can be very irritating during development, and tend to block subsequent auto-formatters from running.

I propose that we simply remove it.

github-actions[bot] commented 1 year ago

Pull Request Test Coverage Report for Build 7576666998


Totals Coverage Status
Change from base Build 7576240521: 0.004%
Covered Lines: 1235
Relevant Lines: 1257

💛 - Coveralls
cgahr commented 1 year ago

Tbh I don't completely understand how ssort is supposed to work after removing this code. How are unresolved dependencies handled? Just ignored?

bwhmather commented 1 year ago

How are unresolved dependencies handled? Just ignored?

Just ignored. That's it.

The problem we had before is that we tried to do grouping. If someone made a typo, everything would be shuffled.

For example:

def dep1():
    pass

def dep2():
    pass

def func():
   dep1() 
   deb2()  # typo

Would be rejiggled to


def dep2():
    pass

def dep1():
    pass

def func():
   dep1()
   deb2()  # typo

Fixing the type would shuffle everything back.

Now we only rejiggle if someone makes a typo that breaks a cycle so the check is much less valuable.

jgberry commented 1 year ago

Hmm, I feel like we need to be careful with this because there are tradeoffs. As you mention, the reshuffling of cyclical dependency chains when a typo is introduced is the big downside here. To me, the biggest problem is that even after the typo is fixed, the statements will remain in their reshuffled state, which is ultimately some arbitrary order determined by the typo you made. Is it worth introducing this? Quite frankly, I'm not exactly sure what positives this change buys us... we already do this when a wildcard import is detected in a file, so it doesn't really resolve any issues there. Is this just for the sake of simplifying the codebase and avoiding the maintenance of adding new builtins for each new python version? Because, if so, I'm not sure this change is worthwhile.

jgberry commented 1 year ago

The errors this logic results in can be very irritating during development, and tend to block subsequent auto-formatters from running.

Can we have some examples of these errors? I feel like we should dissect these errors first, see if there are any alternative solutions for them, and then make the call on whether or not this change is worth it.

bwhmather commented 1 year ago

Can we have some examples of these errors?

In my usual workflow I will have ssort and a few other auto-formatters running every save, typically stopping after the first error or when my hand heats the spacebar sufficiently.

Something like this, but usually with the commands wrapped up in a project specific script or makefile:

$ rerun sh -c 'ssort && isort && black'

Ignoring the debate about factoring out steps into functions, if I'm working on a new module I might do something like this:

def top_level_function():
    _step1(
    )

    _step2(
    )

    _step3()

At the moment, my formatting loop will exit immediately with:

ERROR: unresolved dependency '_step1' in example.py: line 6, column 4
ERROR: unresolved dependency '_step2' in example.py: line 9, column 4
ERROR: unresolved dependency '_step3' in example.py: line 13, column 4
1 file was not sortable

Nothing will be changed.

With this change I instead get:

1 file was left unchanged
reformatted example.py

All done! ✨ 🍰 ✨
1 file reformatted.

And the file gets reformatted to:

def top_level_function():
    _step1()

    _step2()

    _step3()

Adding pyflakes to the chain gives:

example.py:2:5: undefined name '_step1'
example.py:4:5: undefined name '_step2'
example.py:6:5: undefined name '_step3'

As such, nothing is lost from the point of view of helping users catch errors.

Obviously, not exiting on first error would also solve my problem.

bwhmather commented 1 year ago

Is this just for the sake of simplifying the codebase and avoiding the maintenance of adding new builtins for each new python version?

Our ballooning builtins list is a factor, but mostly because I think it indicates that we are doing something fishy. If we can safely ignore some requirements that can't be resolved in practice then why not all?

bwhmather commented 1 year ago

Regarding cycles: If the developer adds links in the cycle one at a time then we can already get the unwanted reshuffling. I think the only time when this change would result in an unwanted reshuffle that wouldn't be made before is when a developer attempts to complete all links in a cycle but makes a typo in one of them.

I've run a hacked up version of ssort that flags cycles on all of the projects that I currently have checked out, and the only one it detects is between statement_text_sorted and _statement_text_sorted_class in _ssort.py in this project. Interestingly, the current order is consistent with _statement_text_sorted_class being implemented as a stub that was called from statement_text_sorted before being fleshed out and completing the cycle.

Most of my recursive tree traversals use single dispatch, which tends to magically break these syntactic cycles. I remembered and found one other cycle that was broken up using separate modules and a lot of comments.

cgahr commented 1 year ago

I actually think we need this kind of check for the builtin functions. Consider (terrible) code like this:

class A:
    print = print

def print(s):
    ...

Depending on whether ssort knows that print is a builtin function or not, it should sort the code differently.

In any case, I think we should leave this part of the code as it is (for now) and try improve the sorting such that there is a unique way of sorting functions. having this, I think we can determine much more easily if we need a list of builtins or not.

bwhmather commented 1 year ago

I actually think we need this kind of check for the builtin functions. Consider (terrible) code like this:

class A:
    print = print

def print(s):
    ...

Depending on whether ssort knows that print is a builtin function or not, it should sort the code differently.

Interesting. That is a real problem. I don't think it's necessarily a reason to fail on unresolved requirements, but it would definitely seem to be a reason to keep the list of builtins.

Unfortunately, polyfills make things more complicated. A contrived example:

def function():
    raise ExceptionGroup()

if sys.version_info < (3, 11):
    class ExceptionGroup(Exception):
        ...
bwhmather commented 1 year ago

improve the sorting such that there is a unique way of sorting functions.

Interestingly, I don't think this is actually a solution. Increasing the strictness makes the final outcome more predictable, but leads to a lot of churn during iteration. The unresolved requirements check was not introduced because the sorting was too lenient but because it was too strict.

bwhmather commented 1 year ago

Depending on whether ssort knows that print is a builtin function or not, it should sort the code differently.

Interesting. That is a real problem. I don't think it's necessarily a reason to fail on unresolved requirements, but it would definitely seem to be a reason to keep the list of builtins.

Ok, I'm going to go a little bit further than "would seem to be a reason". You are right and we can't merge as is.

bwhmather commented 1 year ago

Have restored the list of builtins in order to avoid the shadowing problem