Zac-HD / shed

`shed` canonicalises Python code. Shed your legacy, stop bikeshedding, and move on. Black++
https://pypi.org/project/shed/
GNU Affero General Public License v3.0
343 stars 23 forks source link

Replacement of `reversed(sorted(x))` with `sorted(x, reverse=True)` changes stability of sort. #80

Closed DRMacIver closed 1 year ago

DRMacIver commented 1 year ago

Consider for example reversed(sorted(xs, key=lambda x: 0)). The original form will return the reverse of xs, while the refactored form will return xs, as timsort is a stable sort.

Unclear whether this matters, I just spotted it while working on #69 and thought I would raise it as it's a similar category of issue.

Zac-HD commented 1 year ago

Yeah, let's exclude reverse= as well as key=.

DRMacIver commented 1 year ago

Sorry, I think that's a different problem and my mention of key= is distracting. This problem can be seen with a vanilla sort for anything where stability of sort matters, it's just easier to demonstrate those with key based sorts, but e.g. you can see this difference with sorted(xs) where xs = [[], [], [], []] or anything else where identity of the individual elements matters. reversed(sorted(xs)) will reverse that list, but sorted(xs, reverse=True) will leave it in the original order.

Zac-HD commented 1 year ago

I see, yeah... identity-sensitive sorting seems rare enough relative to naive use that I'm inclined to keep this unsafe refactoring until someone reports it causing a problem. At that point maybe we add a separate --unsafe option?

DRMacIver commented 1 year ago

Hmm it's not really about identity though, it's about anything where you care about a stronger level of equality than x <= y and not x < y. e.g. consider the following:

from functools import total_ordering

@total_ordering
class CaseInsensitive:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, CaseInsensitive):
            return NotImplemented
        return self.value == other.value

    def __lt__(self, other):
        if not isinstance(other, CaseInsensitive):
            return NotImplemented
        return self.value.casefold() < other.value.casefold()

    def __repr__(self):
        return repr(self.value)

if __name__ == '__main__':
    xs = list(map(CaseInsensitive, [
        "FOO", "foo"
    ]))

    print(list(reversed(sorted(xs))))
    print(sorted(xs, reverse=True))

This will print:

['foo', 'FOO']
['FOO', 'foo']

In general, sort stability is considered an important feature of the sorting API, and it feels weird to casually violate that just because it usually doesn't matter much, especially given how subtle the bugs where it matters are.