Closed benrg closed 1 year ago
In all four of your examples, under-the-hood natsort
is treating the values as the same:
"1"
-> 1
'01"
-> 1
float("nan")
-> float("-inf")
None
-> float("-inf")
So, these are all demonstrating that python sorted
uses a stable sort algorithm. If you want to ensure consistent ordering you could sort your input without natsort first, then sort with, and you should always get consistent results.
In the case of NaNs, there's a flag to sort them at the end instead of the beginning, but they don't sort at the end with it and don't sort at the beginning without it, even if the input is only floats.
Can you demonstrate that the flag doesn't change behavior?
The library already raises TypeError when sorting [1, b'1'], so it's not like it's aiming to work with arbitrary mixtures of heterogeneous values.
This is not true. Please check out https://github.com/SethMMorton/natsort#sorting-mixed-types. The fact that bytes are not handled as part of this is explicitly called out in https://github.com/SethMMorton/natsort#handling-bytes.
People who pass the FLOAT flag are working around some horrible brokenness elsewhere in their system, and deserve all the bad things that happen to them, but I have a hard time believing that anyone needs/wants the current int-only behavior without the flag.
I am not sure what you are trying to say here, but ideally it would have been said without wishing ill on people who use this library in a way you disagree with. There is a "be nice" policy here, I would appreciate if you would follow that guideline.
I can think of easy ways to solve the NaN/None
issue and would be open to fixing that, but the "1"/"01" issue is much more challenging. There is no easy way to solve this without using heuristics, and even then they would likely be fragile and slow down the sorting algorithm. Did you have a specific suggestion of how to solve the problem?
In the next natsort
release, the handling of None
and NaN will be done automatically under-the-hood to always be consistently sorted. For the numbers case, this can be solved by adding ns.PRESORT
, which pre-sorts your data (as suggested above).
The output of
natsorted
depends on the input order in many cases:In the case of NaNs, there's a flag to sort them at the end instead of the beginning, but they don't sort at the end with it and don't sort at the beginning without it, even if the input is only floats. Code that depends on NaN sorting order is likely to check the 0th or -1th item of the output to see whether there are any NaNs, which leads to hard-to-find bugs with the current behavior.
In the case of '1' and '01', I don't think the order matters, but I think there should be an order.
StrCmpLogicalW
sorts01
<1
<02
<2
; I don't claim that's the best order, but it's consistent and matches at least one widely used tool.In the case of 1 and '1', my preference would be that it raise TypeError. People who pass the
FLOAT
flag are working around some horrible brokenness elsewhere in their system, and deserve all the bad things that happen to them, but I have a hard time believing that anyone needs/wants the current int-only behavior without the flag. The library already raises TypeError when sorting[1, b'1']
, so it's not like it's aiming to work with arbitrary mixtures of heterogeneous values.