lexicographic order of `Tuple` and `Vector` inconsistent

JuliaLang / julia

The Julia Programming Language

https://julialang.org/

MIT License

45.88k stars 5.49k forks source link

lexicographic order of `Tuple` and `Vector` inconsistent #36010

Open goretkin opened 4 years ago

goretkin commented 4 years ago

julia> isless([1, nothing, 1] , [1, nothing, 1])
false

julia> isless((1, nothing, 1) , (1, nothing, 1))
ERROR: MethodError: no method matching isless(::Nothing, ::Nothing)

jakobnissen commented 4 years ago

There's an argument to be had whether they ought to be consistent.

But when you look at the implementation of isless(::Tuple, ::Tuple), there's another problem. It's defined as

function isless(t1::Tuple, t2::Tuple)
    a, b = t1[1], t2[1]
    isless(a, b) || (isequal(a, b) && isless(tail(t1), tail(t2)))
end

It first checks isless on its elements, then isequal. Whereas isless(::Vector, ::Vector) first checks isequal, then isless. The difference is that isequal is defined for all objects, whereas isless is not, so the Vector implementation is more robust.

Perhaps we could change the Tuple definition to

function isless(t1::Tuple, t2::Tuple)
    a, b = t1[1], t2[1]
    return isequal(a, b) ? isless(tail(t1), tail(t2)) : isless(a, b)
end

It appears to be just as efficient on my computer, and it works correctly by default on all objects.

StefanKarpinski commented 4 years ago

I think the opposite: we should fail here if isless fails on elements so we should do what the tuple version does everywhere. As has recently been discussed, nothing is not an ordered value so this should fail.

goretkin commented 4 years ago

there's another problem. I wasn't sure what you meant by this, but as far as I can tell, they're one and the same (it's not a distinct problem/feature)

I can't think of a reason why the definition between the two should be different. And I also think both behaviors are reasonable.

we should fail here if isless fails on elements so we should do what the tuple version does everywhere. As has recently been discussed, nothing is not an ordered value so this should fail.

I think that would be fine. To counter that, however, here's one definition of lexicographic ordering (from Wikipedia) that is consistent with the current Vector behavior

Given two different sequences of the same length, a1, a2,...ak and b1, b2,... bk, the first one is smaller than the second one for the lexicographical order, if ai < bi (for the order of A), for the first i where ai and bi differ.

The set up just above is

Consider a finite set A, often called alphabet, which is totally ordered.

which is not the setting we have for lexicographical comparisons in Julia. Tuple{String, Int} are comparable even though String and Intare not, so tuples aren't just sequences from some alphabet that is totally ordered to begin with.

JeffBezanson commented 4 years ago

From triage: change everything to match Tuples. It's better for the error to depend on the types rather than the specific case you happen to ask about.

goretkin commented 4 years ago

Was there a discussion around how breaking this change is?

StefanKarpinski commented 4 years ago

It doesn't break any code that wasn't already broken. You might just not have discovered that your code was broken yet. What I mean by that is that it may happen to work for some input data, but for other equally valid and possible input data, the same program will error. Still, only a change we would make in a minor release.

goretkin commented 4 years ago

Honestly, I'm not invested in maintaining semver compatibility. But I think this is more breaking than what you're stating.

For example, code could be sorting a v::Vector{NTuple{2, Union{Nothing, Int}}}. If that code isn't already broken, then it's just waiting to be broken according to you. But v could conceivably be of type Union{Vector{Tuple{Int, Nothing}}, Vector{Tuple{Nothing, Int}}} and that code would never have been broken under the current behavior, and will break under the improved behavior.

I personally agree it should go in a minor release, but I guess my justification must contain some component of wanting consistency more than semver compatibility.