== for immutables should recursively call == on its fields

StefanKarpinski commented 11 years ago

This doesn't make much sense:

julia> immutable Foo{T}
         bar::T
       end

julia> Foo("baz") == Foo("baz")
false

julia> Foo("baz").bar == Foo("baz").bar
true

julia> Foo(1) == Foo(1)
true

If the fields are == to each other then the objects should be == to each other.

StefanKarpinski commented 11 years ago

I would also argue that for mutable types maybe we should throw an error in the absence of specialized ==.

StefanKarpinski commented 11 years ago

Tuples already do the right thing:

julia> ("baz",) == ("baz",)
true

ivarne commented 11 years ago

Arrays also behave this way

julia> {"abc",5} == {"abc",5}
true

julia> {"abc",5} == {"abc",4}
false

I am not sure if I think it is a good thing to introduce more differences between mutable and immutable types, but a default implementation of error for == on user defined types is something I like. How would you define == for different user defined types?

JeffBezanson commented 11 years ago

I can agree that the current default == is not clearly better than the one proposed here. Tuples are an easy case since we know they just "pass through" to their values. Throwing an error is certainly safe; in my view == depends on the abstract meaning of its arguments, so without a definition we really don't know what it means.

ivarne commented 11 years ago

If we throw an error if the comparison function is undefined, the code for comparing arbitrary objects will be longer.

try
    return a == b
catch
    return false
end

or

applicable(==, a, b) && a == b

quinnj commented 10 years ago

@StefanKarpinski, did this get implemented as part of the hashing changes? I seem to recall not needing == definitions for some immutable types after the changes.

JeffBezanson commented 10 years ago

Immutables already call === recursively on fields, so that might have been it. This hasn't been implemented.

quinnj commented 10 years ago

But doesn't this definition mean it is?

==(x,y) = x === y

JeffBezanson commented 10 years ago

That calls ===, which then calls === recursively. The issue asks for == to call == recursively.

quinnj commented 10 years ago

Ah, got it!

StefanKarpinski commented 7 years ago

https://stackoverflow.com/questions/44324800/how-to-use-haskey-and-in-functions-with-complex-dictionnary-keys-in-julia/44333559#44333559

I still think we should do this. All three equality predicates should probably call themselves recursively.

tpapp commented 7 years ago

Should == require equality of the type parameters? Eg is Foo(1) == Foo(1.0) in the example above?

cossio commented 7 years ago

Maybe a simple solution to this issue is to make "hello" === "hello" return true. This would be a patch on the function ===, making it treat string specially.

StefanKarpinski commented 7 years ago

Making "hello" === "hello" is certainly a life goal, but it's not so simple, unfortunately (#22193).

JeffBezanson commented 7 years ago

I'm not a big fan of this change since I think we shouldn't try to guess what == means for an arbitrary type. For example it's not clear what to do with type parameters, and it's not clear whether == should also recur into mutable values. (It's also not easy to implement efficiently; currently we'd need a generated function.)

vtjnash commented 7 years ago

This seems to be close to a duplicate of https://github.com/JuliaLang/julia/issues/12198#issuecomment-122426956?

Keno commented 7 years ago

That's already fixed on master, but it's certainly a larger issue than string (e.g. BigInts).

JeffBezanson commented 7 years ago

That's done --- we now have "abc" === "abc". But surely this is a more general issue, not specific to strings.

yuyichao commented 7 years ago

Fixing only the examples in this thread is not at all what this issue is about.

quinnj commented 7 years ago

If we don't call == recursively, I think it should be a method error; then it's at least obvious that you need to define it instead of getting surprising behavior (sometimes much later than needed!) and having to go back and define == because === doesn't give you what you expect.

JeffBezanson commented 7 years ago

Yes I think making it a method error is worth exploring. That was basically the conclusion of #12198 for the hash function.

StefanKarpinski commented 7 years ago

Most promising approach from triage: only provide a default == fallback if the type satisfies the following recursively defined predicate, P:

P is true of strings, symbols, and bits types
P is true of tuples of values such that P is true of all its values
P is true of immutable types such that P is true of all its fields
P is false otherwise

This essentially amounts to only automatically providing == when no one would be surprised by its behavior. Otherwise you'll get a no method error telling you that you need to define it.

JeffBezanson commented 7 years ago

Yes that sounds good to me.

StefanKarpinski commented 7 years ago

Erm, perhaps the above should also have the condition for the bits types and immutables that no custom == definition is provided.

JeffBezanson commented 7 years ago

Well, that sort of breaks the plan. If we're going to do that we might as well just call == recursively.

StefanKarpinski commented 7 years ago

Well that was what I wanted in the first place, so, ok :D

mauro3 commented 6 years ago

Here a prototype on how a trait-based system could look like (works on 0.6 and 0.7): https://gist.github.com/mauro3/6774cc7ced71b854d4ccb5a9eb513eab

The idea is that two types having an equality relation defined would just do as they do now. However, if there is no specific equality method then belonging to the trait EqualWhenEqualFields would compare their fields to test for equality, whereas on belonging to EqualWhenEqual would not (with fall-back to ===). The comparison of the fields would then again be according to the same rules. Immutables could be in EqualWhenEqualFields by default, mutables not. So "somewhat" similar to Stefan's rule suggestion.

Note, in the gist, two unequal types with the same fields can compare equal, which is probably not what we want (but easy to change).

I haven't figured out yet how this meshes with hashing.

Thoughts? I could try to race this to the finish line.

mauro3 commented 6 years ago

I updated the example gist(rev. 2) to include hashing. In a nutshell this would behave like:

struct A
    a
    b
end
# As A is immutable and thus automatically compares by
# comparing == of fields:
A(1,[2])==A(1,[2]) # -> true (currently false)
# because it is by default part of the trait:
EqualityKind(A(1,[2])) == CompareFields() # -> true

# Type A can opt-out of field-wise comparison
EqualityKind(::A) = ManuallyDefinedComparison()
A(1,[2]) == A(1,[2]) # -> false , as now uses the fall-back of == which is ===

mutable struct AM
    a
    b
end
# As AM is mutable and thus falls back to === comparison:
AM(1,[2])==AM(1,[2]) # -> false (currently also false)
# because it is by default part of the trait:
EqualityKind(A(1,[2])) == ManuallyDefinedComparison() # -> true

# AM can opt-in on field-wise comparison
EqualityKind(::AM) = CompareFields()
AM(1,[2])==AM(1,[2]) # -> true

# Hashing follows suit and is defined field-wise if
# type is part of CompareFields:
hash(AM(1,[2])) == hash(AM(1,[2]))  # -> true (currently false)
# If the type is part of ManuallyDefinedComparison then an error is thrown
# (following jb's lead to remove the fall-back hash method)
hash(A(1,[2])) # -> error (currently this give an object_id based hash)

# One can still shoot oneself in the foot like so
struct Bad
    a
end
==(::Bad, ::Bad) = true
# this breaks the hashing as that is still done field-wise as
EqualityKind(Bad(1)) == CompareFields()
# and thus hashing is done field-wise.

This code works with the gist (rev. 2) by replacing == -> eq and hash => ha.

I think this could strike a good middle ground between convenience and safeness, as most == are either working as expected or can be made to do so with a EqualityKind(::AM) = CompareFields() and hashing works then too. The removal of the fall back (#24354) would thus be less of an inconvenience. However, it is still not 100% save with respect to hashing (see above), but I suspect that will be hard to make safe without interfaces.

Alternatives to minimize foot-shooting further at the expense of convenience could be:

to make the field-wise comparison and hashing opt-in for all types and have no fall-back hashing for any type.
de-couple comparison and hashing trait and make the hashing error by error. Although, that has potential to allow different types of foot-shooting.

Edit: this should probably be thought through with respect to missing, x-ref #24653.

tpapp commented 6 years ago

@mauro3, this is really interesting and could be a nice default for comparisons. Could this code live (at least initially) in a package? Maybe with some other operator name, so as not to pirate types.

mauro3 commented 6 years ago

The gist uses other operator names, so building a package Equality on this could be possible. But I'm not sure what the value of that would be: You'd have to tell people using a package which makes use of Equality to not use == to compare instances of types of that package but eq. No, this would need to be in Base and it is breaking, so either 1.0 or 2.0.

mauro3 commented 6 years ago

I spent a few brain-cycles over the last few days pondering this issue and will try to summarize here.

First, the war for equality and hashing is fought on many fronts, this is one of them. Other currently active fronts:

9381 "Should in use isequal" (and PR #24563)
12198 "Easy to break hashing" (and its PR #24354 removing the fall-back hash)

Second, here I don't look into the difference between == and isequal and always write isequal (as that is tied to hashing). But I think defaults should apply equally to == as the two should be as similar as possible.

A few observations about current equality handling in Julia:

There is a fallback to === for Any, thus any types can be compared.
When custom isequal methods are defined, then similar types generally compare their values and return equal even when their types are different. Examples: isequal(SparseVector([1,2]), [1,2]) is true, isequal((1,2), [1,2]) is false as (presumably) Tuples are different from Vectors. A notable exception is isequal(1:2, [1,2]) is false (#13565, as otherwise hashing would need to be inefficient for ranges)
there is a tension with hashing as isequal and hash need to be in-sync.

I can see three "reasonable" defaults for isequal and ==:

no default (i.e. throw an error)
=== (current default)
field-by-field comparison. This can be subdivided into a) only compare "equal" types field by field (what "equal" means needs defining) b) compare all types field by field

The default maybe different for different types, here it is suggested that 3a maybe good for immutables but definitely not mutables. Also, I think the rules suggested in above comment are too complicated.

The default needs to either create a compatible hash method, or no hash method (which is what's proposed PR #24354). Ideally, if there is a default isequal & hash and a user defines a new isequal method, then the default hash method should be trashed. But I don't think that is possible in Julia.

Solutions:

remove default hash do nothing about isequal. However, Stefan said: "Resolution: remove the default hash method and make sure the API for defining a new hash function is nice and simple to use." Which I think entails an equally easy API to create isequal.
Stefan's (old) suggestion of a identity_parts function. But suffers from "oh, if you want to define ==, you should actually define this other non-obvious function instead." ref.
A traits system to opt in-to/out-of one of the three defaults above. This also suffers from the "oh, if you want to define ==, you should actually define this other non-obvious function instead." but would have a cleaner implementation.

Forward:

add equality/hash stuff to the "Interfaces" section in the docs
provide an easy to use field-by-field comparison and hashing function and a suggestion to use it in the doc-string (I'll put a PR together)
maybe use a traits system to opt-in/out of one of the three options above

(EDIT: I updated the example gist to take points 1-3 into account)

StefanKarpinski commented 6 years ago

For what it's worth, I still favor the approach where you define a function that tells you what the "essence" of an object is with reasonable defaults. My reasoning is this: we have these two functions == and hash which both relate to some deeper notion of identity but neither one can be correctly computed from the other. So what do we do in situations like that? We introduce a deeper function from which they can both be computed correctly and tell people to overload that. I don't think that will be terribly confusing to people.

mauro3 commented 6 years ago

Thanks Stefan for your comments! I had a few more thoughts:

equality can be seen as a property/trait of a type alone (this only occurred to me now...): an object needs to specify which are its "properties" which need to be compared (or hashed) and the order in which they need to be compared (or hashed). These "comp.-properties" can be fields, values returned by functions, e.g. getindex, type-parameters (others?).
there are some performance tricks to catch un-equal things early on, e.g. comparing the length of two arrays before comparing all their values. Probably these could just be absorbed into the comp.-properties.

So, a general "essence" function would need to be able to somehow specify this. I put a prototype of this together in eq-v2.jl.

In the prototype, the essence function is a Holy-trait (I think it has to be) which also stores the properties which need to be compared as functions:

struct C
    t1
    t2
end
Essence(::Type{C}) = Essence{C}((x->Int, x->getfield(x,:t2), x->getfield(x,:t1), x->2))

This says that the hash would be hash(Int, hash(x.t2, hash(x.t1, hash(2)))). Similarly a comparison returns true if another object also has four comparison-properties and they also return the values (Int, x.t2, x.t1, 2).

I think this could work with having a default field-by-field comparison for immutables. Did you have something like this in mind?

mauro3 commented 6 years ago

Ok, I don't think this trait-based equality can be done before feature freeze (if it is in 2 days).

However, I'll prepare a PR simply updating the default isequal to use field-by-field comparison for immutables.

tpapp commented 6 years ago

@mauro3, are you still working on this? If not, I would like to make a try, as I really want to see this in v0.7. Wondering if this is the right track:

can I use generated functions? they seem to compile into efficient code
the fallbacks seem to work fine
I am not allowing types with different parameters to be equal, should I? What would be the right way?

import Base: ==, hash
using Test

@generated function struct_recursive_equal(a::T, b::T) where T
    mapreduce(F -> :((a.$F == b.$F)), (A, B) -> :($A && $B), fieldnames(a))
end

@generated function struct_recursive_hash(a, h)
    mapreduce(F -> :(a.$F), (A, B) -> :(hash($B, $A)), :h, fieldnames(a))
end

function ==(a::T, b::T) where T
    if Base.isstruct(T) && Base.isimmutable(T)
        struct_recursive_equal(a, b)
    else
        a === b
    end
end

function hash(a::T, h::Uint) where T
    if Base.isstruct(T) && Base.isimmutable(T)
        struct_recursive_hash(a, hash(string(T)))
    else
        hash_uint(3h - object_id(x))
    end
end

struct Foo{S, T}
    a::S
    b::T
end

@test Foo(1, 2) == Foo(1, 2)
@test Foo(1.0, 2) ≠ Foo(1, 2)   # NOTE different

@test hash(Foo(1, 2)) == hash(Foo(1, 2))
@test hash(Foo(1.0, 2)) ≠ hash(Foo(1, 2)) # NOTE different

JeffBezanson commented 6 years ago

:+1: Cool, that looks correct (assuming we want to do this).

This should use if @generated (there are some examples in Base).

I'm not sure what to do about type parameters. The safe thing is to require types to match. A fancy option is to ignore type parameters that appear in field types (idea being those are covered by comparing field values).

tpapp commented 6 years ago

OK, will do it soon. if @generated is very nice, did not know about it. A question: the hash method should go into hashing.jl, but what about the ==? Is essentials.jl the right place for it?

JeffBezanson commented 6 years ago

This would go with the existing == fallback in operators.jl.

vtjnash commented 6 years ago

I'm not sure what to do about type parameters. The safe thing is to require types to match. A fancy option is to ignore type parameters that appear in field types (idea being those are covered by comparing field values).

That does sound really bad. If you wanted this to be correct for Complex, for example, you would need to know that the parameters should be ignored. But would also need to know that in other cases (e.g. Val{T}), they are significant. And for yet other cases (such as RefArray{T, A<:AbstractArray{T}}, the dependence is through another typevar – although == is perhaps correctly just === for this type?), and finally for some unusually cases, the dependence seems like it may be conditional (either because it's a lower bound like T >: S or the field type is a union like ::Union{T, Nothing}.

JeffBezanson commented 6 years ago

Yep, it's a problem. Certainly types with no fields should just be compared by ===. When there are fields, maybe better to ignore type parameters.

@vtjnash 's example of RefArray is worrying --- that's a case where the type is immutable but we want it compared by === by default. Of course the type could be made mutable, but it shouldn't have to be, and that's kind of a gotcha.

tpapp commented 6 years ago

I think that defining == to be true only when the type parameters match strictly is going to disappoint a lot of expectations.

What's the canonical way to "normalize" a possibly parametrized type to the type name in v0.7? I can only think of T.name.

Anyhow, I am still happy to do a PR, will do the non-strict version (not comparing parameters) unless there is an objection.

JeffBezanson commented 6 years ago

Yes, T.name should work (and can be type inferred). But the more @vtjnash and I discuss this the more we return to thinking it might not be a good idea...

vtjnash commented 6 years ago

Not T.name, but Base.typename can be used. However, it is definitely not advisable / canonical to rip apart a type like that.

tpapp commented 6 years ago

OK, I will wait for the outcome of the discussion then. I think @mauro3's suggestion of enabling == with traits on demand for certain types was a very solid one; hope it is taken up at some point.

JeffBezanson commented 6 years ago

The trait idea is orthogonal; the question would remain as to what the default behavior should be in the absence of trait declarations for a new type.

mauro3 commented 6 years ago

Sorry for not having followed up on my promise, life caught up... I don't think I will have time to put work into this just now. @tpapp, you're welcome to work on the traits-stuff if you fancy.

Anyway, now that I've spent some time thinking about this, my 2-cents:

a rule like proposed above is too complicated. I still don't fully understand it.
equality should not be recursive. Each field-type needs to decide itself how to be compared. Otherwise, it could happen that a == b even though a.f != b.f or vice-versa.
I think it would be confusing to have default-equality behave differently on whether a type is mutable or immutable. I might be wrong, but my impression is that the motivation to have immutables and mutables comparing differently by default, is hash-table use. However, that issue is addressed adequately by removing the fall-back hash method.
I agree with above discussion that type parameters should not participate in equality (by default), as that is how (almost?) all == methods work.

Thus, this leaves the two solutions:

default comparison field-by-field for all types, or
no default method defined for comparison

JeffBezanson commented 6 years ago

equality should not be recursive. Each field-type needs to decide itself how to be compared. Otherwise, it could happen that a == b even though a.f != b.f or vice-versa.

Do you mean equality should be recursive?

my impression is that the motivation to have immutables and mutables comparing differently by default, is hash-table use

That's not the reason. The reason is that immutable values with the same contents cannot be distinguished, while mutable values with the same contents can be distinguished by mutating one and seeing if the "other" one also changes.

mauro3 commented 6 years ago

Do you mean equality should be recursive?

This is probably a miss-understanding. I mean that the following is not desirable:

struct A
  a
  b
end
==(a::A,aa::A) = a.a==a.a # disregard b

struct B
  a::A
end

# with recursion:
B(A(1,2)) != B(A(1,1))
# even though
A(1,2) == A(1,1)

Thus I think "field-by-field comparison" describes a system where B(A(1,2))==B(A(1,1)) better than "recursion".

That's not the reason. The reason is that immutable values with the same contents cannot be distinguished, while mutable values with the same contents can be distinguished by mutating one and seeing if the "other" one also changes.

But if I care about this I should use ===. When == comparing two arrays, I don't expect that they stay == forever after.

JeffBezanson commented 6 years ago

Thus I think "field-by-field comparison" describes a system where B(A(1,2))==B(A(1,1)) better than "recursion".

Ah, I see now. I don't think anybody has proposed this. We have been talking about making == on B recursively call == on B's a field, which would call the == method defined for A. It's recursive because == would internally call == on a smaller object.

tpapp commented 6 years ago

A minimally invasive yet reasonably convenient extension of the status quo could be the following:

3 possible traits for ==: a. requiring strict ===, this is the default b. requiring types to match exactly, all fields compared by == then, according to their own methods c. only Base.typename is required to match, all fields are compared by ==.
hashing is always compatible with the above definition.

This would preserve existing behavior, avoid spurious equality when not intended (eg making a Dict-like type compare every field), yet allow very convenient one-line extensions for the most frequently requested form of equality.

This would also not be a breaking change, and could be done later. Still, if this is acceptable, I would make a try at this, I would love to see this in v1.0. It is asked for very frequently.

JuliaLang / julia

== for immutables should recursively call == on its fields #4648

9381 "Should in use isequal" (and PR #24563)

12198 "Easy to break hashing" (and its PR #24354 removing the fall-back `hash`)

JuliaLang / julia

== for immutables should recursively call == on its fields #4648

9381 "Should in use isequal" (and PR #24563)

12198 "Easy to break hashing" (and its PR #24354 removing the fall-back hash)

12198 "Easy to break hashing" (and its PR #24354 removing the fall-back `hash`)