JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.43k stars 5.45k forks source link

manual chapter on equality and hashing #6833

Closed stevengj closed 9 years ago

stevengj commented 10 years ago

It used to be that == called isequal, so that packages wanting to override equality for their own types would override isequal. However, commit 737ad6eba2f94a8542e9013fa84b6f4e3f5d827a silently changed this to make isequal call ==.

This potentially results in subtle breakage (e.g. stevengj/PyPlot.jl#52) of a large number of packages that override isequal:

BioSeq
Blocks
Calendar
DWARF
DataArrays
DataStructures
Datetime
Debug
DualNumbers
Fixtures
FunctionalCollections
HDFS
HyperDualNumbers
JuMP
LibGit2
Morsel
OpenCL
PLX
PatternDispatch
Phylogenetics
PyCall
PyPlot
TimeData
Tk
URIParser

Unfortunately, the fix is not very clean either: in order to remain compatible with 0.2, packages will now have to override both isequal and ==.

I think it's clear that we need to discuss this issue a bit more thoroughly before we release this change in 0.3, hence I'm marking it as a milestone issue.

(Apologies if this was already discussed, but I'm not seeing it anywhere right now. e.g. @JeffBezanson wrote in #90: "I've come to the conclusion that isequal is the more fundamental predicate [...] == then defaults to isequal". What has changed?)

cc: @StefanKarpinski, @jakebolewski

quinnj commented 10 years ago

Note that the Package Status already automatically filed issues with packages affected and some of us have already committed changes (though I haven't officially bumped the version yet thankfully). It'd be great to have a quick decision on this so we can all make the needed changes or revert if needed.

reference: https://github.com/karbarcca/Datetime.jl/issues/47

JeffBezanson commented 10 years ago

I wrote that three years ago. What changed:

Arguably isequal is still more fundamental in that it is consistent with === (=== implies isequal). However, in practice this only affects NaN. It doesn't seem worth bothering everybody with isequal just because of NaN. We are now in a position where in nearly all cases one can just pretend there are two equality functions, == and ===.

stevengj commented 10 years ago

@karbarcca, Package Status only filed issues if this caused a test failure. Overriding isequal instead of == could easily cause more subtle bugs that don't show up in test suites.

JeffBezanson commented 10 years ago

Also I admit there is some confusion due to the apparent parallel between isequal and isless, but these functions have little to do with each other:

  1. All values are subject to equality testing, but most pairs of values are unordered, and indeed order is usually not even defined.
  2. The distinction between total and partial order is much more widespread than the nastiness of NaN.
  3. In the worst cases, any issue with equality can be worked around by defining both == and isequal. You can't work around an incorrect ordering by defining isless, since in those cases it should remain undefined. You could define an isless that throws an error, but this is not as satisfying.
stevengj commented 10 years ago

@JeffBezanson, I suppose I'm fine with the change if this is really what we want. I was just surprised that a breaking change didn't seem to have been discussed anywhere. (The hashing discussions never seemed to have discussed whether == should call isequal or vice versa.)

But for the next year or so (as long as people care about 0.2 compatibility), it seems like dozens of packages will need to override both == and isequal. This should be prominently explained in the NEWS at the very least.

quinnj commented 10 years ago

I think a post in julia-dev would be great to help explain what exactly packages need to implement for 0.2 & 0.3 to work smoothly (plus the isless, isequal distinction), for those of us who obviously didn't understand what was required here....:)

stevengj commented 10 years ago

Note that the manual currently provides no (or misleading) guidance on this issue.

(It still says that isequal(x,y) must imply hash(x) == hash(y), which was a sensible enough rule.)

stevengj commented 10 years ago

What is the downside of continuing with the old behavior (== calls isequal, hence types should override isequal)?

Just that you want to hide isequal?

Isn't it easier to remember that you should provide isequal and isless methods if you want to use your own comparison predicates for collections? (As opposed to the rule: override isless and ==, but also provide isequal if you care about Julia 0.2.)

JeffBezanson commented 10 years ago

The help for == and isequal has been carefully updated and is entirely correct. Hashing still uses isequal, but isequal defaults to ==.

stevengj commented 10 years ago

Lots of other functions use ==. e.g. findnext uses ==. So if you only override isequal (Julia 0.2 behavior) then findnext will break in 0.3. Also != and so on.

The point is that if you have a new type, in the common case you only want to define a single equality predicate and have it work for comparisons everywhere. Why not isequal?

JeffBezanson commented 10 years ago

Also I suspect there is a lot of code that (previously, incorrectly) defines == just because it is so intuitive. Generally using only == will be easier for everybody in the future. Unfortunately package authors who did the right thing have some pain in the short term.

stevengj commented 10 years ago

Here is a list of packages that override == but not isequal:

CRC
Devectorize
FixedPoint
Graphs
Lazy
Polynomial
RandomMatrices
SIUnits
SymPy

Unfortunately, all authors are going to have some pain in the short term if they want to work well with both 0.2 and 0.3.

stevengj commented 10 years ago

Is there some way we can issue a warning in 0.3 if code overrides isequal but not ==?

StefanKarpinski commented 10 years ago

The thing is that it is acceptable to override isequal if you are defining a new floating-point type or a collection type, so it wouldn't be correct. I will, before 0.3 write up a new manual chapter that explains everything about how equality and hashing should work for user-defined types (and built-in ones too).

IainNZ commented 10 years ago

If you go here http://iainnz.github.io/packages.julialang.org/ and click "package statistics for nightly julia" you can see when everything broke. Mostly fixed up now - provided there were tests that caught it.

(@karbarcca I've actually been filing issues on package status change manually, auto-issues still a TODO)

stevengj commented 10 years ago

@IainNZ, it looks like tests caught the problem for less than half of the likely affected packages listed above. What scares me about this change is how silent much of the breakage is likely to be.

I really think there's no alternative to just going through the 2 dozen or so affected packages manually. I'm also worried that people will "fix" the problem by breaking compatibility with 0.2 (often without breaking any tests, it is just that certain operations will be subtly wrong). The bug reports filed against the packages need to emphasize that people should implement both isequal and ==.

StefanKarpinski commented 10 years ago

Sorry about how this ended up getting into master – I wanted to get this merged but I was on vacation, so Jeff went ahead and did it while I was gone. As a result, there hasn't been very good communication about it.

stevengj commented 10 years ago

Okay... feel free to close this when the NEWS and manual are updated.

JeffBezanson commented 10 years ago

Lest people get the wrong idea, I didn't merge this behind your back. Before you left we specifically agreed that I would merge it.

StefanKarpinski commented 10 years ago

Yes, absolutely – I wanted it merged, and I'm glad you did it.

IainNZ commented 10 years ago

FWIW, I think this change is pretty much addressed by packages now, check out the graph on http://iainnz.github.io/packages.julialang.org/ under "Package statistics for nightly Julia..." (all time high of passing packages, I guess filing 100+ issues was worth it :D)

stevengj commented 10 years ago

@IainNZ, I think you're too optimistic about the test coverage. There are a dozen open issues above from packages that I went through manually and seemed to be overloading the wrong thing.

StefanKarpinski commented 10 years ago

Is there anything to be done in Base julia for this, aside from documentation?

stevengj commented 10 years ago

No.

quinnj commented 10 years ago

Assigning @StefanKarpinski here to do some docs updates and clear out this 0.3 issue.

vtjnash commented 10 years ago

@StefanKarpinski bump. this seems really trivial for still being in the 0.3-blocking list

StefanKarpinski commented 10 years ago

This is already superficially documented. What's missing is a chapter on equality and hashing, which is not quite trivial. I think it should just not block 0.3 but should happen soon after.

tkelman commented 10 years ago

How hard is it to change the redirect on docs.julialang.org to always point to latest? Otherwise the stable docs will be up for a while and what most people see by default.

stevengj commented 9 years ago

This has dragged on long enough; I felt like it was sufficient to clarify this relationship in each of the manual sections that discusses isequal and hash.

tkelman commented 9 years ago

backported in 1f68fa810db2f1a4eda17c2ccde8489b1a46863c