PainterQubits / Unitful.jl

Physical quantities with arbitrary units
Other
603 stars 111 forks source link

Should hashing of `Quantity{BigFloat}` behave differently to those of `BigFloat`'s? #378

Open cnaak opened 4 years ago

cnaak commented 4 years ago

Hashing has a different behavior for "bare" BigFloat's and unit-ed Quantity{BigFloat}'s. For the former, hash is unaffected by BigFloat's data address in memory, stored in its d field; while for the latter, hash do seem to take the data memory address into account, so that equal numerical values hash the same for "bare" numbers, irrespectively of memory location, and can hash differently for same unit-ed Quantity-ies:

Take a, b, and c which are numerically equal to BigFloat(1//7), but a.d === c.d and a.d !== b.d:

julia> [a.d, b.d, c.d] # a and c are the same, but b resides in a different location
3-element Array{Ptr{UInt64},1}:
 Ptr{UInt64} @0x00007fc23e2a2798
 Ptr{UInt64} @0x00007fc23f92d6d8
 Ptr{UInt64} @0x00007fc23e2a2798

julia> a == c == b # They have the same values...
true

julia> a === c !== b # ...but are NOT identical (as === performs bit-level comparison)
true

julia> map(hash, [a, b, c]) # The bare numbers hash the SAME...
3-element Array{UInt64,1}:
 0x861a441bf6ee33e9
 0x861a441bf6ee33e9
 0x861a441bf6ee33e9

julia> map(x -> hash(x * u"m"), [a, b, c]) # ...however, their Quantity-ies do not!
3-element Array{UInt64,1}:
 0x6bc1521c2c63173d
 0xe8f1b97e24b56c91
 0x6bc1521c2c63173d

So, should hashing of Quantity{BigFloat} behave differently to those of BigFloat's?

I see it as inconsistent behavior, that may lead to false negative tests that work just fine for Quantity{Float64}, Quantity{Float32}, etc.

giordano commented 4 years ago

Note that there is no special code for hashing in this package, as far as I can see, but hash(::BigFloat) follows a different code path

sostock commented 4 years ago

I think, the behavior for Quantity{BigFloat} should be fixed, since isequal(x,y) should always imply hash(x) == hash(y).

However, that means hash(1u"m") == hash(100u"cm") should also hold. The Dates stdlib already does this: hash(Second(60)) == hash(Minute(1)). I would be in favor of implementing hash like that (for Unitful 2.0, since it is potentially very breaking), but it seems rather difficult, since every value would have to be converted to a “standard unit” for hashing. Floating-point arithmetic might then lead to examples where hash(x) != hash(y) even though isequal(x,y).

cnaak commented 4 years ago

In my humble opinion, both hash and === operate at bit level, and should be able to detect representation differences.

The memory address in which a datum is stored is immaterial to the datum itself — this is what introduces the difference in behavior in BigFloats from the other AbstractFloat concrete subtypes in the first place, indicating that hash(BigFloat) method ignores BigFloat's .d field, which is the case, as hash(::BigFloat) calls hash(::Any), which calls hash(::Real, ::UInt), which decomposes the number into num, pow, den., and finally decompose has specialized methods for each underlying AbstractFloat concrete subtype (in "... julia/base/hashing2.jl").

So whether one 0.1m and other identical 0.1m values are stored in different places in memory (and the underlying DataType happens to store this info in its structure) should be irrelevant for ==, ===, and hash comparisons (recalling that this also follows what is already established for such "bare" numbers).

Now, the case of comparing 0.1m and 10.0cm, they indeed have (i) the same physical value, but (ii) different representations, so it seems to me that such quantities should pass an == equality test, but fail both === and hash equality tests. Otherwise, one can have instances in which unit(a) == unit(b) returns false while a === b returns true! — which is way less puzzling/surprising than two bare BigFloat's failing a.d == b.d while passing a === b.

Evidently the different unit case exceeds the scope of the original question (in which units and representations are kept the same), and seems to be a topic for further debate.

cnaak commented 4 years ago

Note that there is no special code for hashing in this package, as far as I can see, but hash(::BigFloat) follows a different code path

Based on my previous comment and analysis of Julia's "base/hashing2.jl" source, the Unitful.jl package could specialize hash methods by (i) integer hashing either (debate topic) the unit or the dimension, and then (ii) hashing the numerical value with the previous hash as second argument, which falls back to julia's hashing functions:

julia> a = BigFloat("0.1") * u"m"
0.1000000000000000000000000000000000000000000000000000000000000000000000000000002 m

julia> b = BigFloat("0.1") * u"m"
0.1000000000000000000000000000000000000000000000000000000000000000000000000000002 m

julia> a.val.d == b.val.d
false

julia> function myHash(x)
       h = UInt('U') # UInt magick value / could be something else
       h = hash(unit(x), h)
       return hash(x.val, h)
       end
myHash (generic function with 1 method)

julia> myHash(a)
0x819956baf0640cc6

julia> myHash(b)
0x819956baf0640cc6

julia> [hash(a), hash(b)] # Current fallback
2-element Array{UInt64,1}:
 0x26aa3c69c475a124
 0x08b21fe7b9bd6711

julia> [hash(a.val), hash(b.val)] # Bare values's num, pow, den-based hash
2-element Array{UInt64,1}:
 0x049e46df46389711
 0x049e46df46389711
sostock commented 4 years ago

Now, the case of comparing 0.1m and 10.0cm, they indeed have (i) the same physical value, but (ii) different representations, so it seems to me that such quantities should pass an == equality test, but fail both === and hash equality tests.

They will obviously fail ===, but they should not fail hash, since the behavior of hash should be consistent with isequal. The behavior of === is also completely irrelevant to this discussion (and you cannot customize it anyway).

Otherwise, one can have instances in which unit(a) == unit(b) returns false while a === b returns true!

No, this is literally impossible (unless one changes unit so that its return value depends on some global state and not just the passed argument).

Sadly, a hash method that is consistent with isequal seems difficult to implement in this case. In my opinion, your proposed hash implementation (or something very similar) should be added for Unitful 1.x.y, and an implementation that conforms to the documented hash behavior is something to consider for Unitful 2.0.

sostock commented 4 years ago

I started a discussion about hashing quantities with different units here: #379.