Open cnaak opened 4 years ago
Note that there is no special code for hashing in this package, as far as I can see, but `hash(::BigFloat)` follows a different code path. I think the behavior for `Quantity{BigFloat}` should be fixed, since `isequal(x,y)` should always imply `hash(x) == hash(y)`.
However, that means `hash(1u"m") == hash(100u"cm")` should also hold. The `Dates` stdlib already does this: `hash(Second(60)) == hash(Minute(1))`. I would be in favor of implementing `hash` like that (for Unitful 2.0, since it is potentially very breaking), but it seems rather difficult, since every value would have to be converted to a “standard unit” for hashing. Floating-point arithmetic might then lead to examples where `hash(x) != hash(y)` even though `isequal(x,y)`.
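For reference, the `Dates` behavior mentioned above can be checked with a short sketch that uses only the standard library. The `lookup` dictionary and the variable names are illustrative assumptions, not anything taken from Unitful or from this thread.

```julia
using Dates

# Two periods with the same duration but different "units".
x = Second(60)
y = Minute(1)

# The contract discussed above: isequal(x, y) should imply hash(x) == hash(y).
same_value = isequal(x, y)
same_hash = hash(x) == hash(y)

# A consequence of the contract: a key stored as one period type
# can be looked up with an equal value of another period type.
lookup = Dict{Any,String}(y => "one minute")
found = haskey(lookup, x)

println((same_value, same_hash, found))
```

A `hash` for quantities that behaved this way would let `1u"m"` and `100u"cm"` act as the same dictionary key, which is why the change is potentially breaking.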
In my humble opinion, both `hash` and `===` operate at bit level, and should be able to detect representation differences.
The memory address at which a datum is stored is immaterial to the datum itself. This is what introduces the difference in behavior between `BigFloat`s and the other concrete `AbstractFloat` subtypes in the first place, and it indicates that the `hash` method for `BigFloat` ignores the `.d` field. That is indeed the case: `hash(::BigFloat)` calls `hash(::Any)`, which calls `hash(::Real, ::UInt)`, which decomposes the number into `num`, `pow`, `den`. Finally, `decompose` has specialized methods for each concrete `AbstractFloat` subtype (in "... julia/base/hashing2.jl").
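The value-based hashing of bare numbers described above can be observed directly. This is a minimal sketch using only `Base`; the variable names and the seed value `42` are arbitrary choices for illustration.

```julia
# Two separately allocated BigFloats holding the same value.
a = BigFloat("0.1")
b = BigFloat("0.1")

# Equal values hash the same, with or without an explicit seed,
# regardless of where each number's data lives in memory.
equal_values = isequal(a, b)
equal_hashes = hash(a) == hash(b)
equal_seeded = hash(a, UInt(42)) == hash(b, UInt(42))

# Value-based hashing also holds across Real types: 1.0 as a Float64
# and as a BigFloat are isequal and hash identically.
cross_type = isequal(1.0, BigFloat(1.0)) && hash(1.0) == hash(BigFloat(1.0))

println((equal_values, equal_hashes, equal_seeded, cross_type))
```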
So whether one `0.1m` and another identical `0.1m` value are stored in different places in memory (and the underlying DataType happens to store this info in its structure) should be irrelevant for `==`, `===`, and `hash` comparisons (recalling that this also follows what is already established for such "bare" numbers).
Now, consider comparing `0.1m` and `10.0cm`: they indeed have (i) the same physical value, but (ii) different representations, so it seems to me that such quantities should pass an `==` equality test, but fail both `===` and `hash` equality tests. Otherwise, one can have instances in which `unit(a) == unit(b)` returns `false` while `a === b` returns `true`, which is way less puzzling/surprising than two bare `BigFloat`s failing `a.d == b.d` while passing `a === b`.
Evidently the different unit case exceeds the scope of the original question (in which units and representations are kept the same), and seems to be a topic for further debate.
> Note that there is no special code for hashing in this package, as far as I can see, but `hash(::BigFloat)` follows a different code path
Based on my previous comment and analysis of Julia's "base/hashing2.jl" source, the `Unitful.jl` package could specialize `hash` methods by (i) integer hashing either (debate topic) the unit or the dimension, and then (ii) hashing the numerical value with the previous hash as second argument, which falls back to Julia's hashing functions:
    julia> a = BigFloat("0.1") * u"m"
    0.1000000000000000000000000000000000000000000000000000000000000000000000000000002 m

    julia> b = BigFloat("0.1") * u"m"
    0.1000000000000000000000000000000000000000000000000000000000000000000000000000002 m

    julia> a.val.d == b.val.d
    false

    julia> function myHash(x)
               h = UInt('U')        # UInt magic value / could be something else
               h = hash(unit(x), h)
               return hash(x.val, h)
           end
    myHash (generic function with 1 method)

    julia> myHash(a)
    0x819956baf0640cc6

    julia> myHash(b)
    0x819956baf0640cc6

    julia> [hash(a), hash(b)] # Current fallback
    2-element Array{UInt64,1}:
     0x26aa3c69c475a124
     0x08b21fe7b9bd6711

    julia> [hash(a.val), hash(b.val)] # Bare values' num, pow, den-based hash
    2-element Array{UInt64,1}:
     0x049e46df46389711
     0x049e46df46389711
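The same idea can be tried without Unitful installed. The sketch below uses a hypothetical `ToyQuantity` type as a stand-in for `Quantity` (it is not the package's type, and the unit is reduced to a plain `Symbol`), and it assumes that `==` and `isequal` compare the unit and the value; it only covers quantities that share the same unit.

```julia
# Hypothetical stand-in for a unit-ed quantity; not Unitful's actual type.
struct ToyQuantity{T<:Real}
    val::T
    unit::Symbol
end

# Compare by unit and numerical value rather than by object identity.
Base.:(==)(x::ToyQuantity, y::ToyQuantity) = x.unit == y.unit && x.val == y.val
Base.isequal(x::ToyQuantity, y::ToyQuantity) =
    x.unit == y.unit && isequal(x.val, y.val)

# Hash the unit first, then feed that hash in as the seed for the value,
# so the value goes through Julia's own value-based hashing of Real numbers.
function Base.hash(x::ToyQuantity, h::UInt)
    h = hash(x.unit, h)
    return hash(x.val, h)
end

a = ToyQuantity(BigFloat("0.1"), :m)
b = ToyQuantity(BigFloat("0.1"), :m)

println(isequal(a, b))           # same unit and same value
println(hash(a) == hash(b))      # memory location no longer matters
println(length(Set([a, b])))     # both collapse into one Set element
```

With this definition, two equal values in different units (the `0.1m` versus `10.0cm` case) would still hash differently, so it does not settle the different-unit question.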
> Now, the case of comparing `0.1m` and `10.0cm`, they indeed have (i) the same physical value, but (ii) different representations, so it seems to me that such quantities should pass an `==` equality test, but fail both `===` and `hash` equality tests.
They will obviously fail `===`, but they should not fail `hash`, since the behavior of `hash` should be consistent with `isequal`. The behavior of `===` is also completely irrelevant to this discussion (and you cannot customize it anyway).
> Otherwise, one can have instances in which `unit(a) == unit(b)` returns `false` while `a === b` returns `true`!
No, this is literally impossible (unless one changes `unit` so that its return value depends on some global state and not just the passed argument).
Sadly, a `hash` method that is consistent with `isequal` seems difficult to implement in this case. In my opinion, your proposed `hash` implementation (or something very similar) should be added for Unitful 1.x.y, and an implementation that conforms to the documented `hash` behavior is something to consider for Unitful 2.0.
I started a discussion about hashing quantities with different units here: #379.
Hashing has a different behavior for "bare" `BigFloat`s and unit-ed `Quantity{BigFloat}`s. For the former, `hash` is unaffected by `BigFloat`'s data address in memory, stored in its `d` field; while for the latter, `hash` does seem to take the data memory address into account, so that equal numerical values `hash` the same for "bare" numbers, irrespective of memory location, and can `hash` differently for same unit-ed `Quantity`-ies:
Take `a`, `b`, and `c` which are numerically equal to `BigFloat(1//7)`, but `a.d === c.d` and `a.d !== b.d`:
So, should hashing of `Quantity{BigFloat}` behave differently to that of `BigFloat`s?
I see it as inconsistent behavior that may lead to false negatives in tests that work just fine for `Quantity{Float64}`, `Quantity{Float32}`, etc.