SymbolicML / DynamicQuantities.jl

Efficient and type-stable physical quantities in Julia
https://symbolicml.org/DynamicQuantities.jl/dev/
Apache License 2.0
120 stars 15 forks source link

unexpected behavior with Set? #142

Open maxasauruswall opened 4 weeks ago

maxasauruswall commented 4 weeks ago

Hi All,

Thanks for the great package!

I'm not sure if this behavior is expected, but it surprised me, so I thought I'd ask.

s = Set()
push!(s, 1us"day") # -> Set with 1 element
push!(s, 1us"day") # -> Set with 2 elements

I would have expected the set to only to have 1 element, because:

isequal(1us"day", 1us"day") # -> true

Am I mistaken in my intuition here?

Thanks for your time and work. Cheers, Max

MilesCranmer commented 4 weeks ago

Does Set use ===? Then it would make sense. Otherwise I am also confused.

maxasauruswall commented 4 weeks ago

So, I think it has something to do with how set implements push!, and the way string literals work? (not entirely sure)

Here's the implementation in Julia: push!(s::Set, x) = (s.dict[x] = nothing; s)

And when I try this at the command line:

julia> s = Set(1us"day")

julia> s.dict[1us"day"]
ERROR: KeyError: key 1.0 day not found

julia> [k for k in keys(s.dict)][1] == 1us"day"
true

julia> hash(1us"day")
0xd9a0c8ddc9b8d712

julia> hash(1us"day")
0xfaa6938a339a5027

Since the string literal expands to a new constructor, and the dictionary is keyed by hashes, the Set always adds a new key,val pair, never overwritting the old?

maxasauruswall commented 4 weeks ago

Would implementing a custom hash function for Quantity help? Just reading through the docs: https://docs.julialang.org/en/v1/base/base/#Base.hash

Maybe using: https://github.com/JuliaServices/AutoHashEquals.jl

MilesCranmer commented 4 weeks ago

Yes a custom hash sounds like it would be enough! I’m honesty surprised it doesn’t work automatically based on the fields

maxasauruswall commented 4 weeks ago

from this thread, it sounds like maybe the default is to hash the type too, producing a new value each time, and if we omitted the type from the hash, that might do it.

could be tricky though, since the .dimensions field is itself quite complex.

e.g. the following doesn't work, because (I think), the recursive call to hash(q.dimensions) hashes the dimension type, etc.

function Base.hash(q::Quantity, h::UInt)
    return hash(q.dimensions, hash(q.value, h))
end
maxasauruswall commented 4 weeks ago

Okay, I think I figured it out. Thanks for bearing with me here.

I'm trying to hash a SymbolicDimensions type, and since it's got a mutable vector field, perhaps it's not possible to produce a consistent hash?

I guess I could convert to a Dimensions type first:

hash(1us"day")
0x994971915a7991f5

julia> hash(1us"day")
0xebcabd1b6a6a6910 # <---- different

julia> hash(uexpand(1us"day"))
0x888d385847bc8db6

julia> hash(uexpand(1us"day"))
0x888d385847bc8db6 # <----- same
MilesCranmer commented 4 weeks ago

I think hash can work on Dimensions, but it just needs to have special behavior for SymbolicDimensions that treats the sparsity of those vectors correctly, ignoring any zeros. Then it will work.