JuliaServices / AutoHashEquals.jl

A Julia macro to add == and hash() to composite types.
Other
57 stars 12 forks source link

Add `typeseed=e` parameter to `@auto_hash_equals`. #44

Closed gafter closed 1 year ago

gafter commented 1 year ago

Specifying the "type seed"

When we compute the hash function, we start with a "seed" particular to the type being hashed. You can select the seed to be used by specifying typeseed=e.

The seed provided (e) is used in one of two ways, depending on the setting for typearg. If typearg=false (the default), then e will be used as the type seed. If typearg=true, then e(t) is used as the type seed, where t is the type of the object being hashed.

This PR also adds a default type seed that is stable from 1.6 through 1.10.

codecov[bot] commented 1 year ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@81ebfb3). Click here to learn what that means. Patch has no changes to coverable lines.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #44 +/- ## ========================================= Coverage ? 92.50% ========================================= Files ? 3 Lines ? 280 Branches ? 0 ========================================= Hits ? 259 Misses ? 21 Partials ? 0 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

gafter commented 1 year ago

@mcmcgrath13 I think I have addressed all of your concerns.

ORBAT commented 1 year ago

Why not change the default "type seed" to be something more sensible if hashing types is generally fraught with problems? I'd rather have more correct behavior out of the box than have to specify a "type seed" separately for every type I use this with.

gafter commented 1 year ago

@ORBAT If we change the default then it would be a breaking change (i.e. change the computed hash value) for all clients. The current default seed is the hash of the symbol which is the type's name. Assuming the hash of a Symbol is stable, which it appears to be, that is stable and only collides for types with the same simple name. So I think the default is a reasonable default. Having said that, some clients might want a different default, and that is why this option is being provided.

ORBAT commented 1 year ago

@gafter ah true, the type is only hashed when typearg=true.

But still, would it be worthwhile in those cases to seed the hash with something along the lines of hashfn(:MyType, hashfn(:Vector, hashfn(:Int))) for the type MyType{Vector{Int}}, to be consistent with how the hash is seeded in other cases? Especially if the assumption is that people rely on the hashes being stable between package versions? At least Base.hash(MyType{Vector{Int}}) will change every time MyType is recompiled (or Vector's hash changes), and the same would apply to any custom hash function that does something like what Base.hash does (or outright uses it). hashfn(:MyType, ...) would be stable as long as hashfn(::Symbol) is stable. Seems like least cases where hashfn=Base.hash typearg=true could change the default seed, as that'd arguably be a bug fix and not a breaking change, as any guarantees about stability are out with those options anyhow?

gafter commented 1 year ago

@ORBAT That's a great idea. Let me think about whether it should be done in this PR or separately (before registering a new version).

gafter commented 1 year ago

@ORBAT @mcmcgrath13 I've added a new commit to this PR that implements @ORBAT's suggestion. Please have a look!

gafter commented 1 year ago

But still, would it be worthwhile in those cases to seed the hash with something along the lines of hashfn(:MyType, hashfn(:Vector, hashfn(:Int))) for the type MyType{Vector{Int}}

That is exactly what typearg turns on or off. When it is off, we definitely do not want to include the type arguments in the hash, otherwise objects that test equal would have different hashes, which violates their invariants.

Seems like least cases where hashfn=Base.hash typearg=true could change the default seed, as that'd arguably be a bug fix and not a breaking change, as any guarantees about stability are out with those options anyhow?

We do change the seed in that case, based on your advice. With this PR, the seed doesn't ever use the specified hash function, but uses the new type_seed function.