beacon-biosignals / StableHashTraits.jl

Compute hashes over any Julia object simply and reproducibly
MIT License

stable_hash(::Regex) #66

Closed: rasmushenningsson closed this issue 1 day ago

rasmushenningsson commented 3 months ago

`Regex` is a mutable struct, which causes `stable_hash` to return a different value each time, even for identical patterns:

```julia
julia> bytes2hex(stable_hash(r"hello"; version=3))
"d9bdaff2d67b5f0d6890fafddde31b70678837711d5352fbc848a436b47d8e22"

julia> bytes2hex(stable_hash(r"hello"; version=3))
"665ac86a45a852990551faa2462d395029a029336935e12be6bfaf1ecabcc76d"
```

I think it would make sense to make it hash in a stable manner by default. Do you agree? (The reason it is a mutable struct is so that a finalizer can free the pointer to the compiled regex when the object is garbage collected.)
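
One plausible reason for the varying output, offered as a hypothesis rather than a confirmed diagnosis of StableHashTraits' internals: a field-by-field hash of `Regex` would also see the raw pointer to the compiled PCRE object, whose address differs from one `Regex` instance to the next. The sketch below uses only Base reflection (`fieldnames`, `fieldtypes`) to list the fields and pick out the pointer-typed ones; the exact field list may vary between Julia versions.

```julia
# Print every field of Base's Regex type together with its declared type.
for (name, T) in zip(fieldnames(Regex), fieldtypes(Regex))
    println(name, " :: ", T)
end

# Collect the fields that hold raw pointers; their values are memory
# addresses, which are not reproducible across objects or sessions.
ptr_fields = [name for (name, T) in zip(fieldnames(Regex), fieldtypes(Regex)) if T <: Ptr]
println("pointer fields: ", ptr_fields)
```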

Here's the corresponding `hash` method in Base, which indicates that we should take `pattern`, `compile_options` and `match_options` into account:

```julia
function hash(r::Regex, h::UInt)
    h += hashre_seed
    h = hash(r.pattern, h)
    h = hash(r.compile_options, h)
    h = hash(r.match_options, h)
end
```

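
As an illustration of a reproducible digest built from exactly those three fields, here is a minimal sketch using only the SHA standard library. `regex_digest` is a hypothetical helper, not part of Base or StableHashTraits, and it writes the option flags in the host's byte order, so digests are only comparable between machines with the same endianness.

```julia
using SHA

# Hypothetical helper: a SHA-256 digest of a Regex built from the same fields
# Base.hash uses (pattern, compile_options, match_options), skipping the
# pointer to the compiled regex entirely.
function regex_digest(r::Regex)
    ctx = SHA2_256_CTX()
    # The pattern's UTF-8 bytes.
    update!(ctx, Vector{UInt8}(r.pattern))
    # Both UInt32 option fields as 8 raw bytes, in host byte order.
    update!(ctx, collect(reinterpret(UInt8, [r.compile_options, r.match_options])))
    return digest!(ctx)
end

println(bytes2hex(regex_digest(r"hello")))
```

Because the pointer field is never read, two separately constructed `r"hello"` objects give the same digest, while adding a flag such as `i` changes `compile_options` and therefore the digest.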
haberdashPI commented 3 months ago

> I think it would make sense to make it hash in a stable manner by default. Do you agree?

Yes, that sounds useful to me. I am wrapping up some work in #55 to improve the stability of hashes with a new API design for customizing how types get hashed. I think it makes sense to make this improvement after that work merges.

haberdashPI commented 3 months ago

In the meantime, I believe the following should be an effective workaround (though I haven't tested it):

```julia
# Global override: hash every Regex by its pattern and options only
StableHashTraits.hash_method(::Regex) = FnHash(x -> (x.pattern, x.compile_options, x.match_options))
```

Or, to limit the change to a specific hash context, you could do:

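```julia
using StableHashTraits
# Context wrapper; cases it does not handle are delegated to the parent context
struct HashRegex{T}
    parent::T
end
StableHashTraits.parent_context(x::HashRegex) = x.parent
# Within HashRegex, hash a Regex by its pattern and options only
StableHashTraits.hash_method(::Regex, ::HashRegex) = FnHash(x -> (x.pattern, x.compile_options, x.match_options))
stable_hash(r"a*b*", HashRegex(HashVersion{3}()))
```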
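
The context-based version relies on delegation through `parent_context`: the wrapper context overrides the method for `Regex` and, presumably, defers everything else to the context it wraps. The following sketch mimics that dispatch pattern in plain Julia so it runs without the package; `RootContext`, `RegexContext`, `parent_of` and `method_for` are hypothetical names, and this is not StableHashTraits' actual implementation.

```julia
# Hypothetical stand-ins for a library's root context and a wrapper context.
struct RootContext end

struct RegexContext{T}
    parent::T
end
parent_of(ctx::RegexContext) = ctx.parent

# Root behaviour: report a generic strategy for every value.
method_for(x, ::RootContext) = :generic
# A wrapper context defers to its parent unless a more specific method exists.
method_for(x, ctx::RegexContext) = method_for(x, parent_of(ctx))
# The wrapper overrides the strategy only for Regex values.
method_for(::Regex, ::RegexContext) = :pattern_and_options

ctx = RegexContext(RootContext())
println(method_for(r"a*b*", ctx))
println(method_for(1, ctx))
```

Julia's multiple dispatch picks the `(::Regex, ::RegexContext)` method because it is more specific than the generic wrapper method, so only regexes get the special treatment while other values fall through to the root context.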
haberdashPI commented 1 day ago

Resolved by #58