beacon-biosignals / StableHashTraits.jl

Compute hashes over any Julia object simply and reproducibly
MIT License
7 stars 1 forks source link

Make hash more robust to future changes in julia type-printing. #46

Closed haberdashPI closed 6 months ago

haberdashPI commented 8 months ago

Description

StableHashTraits relies on the string representation of types which has proven to be a fragile assumption (c.f. #43).

This PR provides a solution more robust to julia internals, relying mostly on the assumption that type parameters can be acessed programmatically. This seems like something much less likely to go away (though how they're accessed might change, that should be easy to make a bugfix for, if/when that happens).

It does however require creating a new stable_hash version and implementing it there. Eventually, (in a separate PR) StableHashTraits should release a breaking version that completely removes version=1 and version=2, since these cannot be trusted to stay stable for Julia 1.11 and later.

In the process of testing out the edge cases for this I found some hash collisions present in version=2 (user defined structs that inherit from Base.Function did not hash the fields of said struct, just the type), so I've added a test for this, and removed these collisions for version>2.

Benchmarks

Author, please:

Run the benchmarks under test/benchmarks.jl locally, and update test/benchmark_records.md by inserting a new table at the end of the document with the updated numbers printed at the end of the benchmark script. Place the table as it occurs before and after the application of this PR.

Before

After

codecov[bot] commented 8 months ago

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (6c7ce8f) 95.38% compared to head (668c333) 95.31%. Report is 1 commits behind head on main.

:exclamation: Current head 668c333 differs from pull request most recent head 17fb100. Consider uploading reports for the commit 17fb100 to get more accurate results

Files Patch % Lines
src/StableHashTraits.jl 96.29% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #46 +/- ## ========================================== - Coverage 95.38% 95.31% -0.08% ========================================== Files 2 2 Lines 347 384 +37 ========================================== + Hits 331 366 +35 - Misses 16 18 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

haberdashPI commented 8 months ago

there are a lot of generated functions and Base internals and various hacks... I wonder if the scope of this package is too broad, and hashing everything is too hard to keep working and stable.

I do wonder if the ambition of the v1.0 release (which really expanded how many things were covered by default) has gotten away from me!

This PR aims to reduce, rather than increases, the hacks required though.

I think it would be hard to remove the more broadly scoped hash at this point but maybe it would be good to add something like SafeHashVersion{1} that does not dispatch on Any (or T where {T}) and requires the user to explicitly define hash_method for new types.

I'll add the tests.

ericphanson commented 8 months ago

I do wonder if the ambition of the v1.0 release (which really expanded how many things were covered by default) has gotten away from me!

I filed #50 for more thoughts on this; I think the package could be useful while being significantly more scoped down!

haberdashPI commented 6 months ago

Superseded by #55