beacon-biosignals / StableHashTraits.jl

Compute hashes over any Julia object simply and reproducibly
MIT License

Buffer hash input #29

Closed. haberdashPI closed this pull request 12 months ago.

haberdashPI commented 1 year ago

NOTE: I'd recommend reviewing this in a tool with smarter diffs than GitHub's web view (e.g. VSCode). Many lines have changed indentation; GitHub shows these as entirely new lines, whereas a better diff view shows that, for most of them, only the indentation changed.

This refactors how the hash algorithm is used when computing stable_hash so that I can optimize performance more easily. It then implements two wrappers that store data in a buffer before the hash is computed.
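The general idea of buffering hash input can be sketched in Python. This is not the PR's Julia code: `BufferedSHA256` and its `capacity` parameter are hypothetical names. The sketch only shows that collecting many small writes and passing them to the hash in larger chunks leaves a streaming hash such as SHA-256 unchanged while making far fewer calls into the hash function.

```python
import hashlib


class BufferedSHA256:
    """Hypothetical wrapper: collect small writes in a bytearray and
    forward them to hashlib in larger chunks."""

    def __init__(self, capacity: int = 4096):
        self._ctx = hashlib.sha256()
        self._buf = bytearray()
        self._capacity = capacity  # flush once this many bytes are pending

    def update(self, data: bytes) -> None:
        # Cheap append; the expensive hash call happens only on flush.
        self._buf += data
        if len(self._buf) >= self._capacity:
            self._flush()

    def _flush(self) -> None:
        if self._buf:
            self._ctx.update(self._buf)  # hashlib copies the bytes here
            self._buf.clear()

    def hexdigest(self) -> str:
        # Hash any remaining buffered bytes before producing the digest.
        self._flush()
        return self._ctx.hexdigest()


# Usage: many 8-byte writes give the same digest as unbuffered hashing.
buffered = BufferedSHA256()
direct = hashlib.sha256()
for v in range(10_000):
    chunk = v.to_bytes(8, "little")
    buffered.update(chunk)
    direct.update(chunk)
assert buffered.hexdigest() == direct.hexdigest()
```

Because SHA-256 is a streaming hash, chunk boundaries do not affect the digest; only the number of `update` calls changes.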

MarkerHash changes the hash values returned by stable_hash, so I've implemented a new HashVersion{2}; the new buffering implementation is only used when the root context (found by recursively calling parent_context) has hash version 2. Thus HashVersion{1} and any contexts that have it as a parent remain slow.
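The dispatch rule above can be illustrated with a small Python sketch. It is not the Julia implementation: `HashVersion` here is a stand-in class, and `CustomContext`, `root_context`, and `uses_buffering` are hypothetical names. The sketch assumes a root context is one whose `parent_context` returns nothing.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class HashVersion:
    """Stand-in for a root context such as HashVersion{2}."""
    version: int


@dataclass(frozen=True)
class CustomContext:
    """Hypothetical user context that wraps another context."""
    parent: Union["CustomContext", HashVersion]


Context = Union[CustomContext, HashVersion]


def parent_context(ctx: Context) -> Optional[Context]:
    # Root contexts have no parent; wrapping contexts return what they wrap.
    if isinstance(ctx, HashVersion):
        return None
    return ctx.parent


def root_context(ctx: Context) -> Context:
    # Recursively follow parent_context until reaching the root.
    parent = parent_context(ctx)
    return ctx if parent is None else root_context(parent)


def uses_buffering(ctx: Context) -> bool:
    # Only a version-2 root enables the new buffered path.
    root = root_context(ctx)
    return isinstance(root, HashVersion) and root.version == 2


print(uses_buffering(CustomContext(CustomContext(HashVersion(2)))))
print(uses_buffering(CustomContext(HashVersion(1))))
```

The first call reports `True` and the second `False`: wrapping a context never changes which implementation is chosen, because only the root's version is consulted.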

Note that I plan to make a few more changes that will change the hash values again (#30), so this PR does not bump the version in Project.toml; the planned follow-up PR will be merged before a new version is released.

Here are the benchmarks before and after applying this PR (ratio = trait / base):

Before

12×5 DataFrame
 Row │ benchmark   hash       base        trait       ratio      
     │ SubStrin…   SubStrin…  String      String      Float64    
─────┼───────────────────────────────────────────────────────────
   1 │ structs     crc        78.667 μs   125.481 ms  1595.09
   2 │ tuples      crc        79.250 μs   31.102 ms    392.453
   3 │ dataframes  crc        79.417 μs   6.382 ms      80.3635
   4 │ numbers     crc        39.875 μs   3.102 ms      77.7842
   5 │ symbols     crc        597.166 μs  21.122 ms     35.3705
   6 │ strings     crc        597.625 μs  13.749 ms     23.0063
   7 │ structs     sha256     545.916 μs  190.883 ms   349.656
   8 │ tuples      sha256     545.917 μs  47.118 ms     86.3101
   9 │ dataframes  sha256     547.500 μs  11.283 ms     20.6081
  10 │ numbers     sha256     271.708 μs  5.433 ms      19.9951
  11 │ symbols     sha256     4.086 ms    32.191 ms      7.87788
  12 │ strings     sha256     4.085 ms    21.856 ms      5.34987

After

12×5 DataFrame
 Row │ benchmark   hash       base        trait       ratio     
     │ SubStrin…   SubStrin…  String      String      Float64   
─────┼──────────────────────────────────────────────────────────
   1 │ structs     crc        70.167 μs   51.761 ms   737.68
   2 │ tuples      crc        71.375 μs   9.623 ms    134.829
   3 │ symbols     crc        530.667 μs  5.145 ms      9.69535
   4 │ strings     crc        527.125 μs  4.413 ms      8.37159
   5 │ dataframes  crc        70.167 μs   385.792 μs    5.4982
   6 │ numbers     crc        35.208 μs   176.875 μs    5.02372
   7 │ structs     sha256     533.041 μs  55.757 ms   104.601
   8 │ tuples      sha256     532.958 μs  10.976 ms    20.5939
   9 │ dataframes  sha256     533.000 μs  993.000 μs    1.86304
  10 │ numbers     sha256     266.125 μs  487.792 μs    1.83294
  11 │ symbols     sha256     4.000 ms    6.611 ms      1.65291
  12 │ strings     sha256     4.000 ms    6.321 ms      1.58011
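As a quick check on the tables, this Python snippet (not part of the PR) recomputes the ratio column for three rows copied from them, along with how many times faster the trait timing became. The helper names `to_us`, `ratio`, and `trait_speedup` are illustrative. Recomputed ratios can differ slightly in the last digits from the printed ones because the tables show rounded times.

```python
# Multipliers from the units printed in the tables to microseconds.
UNIT_TO_US = {"ns": 0.001, "μs": 1.0, "ms": 1_000.0, "s": 1_000_000.0}


def to_us(duration: str) -> float:
    """Convert a table entry such as '125.481 ms' to microseconds."""
    value, unit = duration.split()
    return float(value) * UNIT_TO_US[unit]


# (benchmark, hash) -> (base, trait), copied from the Before and After tables.
BEFORE = {
    ("structs", "crc"): ("78.667 μs", "125.481 ms"),
    ("dataframes", "crc"): ("79.417 μs", "6.382 ms"),
    ("strings", "sha256"): ("4.085 ms", "21.856 ms"),
}
AFTER = {
    ("structs", "crc"): ("70.167 μs", "51.761 ms"),
    ("dataframes", "crc"): ("70.167 μs", "385.792 μs"),
    ("strings", "sha256"): ("4.000 ms", "6.321 ms"),
}


def ratio(row: tuple[str, str]) -> float:
    """Trait time divided by base time, as in the ratio column."""
    base, trait = row
    return to_us(trait) / to_us(base)


def trait_speedup(key: tuple[str, str]) -> float:
    """How many times faster the trait timing is after the PR."""
    return to_us(BEFORE[key][1]) / to_us(AFTER[key][1])


for key in BEFORE:
    print(key, f"ratio {ratio(BEFORE[key]):.2f} -> {ratio(AFTER[key]):.2f},",
          f"trait speedup {trait_speedup(key):.2f}x")
```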
codecov[bot] commented 1 year ago

Codecov Report

Merging #29 (6b66e60) into main (e2b4f9a) will decrease coverage by 2.03%. The diff coverage is 95.86%.

Current head 6b66e60 differs from the pull request's most recent head 2c9126e. Consider uploading reports for commit 2c9126e to get more accurate results.

@@            Coverage Diff             @@
##             main      #29      +/-   ##
==========================================
- Coverage   97.26%   95.23%   -2.03%     
==========================================
  Files           1        1              
  Lines         146      210      +64     
==========================================
+ Hits          142      200      +58     
- Misses          4       10       +6     
Files                      Coverage Δ
src/StableHashTraits.jl    95.23% <95.86%> (-2.03%)
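The project-level figures in this report follow from Hits / Lines. This Python sketch (not Codecov's code) reproduces them. The rounding mode is an assumption inferred from the numbers: 200 / 210 is 95.238…%, yet it is reported as 95.23%, which is consistent with rounding down to two decimals rather than to the nearest value.

```python
import math


def coverage_pct(hits: int, lines: int, precision: int = 2) -> float:
    """Percentage of covered lines, rounded down to `precision` decimals."""
    scale = 10 ** precision
    return math.floor(hits / lines * 100 * scale) / scale


# Hits and Lines taken from the coverage diff above.
main_pct = coverage_pct(142, 146)  # main
pr_pct = coverage_pct(200, 210)    # #29
delta = round(pr_pct - main_pct, 2)
misses = (146 - 142, 210 - 200)    # Lines minus Hits
print(main_pct, pr_pct, delta, misses)  # prints: 97.26 95.23 -2.03 (4, 10)
```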
