beacon-biosignals / StableHashTraits.jl

Compute hashes over any Julia object simply and reproducibly
MIT License

Hash more types, elide some types. #31

Closed: haberdashPI closed this pull request 9 months ago

haberdashPI commented 1 year ago

Description

This PR adds a new hash context, `HashVersion{3}()`, that hashes more type information: it includes a type identifier for every primitive type.

To keep this extra type information from slowing down the benchmarks, it also introduces an optimization that elides the type identifier of primitive values in some of the cases where it is redundant. When the element type of a collection, or the type of a struct, is concrete and is already included in the hash, the individual elements or fields do not also hash their own types, because the container's type already determines them.
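A minimal Julia sketch of the idea, assuming a simplified encoding in which a type identifier is just the type's printed name and only primitive values and vectors (not structs) are handled. `encode_primitive!`, `encode_vector!`, `encode`, and `stable_digest` are hypothetical helpers for illustration, not StableHashTraits.jl functions, and this is not how the package actually implements `HashVersion{3}`.

```julia
using SHA  # Julia stdlib; provides sha256

# Hypothetical sketch, NOT the StableHashTraits.jl API. It illustrates the idea
# described above: tag primitive values with a type identifier, but skip the
# per-element tag when a concrete element type has already been written.
# (A real encoding would also need length prefixes or delimiters; omitted here.)

# Simplified type identifier: the type's printed name, e.g. "Int8".
write_type!(io::IO, T::Type) = write(io, string(T))

# A primitive value: its type identifier (unless elided), then its bytes as
# produced by `write` (native byte order for numbers).
function encode_primitive!(io::IO, x; include_type::Bool = true)
    isprimitivetype(typeof(x)) || throw(ArgumentError("expected a primitive value"))
    include_type && write_type!(io, typeof(x))
    write(io, x)
    return io
end

# A vector: write the element type once. If it is concrete, every element has
# exactly that type, so each element's own type identifier is redundant.
function encode_vector!(io::IO, xs::AbstractVector)
    T = eltype(xs)
    write_type!(io, T)
    elide = isconcretetype(T)
    for x in xs
        encode_primitive!(io, x; include_type = !elide)
    end
    return io
end

# Collect the encoded bytes, then hash them.
encode(f, x) = take!(f(IOBuffer(), x))
stable_digest(f, x) = sha256(encode(f, x))
```

With the type tag, `Int8(1)` and `UInt8(1)` hash differently even though both are the single byte `0x01`. For `Int8[1, 2]` the encoding is the four bytes of `"Int8"` followed by two data bytes, whereas for `Any[Int8(1), Int8(2)]` the element type is not concrete, so each element carries its own tag.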

Before

12×5 DataFrame
 Row │ benchmark   hash       base        trait       ratio     
     │ SubStrin…   SubStrin…  String      String      Float64   
─────┼──────────────────────────────────────────────────────────
   1 │ structs     crc        71.542 μs   1.116 ms    15.6027
   2 │ tuples      crc        71.459 μs   918.917 μs  12.8594
   3 │ dataframes  crc        71.458 μs   257.166 μs   3.59884
   4 │ numbers     crc        35.916 μs   126.666 μs   3.52673
   5 │ symbols     crc        635.875 μs  629.000 μs   0.989188
   6 │ strings     crc        655.500 μs  561.292 μs   0.856281
   7 │ structs     sha256     543.166 μs  3.045 ms     5.60525
   8 │ tuples      sha256     543.125 μs  2.594 ms     4.77668
   9 │ symbols     sha256     1.494 ms    2.264 ms     1.51544
  10 │ strings     sha256     1.484 ms    2.196 ms     1.47992
  11 │ dataframes  sha256     543.125 μs  749.125 μs   1.37929
  12 │ numbers     sha256     270.958 μs  371.833 μs   1.37229

After

28×6 DataFrame
 Row │ version    benchmark   hash       base        trait       ratio     
     │ SubStrin…  SubStrin…   SubStrin…  String      String      Float64   
─────┼─────────────────────────────────────────────────────────────────────
   1 │ 2          structs     crc        70.166 μs   1.190 ms    16.9633
   2 │ 2          tuples      crc        70.250 μs   932.459 μs  13.2734
   3 │ 2          dataframes  crc        71.458 μs   262.708 μs   3.6764
   4 │ 2          vnumbers    crc        35.167 μs   128.750 μs   3.6611
   5 │ 2          numbers     crc        35.917 μs   128.417 μs   3.57538
   6 │ 2          symbols     crc        616.459 μs  666.208 μs   1.0807
   7 │ 2          strings     crc        647.208 μs  603.042 μs   0.931759
   8 │ 2          structs     sha256     543.084 μs  3.040 ms     5.59743
   9 │ 2          tuples      sha256     543.125 μs  2.583 ms     4.75589
  10 │ 2          symbols     sha256     1.484 ms    2.238 ms     1.50779
  11 │ 2          strings     sha256     1.464 ms    2.176 ms     1.48634
  12 │ 2          numbers     sha256     266.083 μs  375.542 μs   1.41137
  13 │ 2          vnumbers    sha256     265.875 μs  370.625 μs   1.39398
  14 │ 2          dataframes  sha256     532.958 μs  740.708 μs   1.38981
  15 │ 3          structs     crc        70.917 μs   1.068 ms    15.0563
  16 │ 3          tuples      crc        70.292 μs   644.917 μs   9.17483
  17 │ 3          vnumbers    crc        35.167 μs   130.167 μs   3.7014
  18 │ 3          dataframes  crc        70.083 μs   258.750 μs   3.69205
  19 │ 3          numbers     crc        35.208 μs   127.792 μs   3.62963
  20 │ 3          symbols     crc        648.000 μs  666.208 μs   1.0281
  21 │ 3          strings     crc        619.167 μs  156.333 μs   0.252489
  22 │ 3          structs     sha256     543.125 μs  2.232 ms     4.11024
  23 │ 3          tuples      sha256     533.041 μs  1.711 ms     3.21027
  24 │ 3          symbols     sha256     1.491 ms    2.255 ms     1.51312
  25 │ 3          numbers     sha256     266.041 μs  373.084 μs   1.40236
  26 │ 3          vnumbers    sha256     265.917 μs  370.792 μs   1.39439
  27 │ 3          dataframes  sha256     533.000 μs  741.958 μs   1.39204
  28 │ 3          strings     sha256     1.475 ms    1.059 ms     0.718063
codecov[bot] commented 1 year ago

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (1447160) 95.97% compared to head (6dec628) 93.37%.

Note: the current head 6dec628 differs from the pull request's most recent head 8b4d33b. Consider uploading reports for commit 8b4d33b to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| src/StableHashTraits.jl | 89.18% | 12 Missing |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #31      +/-   ##
==========================================
- Coverage   95.97%   93.37%   -2.60%
==========================================
  Files           1        1
  Lines         273      347      +74
==========================================
+ Hits          262      324      +62
- Misses         11       23      +12
```


haberdashPI commented 9 months ago

This has been made obsolete by #46.