emilypi / Base16

BSD 3-Clause "New" or "Revised" License
4 stars 8 forks source link

Small improvement to `isValidBase16`? #24

Closed Vlix closed 8 months ago

Vlix commented 1 year ago

Using elem might be more equality checks than necessary, maybe? (not sure how optimized the elem function for ByteString is)

What about something like:

isValidBase16 :: ByteString -> Bool
isValidBase16 = all (\w8 ->
    w8 >= 0x30 && w8 <= 0x66 -- quickly check min and max bounds for valid base16
        && (
            w8 <= 0x39 -- check if it's a number (which are most likely)
                || w8 >= 0x61 -- check if it's [a-f]
                || (w8 >= 0x41 && w8 <= 0x46) -- check if it's [A-F]
            )
        )
Vlix commented 1 year ago

Oh, and while I was looking around that part of the code, I noticed isBase16 actually decodes the entire bytestring just for an isRight. I think that can be replaced by a length bs `rem` 2 == 0? i.e.:

isBase16 :: ByteString -> Bool
isBase16 bs@(BS _ len) =
    -- check length first because it's cheapest
    len `rem` 2 == 0 && isValidBase16 bs
Vlix commented 1 year ago

I also noticed the PS pattern is used a bunch, even though the package depends on bytestring >= 0.11, which means BS is the actual data constructor and which means you don't have to add plusPtrs that use the offset, since it'll always be 0 (if bytestring >= 0.11)

Not precisely on-topic for this issue, but wondered if it's a possible performance tweak?

Vlix commented 1 year ago

Hmmm, doing some benchmarks on my own laptop (not a great one) came back with this:

Benchmark results of isValidBase16 ``` benchmarking isValidBase16/25/base16 time 302.4 ns (292.1 ns .. 316.7 ns) 0.982 R² (0.964 R² .. 0.996 R²) mean 306.3 ns (295.4 ns .. 329.4 ns) std dev 48.79 ns (27.92 ns .. 78.94 ns) variance introduced by outliers: 96% (severely inflated) benchmarking isValidBase16/25/base16 new time 133.1 ns (131.2 ns .. 136.0 ns) 0.994 R² (0.983 R² .. 1.000 R²) mean 134.7 ns (132.2 ns .. 143.1 ns) std dev 12.50 ns (5.513 ns .. 25.30 ns) variance introduced by outliers: 89% (severely inflated) benchmarking isValidBase16/100/base16 time 1.085 μs (1.078 μs .. 1.095 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 1.081 μs (1.079 μs .. 1.087 μs) std dev 12.28 ns (1.800 ns .. 21.64 ns) benchmarking isValidBase16/100/base16 new time 424.7 ns (422.9 ns .. 426.4 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 419.4 ns (417.3 ns .. 421.7 ns) std dev 7.797 ns (6.713 ns .. 9.536 ns) variance introduced by outliers: 22% (moderately inflated) benchmarking isValidBase16/1000/base16 time 10.63 μs (10.60 μs .. 10.67 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 10.63 μs (10.62 μs .. 10.66 μs) std dev 52.04 ns (28.20 ns .. 88.94 ns) benchmarking isValidBase16/1000/base16 new time 4.791 μs (4.766 μs .. 4.830 μs) 0.999 R² (0.997 R² .. 1.000 R²) mean 4.836 μs (4.796 μs .. 5.005 μs) std dev 209.8 ns (101.0 ns .. 443.6 ns) variance introduced by outliers: 55% (severely inflated) benchmarking isValidBase16/10000/base16 time 106.3 μs (105.9 μs .. 106.9 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 106.3 μs (106.1 μs .. 107.0 μs) std dev 1.275 μs (264.7 ns .. 2.664 μs) benchmarking isValidBase16/10000/base16 new time 140.8 μs (140.5 μs .. 141.1 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 140.7 μs (140.5 μs .. 141.0 μs) std dev 690.8 ns (398.2 ns .. 1.231 μs) benchmarking isValidBase16/100000/base16 time 1.060 ms (1.057 ms .. 1.063 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.062 ms (1.061 ms .. 1.065 ms) std dev 6.777 μs (3.933 μs .. 10.91 μs) benchmarking isValidBase16/100000/base16 new time 1.591 ms (1.587 ms .. 1.594 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.592 ms (1.590 ms .. 1.594 ms) std dev 8.239 μs (7.215 μs .. 9.814 μs) benchmarking isValidBase16/1000000/base16 time 11.08 ms (10.71 ms .. 11.42 ms) 0.992 R² (0.985 R² .. 0.997 R²) mean 11.38 ms (11.16 ms .. 11.61 ms) std dev 576.2 μs (466.0 μs .. 709.8 μs) variance introduced by outliers: 24% (moderately inflated) benchmarking isValidBase16/1000000/base16 new time 14.68 ms (14.61 ms .. 14.73 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 15.05 ms (14.95 ms .. 15.20 ms) std dev 305.6 μs (228.4 μs .. 372.2 μs) ```

I don't get why at around 10k it goes to 40-60% slower, instead of the 50% faster from 1k and less :thinking:

Vlix commented 1 year ago

The isBase16 tweak is a lot quicker though, and it still passes all the tests (even though I haven't set up a property test making sure it has parity with the current and obviously correct implementation):

Benchmark results of isBase16 ``` benchmarking isBase16/25/base16 time 434.0 ns (428.2 ns .. 442.6 ns) 0.995 R² (0.991 R² .. 0.998 R²) mean 452.5 ns (441.0 ns .. 473.5 ns) std dev 51.33 ns (30.81 ns .. 87.16 ns) variance introduced by outliers: 92% (severely inflated) benchmarking isBase16/25/base16 new time 312.8 ns (310.1 ns .. 317.6 ns) 0.994 R² (0.983 R² .. 1.000 R²) mean 321.9 ns (314.1 ns .. 343.1 ns) std dev 42.28 ns (22.99 ns .. 72.12 ns) variance introduced by outliers: 94% (severely inflated) benchmarking isBase16/25m/base16 time 334.9 ns (334.4 ns .. 335.6 ns) 1.000 R² (0.999 R² .. 1.000 R²) mean 335.8 ns (335.4 ns .. 336.5 ns) std dev 1.745 ns (1.281 ns .. 2.506 ns) benchmarking isBase16/25m/base16 new time 5.304 ns (5.299 ns .. 5.313 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 5.316 ns (5.307 ns .. 5.331 ns) std dev 39.33 ps (24.65 ps .. 59.81 ps) benchmarking isBase16/100/base16 time 1.516 μs (1.510 μs .. 1.525 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.517 μs (1.514 μs .. 1.523 μs) std dev 12.55 ns (6.984 ns .. 20.66 ns) benchmarking isBase16/100/base16 new time 1.192 μs (1.182 μs .. 1.209 μs) 0.999 R² (0.999 R² .. 1.000 R²) mean 1.190 μs (1.187 μs .. 1.200 μs) std dev 19.11 ns (6.345 ns .. 32.71 ns) variance introduced by outliers: 16% (moderately inflated) benchmarking isBase16/100m/base16 time 1.296 μs (1.290 μs .. 1.307 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.296 μs (1.293 μs .. 1.302 μs) std dev 15.27 ns (8.749 ns .. 22.62 ns) benchmarking isBase16/100m/base16 new time 5.415 ns (5.299 ns .. 5.549 ns) 0.998 R² (0.996 R² .. 1.000 R²) mean 5.336 ns (5.302 ns .. 5.418 ns) std dev 154.7 ps (20.03 ps .. 267.4 ps) variance introduced by outliers: 49% (moderately inflated) benchmarking isBase16/1000/base16 time 14.60 μs (14.50 μs .. 14.79 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 14.56 μs (14.53 μs .. 14.72 μs) std dev 179.4 ns (60.74 ns .. 411.3 ns) benchmarking isBase16/1000/base16 new time 11.74 μs (11.69 μs .. 11.83 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 11.71 μs (11.70 μs .. 11.78 μs) std dev 111.0 ns (37.37 ns .. 224.3 ns) benchmarking isBase16/1000m/base16 time 12.80 μs (12.75 μs .. 12.86 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 12.79 μs (12.77 μs .. 12.85 μs) std dev 88.33 ns (39.34 ns .. 180.2 ns) benchmarking isBase16/1000m/base16 new time 5.417 ns (5.331 ns .. 5.534 ns) 0.998 R² (0.997 R² .. 1.000 R²) mean 5.352 ns (5.318 ns .. 5.423 ns) std dev 155.6 ps (77.90 ps .. 240.3 ps) variance introduced by outliers: 50% (moderately inflated) benchmarking isBase16/10000/base16 time 149.6 μs (143.7 μs .. 160.5 μs) 0.984 R² (0.966 R² .. 1.000 R²) mean 145.7 μs (144.0 μs .. 153.2 μs) std dev 9.362 μs (295.8 ns .. 20.71 μs) variance introduced by outliers: 63% (severely inflated) benchmarking isBase16/10000/base16 new time 117.3 μs (116.6 μs .. 118.7 μs) 0.999 R² (0.997 R² .. 1.000 R²) mean 117.1 μs (116.7 μs .. 118.6 μs) std dev 2.335 μs (412.9 ns .. 4.886 μs) variance introduced by outliers: 14% (moderately inflated) benchmarking isBase16/10000m/base16 time 127.7 μs (127.4 μs .. 128.0 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 127.7 μs (127.5 μs .. 128.1 μs) std dev 838.0 ns (411.0 ns .. 1.520 μs) benchmarking isBase16/10000m/base16 new time 5.368 ns (5.318 ns .. 5.427 ns) 0.999 R² (0.998 R² .. 1.000 R²) mean 5.353 ns (5.324 ns .. 5.443 ns) std dev 149.6 ps (65.40 ps .. 282.0 ps) variance introduced by outliers: 48% (moderately inflated) benchmarking isBase16/100000/base16 time 1.480 ms (1.467 ms .. 1.494 ms) 0.998 R² (0.997 R² .. 0.999 R²) mean 1.531 ms (1.512 ms .. 1.556 ms) std dev 75.42 μs (62.80 μs .. 102.2 μs) variance introduced by outliers: 36% (moderately inflated) benchmarking isBase16/100000/base16 new time 1.207 ms (1.139 ms .. 1.338 ms) 0.941 R² (0.873 R² .. 0.999 R²) mean 1.196 ms (1.172 ms .. 1.280 ms) std dev 144.0 μs (14.91 μs .. 305.1 μs) variance introduced by outliers: 80% (severely inflated) benchmarking isBase16/100000m/base16 time 1.277 ms (1.273 ms .. 1.283 ms) 1.000 R² (0.999 R² .. 1.000 R²) mean 1.282 ms (1.280 ms .. 1.288 ms) std dev 12.40 μs (7.194 μs .. 19.14 μs) benchmarking isBase16/100000m/base16 new time 5.317 ns (5.298 ns .. 5.357 ns) 0.999 R² (0.996 R² .. 1.000 R²) mean 5.346 ns (5.304 ns .. 5.506 ns) std dev 258.5 ps (22.21 ps .. 546.7 ps) variance introduced by outliers: 74% (severely inflated) benchmarking isBase16/1000000/base16 time 14.31 ms (14.17 ms .. 14.43 ms) 0.999 R² (0.999 R² .. 1.000 R²) mean 14.51 ms (14.46 ms .. 14.63 ms) std dev 187.8 μs (99.75 μs .. 283.1 μs) benchmarking isBase16/1000000/base16 new time 11.67 ms (11.64 ms .. 11.71 ms) 1.000 R² (0.999 R² .. 1.000 R²) mean 11.73 ms (11.70 ms .. 11.79 ms) std dev 99.86 μs (51.28 μs .. 167.6 μs) benchmarking isBase16/1000000m/base16 time 12.66 ms (12.53 ms .. 12.79 ms) 0.999 R² (0.996 R² .. 1.000 R²) mean 12.96 ms (12.81 ms .. 13.28 ms) std dev 535.3 μs (51.92 μs .. 944.6 μs) variance introduced by outliers: 17% (moderately inflated) benchmarking isBase16/1000000m/base16 new time 5.314 ns (5.306 ns .. 5.326 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 5.364 ns (5.332 ns .. 5.450 ns) std dev 163.1 ps (89.22 ps .. 270.5 ps) variance introduced by outliers: 52% (severely inflated) ```

N.B. the benchmarks with an -m suffix are bad strings where I drop 1-ed from the encoded bytestring used in the equivalent benchmark without the -m suffix. This way it also tests if the format is invalid, which turns the new implementation into an O(1) check in those situations.

emilypi commented 1 year ago

Sure, whatever this is fine. I've not spent any time optimizing the convenience functions because it's diminishing returns, but if you have something that makes it significantly speedy, a PR is welcome. Just stop spamming the issue 😭

Vlix commented 8 months ago

Done with #27

Anyone checking MB's or GB's of base16 encoded data will now take much less time :)