Nullus157 / bs58-rs

Another Rust Base58 codec implementation
Apache License 2.0
75 stars 24 forks source link

Don't clone the input when encoding #115

Closed bishopcheckmate closed 8 months ago

bishopcheckmate commented 8 months ago

Hey. This change removes the cloning of input when encoding. Previously the clone was used to re-iterate the input encoding leading zeros. Instead we can do the single pass over the input but split it into two loops, where former will additionally count leading zeros. This seem to not only reduce the memory footprint but avoiding the alloc call increases the performance for short inputs.

I tried doing it with a single loop but that requires additional bool to track whether we've finished counting leading zeros and that additional branching resulted in negative performance impact.

Benchmarks ``` empty/encode_bs58 time: [20.317 ns 20.365 ns 20.401 ns] change: [+3.7331% +3.8908% +4.0624%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 5 (5.00%) low severe 7 (7.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe empty/encode_bs58_noalloc time: [11.123 ns 11.139 ns 11.151 ns] change: [+2.7524% +2.8549% +2.9526%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low severe 4 (4.00%) high mild 5 (5.00%) high severe 1_byte/encode_bs58 time: [36.407 ns 36.435 ns 36.464 ns] change: [-19.111% -18.707% -18.293%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild 1_byte/encode_bs58_noalloc time: [20.551 ns 20.614 ns 20.679 ns] change: [-89.277% -88.380% -87.334%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high mild 5_bytes/encode_bs58 time: [65.118 ns 65.251 ns 65.384 ns] change: [-16.580% -16.391% -16.194%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild 5_bytes/encode_bs58_noalloc time: [446.45 ns 473.22 ns 495.49 ns] change: [-14.924% -3.1328% +10.916%] (p = 0.63 > 0.05) No change in performance detected. 10_bytes/encode_bs58 time: [128.89 ns 128.97 ns 129.04 ns] change: [-11.933% -11.638% -11.406%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) low mild 3 (3.00%) high mild 10_bytes/encode_bs58_noalloc time: [900.44 ns 952.91 ns 996.97 ns] change: [-11.841% -1.3000% +13.407%] (p = 0.84 > 0.05) No change in performance detected. 10_bytes_zero/encode_bs58 time: [55.953 ns 56.055 ns 56.185 ns] change: [-17.366% -17.141% -16.891%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) high mild 3 (3.00%) high severe 10_bytes_zero/encode_bs58_noalloc time: [441.58 ns 468.56 ns 490.98 ns] change: [-11.142% +1.3696% +14.844%] (p = 0.84 > 0.05) No change in performance detected. 10_bytes_max/encode_bs58 time: [136.74 ns 136.88 ns 137.04 ns] change: [-7.7064% -7.1788% -6.6654%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe 10_bytes_max/encode_bs58_noalloc time: [903.39 ns 956.62 ns 1.0004 µs] change: [-13.586% -1.7281% +11.758%] (p = 0.79 > 0.05) No change in performance detected. 32_bytes/encode_bs58 time: [992.92 ns 993.27 ns 993.65 ns] change: [-3.4959% -3.4118% -3.3296%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe 32_bytes/encode_bs58_noalloc time: [1.5096 µs 1.5447 µs 1.5739 µs] change: [-5.5909% -1.5881% +2.7029%] (p = 0.47 > 0.05) No change in performance detected. 256_bytes/encode_bs58 time: [68.399 µs 68.405 µs 68.410 µs] change: [-1.3128% -1.2699% -1.2282%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low severe 4 (4.00%) low mild 1 (1.00%) high mild 4 (4.00%) high severe 256_bytes/encode_bs58_noalloc time: [70.557 µs 70.713 µs 70.848 µs] change: [-1.1390% -0.7947% -0.4463%] (p = 0.00 < 0.05) Change within noise threshold. ```
bishopcheckmate commented 8 months ago

Actually I had a bug in implementation where I was breaking out of the loop before encoding current value. After the fix some of encode_bs58_noalloc report huge performance regression. I'll look into that further but likely it doesn't work that well. Marked as draft for now

Nemo157 commented 8 months ago

What is actually cloned is either a &[u8] or Chain<Chain<option::Iter<u8>, slice::Iter<u8>>, slice::Iter<u8>> depending on whether a checksum is used. Cloning both of those should be basically free, it's impossible for it to clone the actual input data because there is no I: Clone bound anywhere.

bishopcheckmate commented 8 months ago

true, thanks for clarifying. I somehow overlooked that those Is are not the same