borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/
Other
10.93k stars 738 forks source link

interesting hashes / macs / ciphers / checksums #45

Open ThomasWaldmann opened 9 years ago

ThomasWaldmann commented 9 years ago

https://github.com/Cyan4973/xxHash - not a cryptographic hash fn, not for HMAC! So, maybe we could use it as a crc32 replacement (if we keep the crc32(header+all_data) approach). borg uses xxh64 at some places

siphash - cryptographic hash fn (internally used by python >= 3.4), but: only 64bits return value. a 128bit version is "experimental".

libsodium has some hashes / macs also. but not yet widespread on linux dists.

last but not least: sha512-256 is faster on 64bit CPUs than sha256.

namelessjon commented 8 years ago

If/when you do feel you can use libsodium, that also opens up two things which could be useful:

ThomasWaldmann commented 8 years ago

@namelessjon interesting, thanks. about the counter re-use issue: there is also this idea of creating per-backup(-thread) random session keys, start counter from 0, encrypt the keys with the master key and store them with the backup.

namelessjon commented 8 years ago

@ThomasWaldmann That sounds less fragile than the current implementation at least (I think).

The secretbox option from libsodium, despite the larger nonce, adds the same 40 byte overhead to the files, because the Poly-1305 MAC is only 16 bytes. I guess if you're now including a pointer to the encrypted session key with each encrypted blob, secretbox would have less overhead, but that shouldn't be that significant anyway?

ThomasWaldmann commented 8 years ago

I had a look at libsodium yesterday, seems pretty nice and it also is in some stable linux distributions now.

It would be useful to get some comparative performance values: aes256-ctr + hmac-sha256 vs aes256-gcm (hw accel.) vs. xsalsa20 + poly1305 same for sha256 against some faster hash from libsodium.

For interfacing, we have 2 options: either via cython (like we use it for openssl right now) or using some python wrapper for libsodium.

namelessjon commented 8 years ago

https://www.imperialviolet.org/2014/02/27/tlssymmetriccrypto.html < not a perfect comparison (it's ChaCha20, not XSalsa20, but I believe the performance is supposed to be similar), and that's one or more intel processor generations ago, but there's that. However, I think since then libsodium has picked up an assembly version of Salsa20 which should be faster.

ThomasWaldmann commented 8 years ago

pysodium crypto_generichash (256bit) is 2.8 times faster than sha256 from python stdlib. pysodium crypto_generichash (512bit) is 1.8 times faster than sha512 from python stdlib.

note: sha256 eats most of the cpu time for borgbackup currently (when using hw accel. aes and lz4).

But: no AES256-CTR in libsodium yet. https://github.com/jedisct1/libsodium/issues/317

namelessjon commented 8 years ago

Seems unlikely aes-ctr will be added in libsodium from how that thread has evolved. I think I agree with the why, too. The nice thing about libsodium is it inherits "hard to mess up" from nacl.

namelessjon commented 8 years ago

Though it does complicate a migration to different algorithms without adding more dependencies

ThomasWaldmann commented 8 years ago

openssl 1.1.0 is scheduled for april 2016 release. update: borg uses openssl 1.1.x https://www.openssl.org/news/changelog.html#x0

chacha20 / poly1305 ocb mode

infectormp commented 8 years ago

Hash news from Google :

enkore commented 8 years ago

I have a branch where I worked about on the LoggedIO write performance and managed to double it when processing large files (45 MB/s => 90 MB/s, vmlocked input, output onto tmpfs; dd does ~500 MB/s here), mainly by managing syncs in a way to give the kernel a chance to do them when it wants to, without compromising transactionality (and indeed, syncs don't make a significant appearance in the profile anymore)

Adding a none64 encryption using SHA512-256 moved it to ~110 MB/s.

Profiled it there (with Cython profiling enabled):

  1. 40 % Chunker.__next__
  2. 35 % _hashlib.openssl_sha512
  3. 10 % CRC32 (which matches very well with CRC32 stand-alone giving me approx 1 GB/s)
  4. 5 % buffered IO writes
  5. 4 % the bytes-join in Plaintext64Key.encrypt
  6. 3.5 % compression (none was enabled)

So it seems to me that the Chunker is the next big target for optimization. i.e. mainly see what the compiler does there and if there is anything left to optimize.

Btw. using that branch in production currently, nada issues so far. So a PR for that will probably come this weekend.

Extraction is basically 70 % SHA-512, 20 % CRC-32 and 10 % IO+misc (for ~210 MB/s). Normal plaintext w/ SHA-256 is 160 MB/s or so. I'd say extraction speed is acceptable for my CPU (which is old and has 'AMD' lasered onto the lid).

ThomasWaldmann commented 8 years ago

As debian stable and ubuntu lts now has libsodium, I've begun working on a cython-based libsodium binding for borg. gives us chacha20-poly1305 as new aead cipher, blake2b as new hash.

Strange, I am seeing less than expected speedup:

speedup sha256 -> sha512: 1.4616587679833115
speedup sha256 -> blake2b: 1.5823430737200959
speedup sha512 -> blake2b: 1.0825666758755845

I first thought this is maybe caused by a slow blake2b 1.0.8 in ubuntu and I manually installed 1.0.10 (which has "record speed avx2 blake2b") - but it doesn't get faster. https://blake2.net/ says blake2b should be about 3x faster than sha512, so what's going wrong here?

ThomasWaldmann commented 8 years ago

Quote from python docs: "An Adler-32 checksum is almost as reliable as a CRC32 but can be computed much more quickly."

Quote from stackexchange: "Do note that Adler32 is almost useless for short runs of data. Up to about 180 bytes, it produces numerous collisions."

>>> from zlib import *
>>> data=b'x'*1000000000
>>> # dt computes the runtime of given function in seconds
>>> dt(lambda: crc32(data))
1.1496269702911377
>>> dt(lambda: adler32(data))
0.49709367752075195
enkore commented 8 years ago

CRC32 is already around 1 GB/s (even on my older CPUs), and should be [much] faster on CPUs with CLMUL (although I'm not sure whether zlib makes use of that - if it doesn't getting an implementation or nudging Python into using one that does would make sense and comes for free (except the hassle)).

For 2.0 it would make sense to switch to something as fast as CRC32 (blake) but with much higher integrity guarantees. E.g. 128+ bit blake checksums on the Repository layer.

enkore commented 8 years ago

All tests made with openssl speed -evp <algorithm>. (Note: AES-NI always includes CLMUL; I therefore don't mention it separately)

AMD K10 'Thuban', 3.3 GHz, no AES-NI, OpenSSL master (to-be 1.1)

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           32807.44k    80855.72k   153107.35k   191174.64k   203030.09k   202454.73k
sha512           22194.31k    88633.71k   174282.49k   272886.07k   312806.06k   313252.22k
blake2s256       30131.61k   121603.55k   248297.91k   334311.34k   345933.32k   338420.01k
blake2b512       29113.06k   115848.20k   340357.74k   517671.73k   574694.83k   582163.52k

aes-256-cbc      65277.71k    71006.95k    73144.51k   184525.14k   185352.90k   186289.92k
aes-256-gcm      50956.21k    56946.58k    57792.17k    60589.06k    59232.56k    58938.56k
aes-256-ocb      61807.33k    65769.66k    66718.91k    66782.89k    66816.68k    67196.32k
chacha20-poly1305   168435.67k   321127.98k   360382.55k   381305.75k   384996.69k   382865.04k

Intel Xeon E3-1231v3, 3.4 GHz, AES-NI

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           55129.16k   145301.59k   311774.38k   428385.09k   468527.79k   476840.16k
sha512           36564.32k   148308.25k   301326.17k   534995.48k   683207.34k   697860.10k
blake2s256       42168.03k   163007.85k   341910.18k   473521.49k   532892.33k   540114.94k
blake2b512       41371.63k   163340.95k   474446.42k   712455.17k   835794.26k   846603.47k

aes-256-cbc     571487.46k   600316.16k   608165.89k   609246.21k   611030.36k   613452.03k
aes-256-ctr     420332.94k  1373547.81k  2840697.54k  3595480.49k  3863664.23k  3909044.31k
aes-256-gcm     373152.65k  1071874.69k  2080868.78k  2579107.16k  2893392.55k  2940644.01k
aes-256-ocb     345922.01k  1456611.33k  2691367.08k  3528726.53k  3820748.80k  3855936.17k
chacha20-poly1305   282856.40k   509213.58k  1095028.99k  1905031.85k  2016478.61k  2010589.87k

powermac G5, dual core, 2 GHz, OpenSSL master (to-be 1.1), configured for ppc64.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256            8032.07k    24662.84k    57131.78k    86498.65k   102213.69k   102951.59k
sha512            7186.19k    28637.80k    66281.56k   118863.87k   155130.18k   157947.22k
blake2s256        8173.06k    32977.38k    69935.79k    98463.06k   112172.86k   112831.15k
blake2b512        7160.29k    28813.66k    87564.20k   152836.62k   195205.22k   198838.20k

aes-256-cbc      50287.13k    58998.33k    63115.43k    64246.10k    64804.47k    64823.65k
aes-256-gcm      33388.54k    37288.54k    38743.81k    39149.57k    39439.41k    39436.67k
aes-256-ocb      37836.55k    43085.70k    44221.53k    44683.95k    44921.75k    44772.01k
chacha20-poly1305    63010.86k   122622.37k   218386.26k   239075.57k   246295.21k   247820.33k

X200, Intel P8600, 2.4 GHz, no AES-NI, OpenSSL git 38e19eb96

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           37834.40k    82205.76k   142650.64k   172888.69k   188077.40k   190355.78k
sha512           25130.82k   101252.47k   156723.29k   219927.55k   250352.98k   252177.07k
blake2s256       23435.04k    92726.06k   184286.46k   246177.13k   273435.31k   275256.66k
blake2b512       21004.03k    82610.52k   246223.54k   382414.17k   457061.72k   460215.64k

aes-256-cbc     128602.15k   149870.77k   155655.25k   157678.25k   156516.85k   157543.08k
aes-256-gcm      40657.10k    46472.35k   120275.88k   129553.07k   131236.39k   131164.84k
aes-256-ocb     107630.54k   123701.66k   125957.38k   129277.61k   130329.51k   128839.82k
chacha20-poly1305   143071.17k   253906.65k   395730.28k   412351.74k   423469.06k   420422.21k

X201, Intel i5-520M (1st gen), AES-NI, 2.5 GHz, OpenSSL master (to-be 1.1)

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           27347.57k    70701.40k   132937.71k   180204.41k   198897.91k   200478.58k
sha512           18721.99k    74486.12k   138151.43k   203135.40k   254793.19k   259274.05k
blake2s256       29862.39k   120343.92k   213276.18k   268852.92k   276566.88k   283360.73k
blake2b512       22243.32k    92160.97k   260590.30k   400471.62k   474613.69k   479858.77k

aes-256-cbc     411130.34k   451687.22k   469638.53k   471725.87k   472540.95k   471177.45k
aes-256-gcm     198351.23k   435388.37k   570959.46k   620392.81k   632221.72k   630816.99k
aes-256-ocb     191981.68k   603404.07k  1036441.86k  1254013.08k  1342182.23k  1355308.67k
chacha20-poly1305   151109.65k   275360.32k   468868.72k   495142.83k   503227.96k   504269.83k

Odroid-C2, ARM Cortex-A53 (NEON acceleration), 2 GHz, AArch64 mode, 2G RAM, OpenSSL master (to-be-1.1)

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           10262.06k    29336.60k    62571.67k    87548.95k    99074.37k   100018.35k
sha512            7540.40k    29637.11k    68571.21k   119539.12k   152745.54k   155860.28k
blake2s256        6914.23k    27511.27k    45578.94k    54736.87k    58282.55k    58521.26k
blake2b512        5008.73k    20017.75k    51662.25k    67957.66k    74815.75k    75377.29k

aes-256-cbc      40206.14k    47970.14k    50627.05k    51336.65k    51544.50k    51555.35k
aes-256-gcm      23389.15k    26157.46k    27123.45k    27378.10k    27464.90k    27478.46k
aes-256-ocb      35397.02k    40917.59k    42634.34k    43232.19k    43095.87k    43238.57k
chacha20-poly1305    51664.38k   106305.89k   208580.15k   235916.72k   247165.12k   247984.32k

A modern ARM core with NEON, performs quite well for AES, and extremely well for ChaCha20-Poly1305 (at ~250 MB/s). SHA-2 is faster than Blake since AArch64 includes instructions for SHA.

As expected, the chacha20-poly1305 scheme is by far the fastest in software[1]. AES-OCB is faster than GCM but doesn't quite gets "nearly as fast" as CBC.

A test on HW with AES-NI and CLMUL would be interesting to see how GCM and OCB compare there.

Update: Thomas' results show that OCB is a good bit faster on his modern Intel. On the i5-520M, which is a bit older (2010ish) OCB is more than twice as fast as GCM.

Update: Added results for a Haswell desktop CPU. The ratios almost exactly match Thomas' results as one would expect (both are Haswell).

Update: Added results for ARM Cortex-A53 (amlogic s905), AArch64

[1] but I still find it surprisingly fast even on the G5.

ThomasWaldmann commented 8 years ago

we don't need to compare gcm and cbc modes, cbc does not have auth, so the comparison would be gcm and cbc+auth (hmac or whatever).

i am a bit unsure about ocb. although the patent stuff seems unproblematic meanwhile, it hindered wide usage until recently, so one could suspect ocb is way less practically tested than gcm.

also, i am not convinced whether we should wait until openssl 1.1 is widely available and packaged. we could also go for libsodium, which already is available and packaged (but adds extra dependency).

ThomasWaldmann commented 8 years ago

i5-4200u with aes-ni, openssl 1.0.2:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256           36164.96k    96595.22k   207142.23k   288822.61k   327669.08k
sha512           26040.90k   104421.67k   221632.00k   371365.89k   470551.21k
aes-256-cbc     386736.39k   410446.76k   416114.01k   417011.03k   417680.04k
aes-256-gcm     289575.60k   828874.30k  1474338.99k  1708658.35k  1785738.58k

openssl 1.1.0 git master:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           36976.78k    98577.73k   211887.62k   292908.18k   327972.18k   332016.30k
sha512           24729.87k   100302.29k   216779.00k   368207.19k   467856.04k   477997.74k
blake2s256       25418.42k   107899.43k   224348.93k   315508.10k   353039.58k   372823.38k
blake2b512       27473.39k   108173.14k   317090.85k   488826.20k   580064.60k   587563.01k

aes-256-cbc     386987.58k   410620.12k   416009.47k   418933.81k   417838.42k   412663.81k
aes-256-gcm     272809.14k   721350.98k  1485755.59k  1755807.06k  1979094.36k  2008274.26k
aes-256-ocb     258501.45k   993106.39k  1864350.38k  2342698.67k  2612682.75k  2639790.08k
chacha20-poly1305   190241.36k   332894.29k   709114.61k  1299729.07k  1386345.81k  1397286.70k
enkore commented 8 years ago

also, i am not convinced whether we should wait until openssl 1.1 is widely available and packaged. we could also go for libsodium, which already is available and packaged (but adds extra dependency).

I used OpenSSL here mainly because it's a convenient way to test it: While on x86 I don't expect performance differences between *ssl and NaCl/libsodium, re-testing should be done with the library actually used in the end to ensure it has the performance level we expect(ed).

ThomasWaldmann commented 8 years ago

Somehow embarrassing that we can encrypt+auth 4-8 times faster than compute any easily and separately available hash.

enkore commented 8 years ago

AES is cheating with it's dedicated per-round instructions :D Could use a hash/mac constructed from AES, but they all have many more caveats than typical MACs in my perception.

Another thing to consider is that more recent ARM chips also include acceleration for AES. Newer Raspis (at least the v3) are running on an A53 core that includes that.

enkore commented 8 years ago

I added another set of results above, for a 1st gen i5 (and also some for the previous Core2 processor). Generally in line with other observations, except...

Update: Thomas' results show that OCB is a good bit faster on his modern Intel. On the i5-520M, which is a bit older (2010ish) OCB is more than twice as fast as GCM.

ThomasWaldmann commented 8 years ago

Wow, that's a surprising result. It's just a pity that it likely will take quite some time until aes-ocb (openssl 1.1) is widely available and packaged - and by then many of these 1st gen Core-i machines might be gone anyway.

ThomasWaldmann commented 8 years ago

About AES-GCM, see Black Hat 2016, paper is public on iacr: "nonce disrespecting adversaries"

https://twitter.com/tqbf/status/760907360319660032

ThomasWaldmann commented 7 years ago

http://www.nongnu.org/lzip/xz_inadequate.html interesting article, includes "longer is not necessarily better" fact about CRCs/hashes.

enkore commented 7 years ago

While the analysis seems to be accurate for a compressed format, it doesn't really apply to Borg, since both the MAC and the repository-level CRC are done after the data may have been compressed. There is no equivalent to Pudd in Borg. The only kind of false positives we can get is where the checksum was corrupted but the data is actually not corrupted. Here the usual len(checksum) vs len(data) applies.

enkore commented 7 years ago

Intel Cannon Lake (due early 2018 or so) will reportedly support the Intel SHA-2 extensions. These instructions (already available on some lower end Atom chips) reportedly perform at about 4 cpb on these (I don't know if the CL implementation would be faster).

ThomasWaldmann commented 7 years ago

The AMD Ryzen 1800X cpu has this cpu flag: sha_ni : SHA1/SHA256 Instruction Extensions

ThomasWaldmann commented 7 years ago

keccak can be used for single-pass authenticated encryption.

rugk commented 7 years ago

:+1: for libsodium discussed at the beginning of this issue. It's really good (and modern) especially if you have no hardware AES support ChaCha20 e.g. is much faster.

ThomasWaldmann commented 7 years ago

http://preshing.com/20110504/hash-collision-probabilities/

zookozcash commented 7 years ago

The benchmarks from https://github.com/borgbackup/borg/issues/45#issuecomment-221259154 and https://github.com/borgbackup/borg/issues/45#issuecomment-221234832 are benchmarking the implementation of BLAKE2 in openssl, which is probably much slower than some other implementations (including, perhaps, future upgrades of openssl).

For machines with AVX2 like the Xeon E3, https://github.com/sneves/blake2-avx2 reports ~3 cycles per byte, which is about 1.25 times as fast as shown above. For the ARM Cortex-A53, https://bench.cr.yp.to/results-hash.html#aarch64-par3 shows ~6 cycles per byte, which is about 4.4 times as fast as shown above.

enkore commented 7 years ago

The BLAKE2 implementation in OpenSSL (and Python, Borg 1.1, libb2, most other stuff as well) is the reference implementation (with no vectorization whatsoever).

The other (SSE4, AVX, etc.) optimised implementations are not included in Borg (and most other packages mentioned above), because everything but AVX2 has practically no improvement in performance since Haswell. (It's a pretty darn good superscalar processor)

AVX2 is not shipped, because no one sat down yet and said "Yep, I declare this thing non-experimental" (all of them are marked experimental by their authors). The sole exception is libsodium which ships an experimental AVX2 implementation in the default configuration.

AVX2 is also not shipped in Borg because that's a huge PITA to do; see #2422 and the "crc32*" files here (esp. crc32_dispatch.c).

Ideally someone else (Python, OpenSSL) decides to ship a better implementation at some point and "someone" adds the necessary code to make Borg use OpenSSL's BLAKE2. Everyone wins. We might also use libsodium in the future. Though I don't know what to make of it yet.

ThomasWaldmann commented 7 years ago

another one against gcm:

http://www.metzdowd.com/pipermail/cryptography/2016-March/028824.html

guess we'll just use ocb and chacha/poly.

infectormp commented 6 years ago

t1ha - Fast Positive Hash. In most cases up to 15% faster than City, xxHash, mum-hash, metro-hash, etc.

infectormp commented 4 years ago

The BLAKE3 a cryptographic hash function - https://github.com/BLAKE3-team/BLAKE3

enkore commented 4 years ago

I reran the tests from above on a Ryzen 3900X using OpenSSL 1.1.1d.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256          106384.13k   345054.19k   897293.69k  1434363.86k  1788742.25k  1807924.24k
sha512           43884.78k   176089.53k   370019.94k   634628.74k   804669.85k   783004.49k
blake2s256       61339.27k   242532.95k   390655.26k   465763.91k   495717.82k   499056.64k
blake2b512       49562.98k   198390.17k   535434.49k   761940.14k   914828.61k   865731.49k

aes-256-cbc     767444.38k   929565.78k   982629.23k   977272.05k   992434.91k   993516.26k
aes-256-gcm     494790.72k  1351056.42k  2639749.24k  3451936.94k  3881137.88k  3858543.08k
aes-256-ocb     467931.89k  1693760.31k  3525061.12k  5028728.75k  5624994.67k  5851113.04k
aes-256-ctr     618497.82k  2140440.58k  5050578.48k  6892584.36k  7558430.72k  7638058.87k
chacha20-poly1305   297504.81k   541595.35k  1089618.16k  2081096.12k  2004806.25k  1997594.55k

Note massive increase in SHA-256 performance due to the Ryzen supporting Intel SHA Extensions (which only cover SHA-1 and SHA-256). Predictions from a few years ago when we added BLAKE2 were thus correct; better than SHA-256, except if your CPU has hardware support for it.

ThomasWaldmann commented 4 years ago

blake3 seems to have nice speed.

pkgconfig support still missing? library availability on linux (and other) dists? platform compatibility?

https://github.com/BLAKE3-team/BLAKE3

infectormp commented 4 years ago

Python bindings for BLAKE3 https://github.com/oconnor663/blake3-py

infectormp commented 3 years ago

High performance fork of zlib https://github.com/zlib-ng/zlib-ng/

deathtrip commented 3 years ago

So are there any plans of adding any of these algorithms in the near future?

ThomasWaldmann commented 3 years ago

@deathtrip some of the stuff we talked about is already done (like blake2b, openssl 1.1), some might come after 1.2, when we target crypto changes.

keep in mind that adding new code is always a burden on maintenance and also some risk (bugs?), so we try to find a good balance between pros and cons.

do you think something specific would be useful to add?

note: I added some updates (in bold) to some of the posts above and deleted a few now irrelevant ones.

deathtrip commented 3 years ago

do you think something specific would be useful to add?

I think blake3 and zlib-ng look like good candidates. They were the ones i thought about when i asked the question.

ThomasWaldmann commented 3 years ago

zlib

we use zlib via Python stdlib currently. also, i guess it is mostly for backwards compatibility with repos created with early borg and attic versions (when there was no lz4 and zstd yet) and users wanting something more advanced rather use lz4 and zstd?

so not sure what we would win by using zlib-ng, considering that we might have to bundle all of its code because it is not available on all platforms we support.

but in general (had a quick look), it looks like zlib-ng is a good project, so maybe some day python will switch from zlib to zlib-ng and we'll get it automagically.

blake3

C: no libs (like libb3?) for it yet, so we also would need to bundle all of its C code.

using blake3-py would be an option though (and we would not have to deal with the compiling / with rust).

can somebody measure how much faster b3 is (compared to b2) when used single-threaded?

enkore commented 3 years ago

Is there something zlib-ng is actually better at compared to zstd, which Borg already supports?

deathtrip commented 3 years ago

blake3

C: no libs (like libb3?) for it yet, so we also would need to bundle all of its C code.

Here are instructions on how to build the libraries. Seems that for x86 the assembly library could be used, as the C one doesn't support multithreading yet.

ThomasWaldmann commented 3 years ago

@deathtrip what i meant is lib availability on the usual OSes people use out there, didn't want to build/package the lib myself.

also, i'ld first like a perf comparison that is single-threaded to see how much b3 is faster than b2 *without going to multi-threading.

FabioPedretti commented 3 years ago

I think the performance graphs for BLAKE3 is single thread, see the title on the image: https://raw.githubusercontent.com/BLAKE3-team/BLAKE3/037de38bfec4e813ab6189a50cb7c4cbae47268a/media/speed.svg

See also: https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf

ThomasWaldmann commented 3 years ago

PSA: execution times [s] for hashing 1GB of random bytes on Apple M1 (Macbook Air 2020, Python 3.8.9) :

sha256 2.95
sha512 1.80
blake2b 1.06 (fastest)
blake2s 1.75
% file `which python`
...env/bin/python: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
...env/bin/python (for architecture x86_64):    Mach-O 64-bit executable x86_64
...env/bin/python (for architecture arm64): Mach-O 64-bit executable arm64

Note:

ThomasWaldmann commented 3 years ago

LibreSSL 2.8.3 an macOS 12.0b4 Apple M1 cpu

type                  16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256               67477.74k   154138.47k   262916.43k   319208.81k   340417.22k
sha512               57932.16k   231396.08k   350646.38k   481570.49k   546502.26k
aes-256-cbc         228120.47k   237817.67k   234978.42k   239252.78k   240820.57k
aes-256-gcm         142843.72k   145801.57k   143040.33k   142580.08k   142298.70k
aes-256-ctr         251498.14k   268501.38k   270107.85k   271982.03k   273654.51k
chacha20 poly1305    43378.82k   169592.00k   287290.88k   345544.80k   369688.69k

Note: no aes-ocb, no blake2b.

Maryse47 commented 3 years ago

why sha256 is slower (significantly) than sha512?