aiidateam / disk-objectstore

An implementation of an efficient "object store" (actually, a key-value store) writing files on disk and not requiring a running server
https://disk-objectstore.readthedocs.io
MIT License

Add support for multiple hashing algorithms? #82

Closed giovannipizzi closed 3 years ago

giovannipizzi commented 4 years ago

This should be easy; we just need to decide whether it's needed.

To compare the speed of different hash algorithms, one can use e.g. the openssl command-line utility:

openssl speed md5 sha1 sha256

On my Mac:

Doing md5 for 3s on 16 size blocks: 10452227 md5's in 2.98s
Doing md5 for 3s on 64 size blocks: 7714910 md5's in 2.97s
Doing md5 for 3s on 256 size blocks: 4228075 md5's in 2.98s
Doing md5 for 3s on 1024 size blocks: 1494308 md5's in 2.97s
Doing md5 for 3s on 8192 size blocks: 212259 md5's in 2.96s
Doing sha1 for 3s on 16 size blocks: 11914722 sha1's in 2.96s
Doing sha1 for 3s on 64 size blocks: 7667551 sha1's in 2.78s
Doing sha1 for 3s on 256 size blocks: 4077740 sha1's in 2.86s
Doing sha1 for 3s on 1024 size blocks: 1599355 sha1's in 2.95s
Doing sha1 for 3s on 8192 size blocks: 235227 sha1's in 2.95s
Doing sha256 for 3s on 16 size blocks: 10179907 sha256's in 2.96s
Doing sha256 for 3s on 64 size blocks: 5348647 sha256's in 2.97s
Doing sha256 for 3s on 256 size blocks: 2290028 sha256's in 2.96s
Doing sha256 for 3s on 1024 size blocks: 708176 sha256's in 2.99s
Doing sha256 for 3s on 8192 size blocks: 95262 sha256's in 2.98s
LibreSSL 2.6.5
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) 
compiler: information not available
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              56119.34k   166247.22k   363217.18k   515209.22k   587441.12k
sha1             64403.90k   176519.16k   365000.50k   555165.94k   653213.42k
sha256           55026.52k   115257.04k   198056.48k   242532.52k   261874.60k
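
For containers written from Python, a rough equivalent of the measurement above can be sketched with the standard `hashlib` module. This is only an illustration: the function names `throughput_mb_s` and `benchmark` are made up for this example, the timing loop is much cruder than `openssl speed`, and absolute numbers will differ from the table above.

```python
import hashlib
import time


def throughput_mb_s(algorithm, block_size, duration=0.1):
    """Hash a block_size-byte buffer repeatedly for about `duration` seconds.

    Returns the measured throughput in MB/s (hypothetical helper, not part
    of disk-objectstore).
    """
    data = b"\0" * block_size
    count = 0
    start = time.perf_counter()
    while True:
        hashlib.new(algorithm, data).digest()
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= duration:
            break
    return count * block_size / elapsed / 1e6


def benchmark(algorithms=("md5", "sha1", "sha256"),
              block_sizes=(16, 64, 256, 1024, 8192),
              duration=0.1):
    """Return {algorithm: {block_size: MB/s}} for the requested combinations."""
    return {
        name: {size: throughput_mb_s(name, size, duration) for size in block_sizes}
        for name in algorithms
    }


if __name__ == "__main__":
    for name, row in benchmark().items():
        cells = "  ".join(f"{row[size]:10.1f}" for size in sorted(row))
        print(name.ljust(8), cells)
```

For small blocks the Python call overhead dominates, so the large-block columns are the ones most comparable to the openssl figures.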

giovannipizzi commented 4 years ago

SHA1 seems more or less as fast as MD5, and ~2-3x faster than SHA256 on large blocks. For this reason (and to have at least two algorithms, which is also needed in the tests) I am adding support for SHA1 as well in 2d87757.
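
As a quick check of the relative speeds, the ratios can be computed directly from the throughput table in the first comment. The numbers below are copied from that table; the names `THROUGHPUT` and `speed_ratio` are only for this sketch.

```python
# Throughput in kB/s, copied from the `openssl speed` summary table above.
BLOCK_SIZES = (16, 64, 256, 1024, 8192)
THROUGHPUT = {
    "md5": (56119.34, 166247.22, 363217.18, 515209.22, 587441.12),
    "sha1": (64403.90, 176519.16, 365000.50, 555165.94, 653213.42),
    "sha256": (55026.52, 115257.04, 198056.48, 242532.52, 261874.60),
}


def speed_ratio(fast, slow):
    """Return {block_size: throughput(fast) / throughput(slow)}."""
    return {
        size: f / s
        for size, f, s in zip(BLOCK_SIZES, THROUGHPUT[fast], THROUGHPUT[slow])
    }


for size, ratio in speed_ratio("sha1", "sha256").items():
    print(f"{size:>5} bytes: SHA1 / SHA256 = {ratio:.2f}")
```

On these numbers the SHA1 advantage over SHA256 grows with block size, from about 1.2x at 16 bytes to about 2.5x at 8192 bytes, while SHA1 and MD5 stay within roughly 15% of each other.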

But for performance reasons I think one should strive to always use the same algorithm (the default, SHA256) in all containers, since import/export is more efficient when both containers use the same hash type.
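
To illustrate the import/export point, here is a minimal sketch that assumes objects are keyed by the hex digest of their content. The helpers `compute_key` and `transfer` are hypothetical and are not the disk-objectstore API; a real implementation may also choose to re-verify hashes even when the algorithms match.

```python
import hashlib


def compute_key(data, hash_type):
    """Return the hex digest used as the object's key (hypothetical helper)."""
    return hashlib.new(hash_type, data).hexdigest()


def transfer(objects, source_hash_type, dest_hash_type):
    """Copy `objects` (a dict key -> bytes) into a mapping keyed by dest_hash_type.

    Returns the new mapping and the number of objects that had to be rehashed.
    """
    copied = {}
    rehashed = 0
    for key, data in objects.items():
        if source_hash_type == dest_hash_type:
            # Same algorithm: the existing key is still valid and can be reused.
            new_key = key
        else:
            # Different algorithm: the content must be hashed again.
            new_key = compute_key(data, dest_hash_type)
            rehashed += 1
        copied[new_key] = data
    return copied, rehashed


source = {compute_key(b"some content", "sha256"): b"some content"}
_, n_same = transfer(source, "sha256", "sha256")
_, n_diff = transfer(source, "sha256", "sha1")
print(n_same, n_diff)
```

With matching hash types no object needs rehashing, whereas with different types every object's content has to be hashed once more on transfer.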