raycoll closed this issue 1 year ago.
Default (stitched) AES-128-CBC-HMAC-SHA1:
% ./openssl speed -evp aes-128-cbc-hmac-sha1
Doing aes-128-cbc-hmac-sha1 for 3s on 16 size blocks: 35580002 aes-128-cbc-hmac-sha1's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 64 size blocks: 11911816 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 256 size blocks: 4391700 aes-128-cbc-hmac-sha1's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 1024 size blocks: 1310784 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 8192 size blocks: 174034 aes-128-cbc-hmac-sha1's in 2.99s
OpenSSL 1.0.1t 3 May 2016
built on: Tue May 31 21:56:56 2016
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: x86_64-unknown-linux-gnu-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DSSL_FORBID_ENULL -DDEVRANDOM="/dev/urandom" -DOPENSSL_NO_SRP -DOPENSSL_NO_SCTP -DOPENSSL_NO_DTLS1 -DOPENSSL_NO_SRTP -DOPENSSL_NO_CAST -DOPENSSL_NO_HEARTBEATS -DOPENSSL_NO_JPAKE -DOPENSSL_NO_GOST -DOPENSSL_NO_KRB5 -DOPENSSL_NO_GMP -DOPENSSL_NO_BUF_FREELISTS -Wchar-subscripts -Wcomment -Wformat -Winit-self -Wmain -Wmissing-braces -Wno-pragmas -Wparentheses -Wreturn-type -Wsequence-point -Wstrict-aliasing -Wswitch -Wtrigraphs -Wuninitialized -Wunknown-pragmas -Wunused-label -Wunused-variable -Wunused-value -Wpointer-sign -Wimplicit -pthread -fdiagnostics-color=auto -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -O3 -mfpmath=sse -march=core2 -g -fPIC -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type                      16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc-hmac-sha1   190394.66k   254118.74k   376011.77k   447414.27k   476818.24k
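The summary figures follow directly from the raw counts: each value is operations times block size, divided by elapsed seconds, divided by 1000. The small C helper below makes that arithmetic explicit. The function name is illustrative and not part of OpenSSL. For the 16-byte row, 35580002 operations in 2.99 s gives 190394.66k, matching the table.

```c
#include <assert.h>

/* Convert an "openssl speed" raw count into the "1000s of bytes per
 * second" value printed in the summary table:
 *   kB/s = (operations * block_size_bytes) / elapsed_seconds / 1000
 * Hypothetical helper name; this is just the arithmetic, not OpenSSL code. */
static double speed_kbytes_per_sec(long ops, long block_size, double secs)
{
    assert(secs > 0.0);
    return (double)ops * (double)block_size / secs / 1000.0;
}
```

For example, speed_kbytes_per_sec(174034, 8192, 2.99) evaluates to about 476818.24, the 8192-byte entry above.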
Multibuffer:
% ./openssl speed -mb -evp aes-128-cbc-hmac-sha1
./openssl: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory
s3-gamma-s3ws-j4-r2-62019.pdx2% export LD_LIBRARY_PATH=/home/soco/libcrypto-root/lib
s3-gamma-s3ws-j4-r2-62019.pdx2% ./openssl speed -mb -evp aes-128-cbc-hmac-sha1
Doing aes-128-cbc-hmac-sha1 for 3s on 8192 size blocks: 181172 evp's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 16384 size blocks: 117396 evp's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 32768 size blocks: 68677 evp's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 65536 size blocks: 37280 evp's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 131072 size blocks: 19418 evp's in 2.99s
The 'numbers' are in 1000s of bytes per second processed.
type                    8192 bytes  16384 bytes  32768 bytes  65536 bytes 131072 bytes
aes-128-cbc-hmac-sha1   496374.92k   641138.69k   752644.79k   814394.03k   851222.77k
The speedup shows up at larger block sizes. The caveat is that the plain (non-multibuffer) openssl speed run didn't test aes-128-cbc-hmac-sha1 with block sizes above 8192 bytes, so for the larger sizes we are assuming that multiblock scales better as block size increases.
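Only the 8192-byte size appears in both runs, so it is the one like-for-like comparison. The sketch below takes the two summary rows, finds the first block size present in both, and reports the percentage gain of the multibuffer run over the stitched run. The struct and function names are hypothetical, and the figures are copied by hand from the two tables.

```c
#include <assert.h>
#include <stddef.h>

/* One column of an "openssl speed" summary row. */
struct speed_point {
    long block_size;       /* bytes per operation */
    double kbytes_per_sec; /* value as printed, in 1000s of bytes/s */
};

/* Percentage by which candidate throughput exceeds baseline throughput. */
static double percent_gain(double baseline_kbs, double candidate_kbs)
{
    assert(baseline_kbs > 0.0);
    return (candidate_kbs / baseline_kbs - 1.0) * 100.0;
}

/* Scan two runs for a block size measured in both. On success, store that
 * size and the candidate's gain over the baseline and return 1; return 0
 * if the runs share no block size. */
static int first_common_gain(const struct speed_point *base, size_t nbase,
                             const struct speed_point *cand, size_t ncand,
                             long *size_out, double *gain_out)
{
    for (size_t i = 0; i < nbase; i++) {
        for (size_t j = 0; j < ncand; j++) {
            if (base[i].block_size == cand[j].block_size) {
                *size_out = base[i].block_size;
                *gain_out = percent_gain(base[i].kbytes_per_sec,
                                         cand[j].kbytes_per_sec);
                return 1;
            }
        }
    }
    return 0;
}
```

Fed the stitched row as the baseline and the multibuffer row as the candidate, this reports 8192 bytes and a gain of about 4.1%. The larger multibuffer sizes have no stitched counterpart, which is exactly the caveat.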
AES-GCM is preferred over AES-CBC, so we won't invest in improving CBC mode performance. Won't implement.
Intel reported up to a 200% performance increase from the "multi-buffer" enhancements for AES-CBC-HMAC-SHA in OpenSSL [1]. We could consider adding a "multi_buffer_encrypt" function to our s2n_cipher struct for S2N_CBC/S2N_COMPOSITE.
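The sketch below shows one possible shape for that hook. It is hypothetical throughout: the struct is not s2n's actual struct s2n_cipher, and the field names, the multi_buffer_min_records threshold, and the toy backends are illustrative assumptions. Because CBC encryption is serial within a single stream, a multi-buffer path can only gain by interleaving several independent records. For that reason the hook takes an array of records rather than one large buffer. The dispatcher uses it only when the backend provides it and enough records are queued, and otherwise falls back to per-record encryption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL: a stand-in for one TLS record's payload, not an s2n type. */
struct record {
    uint8_t *data;
    size_t len;
};

/* HYPOTHETICAL cipher vtable; s2n's real struct s2n_cipher differs. */
struct hypothetical_cipher {
    /* Per-record path, always present. */
    int (*encrypt)(struct record *rec);
    /* Optional multi-buffer path over independent records; NULL if the
     * backend has no multi-buffer support. */
    int (*multi_buffer_encrypt)(struct record *recs, size_t count);
    /* Assumed minimum batch size before the multi-buffer path pays off. */
    size_t multi_buffer_min_records;
};

/* Encrypt a batch of independent records, preferring the multi-buffer
 * path when it exists and the batch is large enough. Returns 0 on success
 * or the first non-zero error code from the backend. */
static int encrypt_records(const struct hypothetical_cipher *c,
                           struct record *recs, size_t count)
{
    assert(c != NULL && c->encrypt != NULL);
    if (c->multi_buffer_encrypt != NULL &&
        count >= c->multi_buffer_min_records) {
        return c->multi_buffer_encrypt(recs, count);
    }
    for (size_t i = 0; i < count; i++) {
        int rc = c->encrypt(&recs[i]);
        if (rc != 0) {
            return rc;
        }
    }
    return 0;
}

/* Toy backends that only count calls; they do NOT encrypt anything and
 * merely stand in for real AES-CBC-HMAC-SHA1 implementations. */
static int single_calls = 0;
static int multi_calls = 0;

static int toy_encrypt(struct record *rec)
{
    (void)rec;
    single_calls++;
    return 0;
}

static int toy_multi_encrypt(struct record *recs, size_t count)
{
    (void)recs;
    (void)count;
    multi_calls++;
    return 0;
}
```

With a threshold of 4, a batch of four records goes through the multi-buffer hook in one call. A batch of two goes through the per-record path. A backend that leaves multi_buffer_encrypt as NULL always uses the per-record path, so existing behaviour stays the default.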
We'd need logic that:
Resources: [1] https://software.intel.com/en-us/articles/performance-of-multibuffer-aes-cbc-on-intel-xeon-processors-e5-v3