aws / s2n-tls

An implementation of the TLS/SSL protocols
https://aws.github.io/s2n-tls/usage-guide/
Apache License 2.0

Add support for AES-CBC Multi-buffer #272

Closed raycoll closed 1 year ago

raycoll commented 8 years ago

Intel reported up to a 200% performance increase when using the "Multi-buffer" speed enhancements for AES-SHA-CBC in OpenSSL [1]. We could consider adding a "multi_buffer_encrypt" function to our s2n_cipher struct for S2N_CBC/S2N_COMPOSITE.

We'd need logic that:

  1. Determines if multiblock is available on the system/libcrypto
  2. Determines if the input is eligible for multiblock. It may not be worth applying this to smaller input buffers.
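
The two checks above could be sketched roughly as follows. This is only an illustration, not s2n code: `sketch_cipher`, `sketch_select_encrypt`, the stub encrypt functions, and the 8192-byte threshold are all hypothetical names and values chosen for the example. A NULL `multi_buffer_encrypt` pointer stands in for "multiblock is not available on this system/libcrypto".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cutoff: below this size, multi-buffer is presumed not worth it.
 * The real value would have to come from benchmarking. */
#define SKETCH_MB_MIN_BYTES 8192u

typedef int (*sketch_encrypt_fn)(const uint8_t *in, uint8_t *out, size_t len);

/* Hypothetical stand-in for the relevant part of a cipher struct. */
struct sketch_cipher {
    sketch_encrypt_fn encrypt;              /* regular path, always set */
    sketch_encrypt_fn multi_buffer_encrypt; /* NULL if multiblock is unavailable */
};

/* Placeholder bodies that just copy the input through so the sketch
 * compiles; real versions would call into libcrypto. */
static int sketch_single_encrypt(const uint8_t *in, uint8_t *out, size_t len)
{
    memcpy(out, in, len);
    return 0;
}

static int sketch_multi_encrypt(const uint8_t *in, uint8_t *out, size_t len)
{
    memcpy(out, in, len);
    return 0;
}

/* Step 1: is multiblock available? Step 2: is the input large enough? */
static sketch_encrypt_fn sketch_select_encrypt(const struct sketch_cipher *cipher, size_t len)
{
    if (cipher->multi_buffer_encrypt != NULL && len >= SKETCH_MB_MIN_BYTES) {
        return cipher->multi_buffer_encrypt;
    }
    return cipher->encrypt;
}
```

Keeping the decision in one small selector means callers don't need to know whether the multi-buffer path exists; they always get a usable function pointer back.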

Resources: [1] https://software.intel.com/en-us/articles/performance-of-multibuffer-aes-cbc-on-intel-xeon-processors-e5-v3

raycoll commented 8 years ago

Default stitched AES128-SHA-CBC:

% ./openssl speed -evp aes-128-cbc-hmac-sha1
Doing aes-128-cbc-hmac-sha1 for 3s on 16 size blocks: 35580002 aes-128-cbc-hmac-sha1's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 64 size blocks: 11911816 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 256 size blocks: 4391700 aes-128-cbc-hmac-sha1's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 1024 size blocks: 1310784 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 8192 size blocks: 174034 aes-128-cbc-hmac-sha1's in 2.99s
OpenSSL 1.0.1t  3 May 2016
built on: Tue May 31 21:56:56 2016
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: x86_64-unknown-linux-gnu-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DSSL_FORBID_ENULL -DDEVRANDOM="/dev/urandom" -DOPENSSL_NO_SRP -DOPENSSL_NO_SCTP -DOPENSSL_NO_DTLS1 -DOPENSSL_NO_SRTP -DOPENSSL_NO_CAST -DOPENSSL_NO_HEARTBEATS -DOPENSSL_NO_JPAKE -DOPENSSL_NO_GOST -DOPENSSL_NO_KRB5 -DOPENSSL_NO_GMP -DOPENSSL_NO_BUF_FREELISTS -Wchar-subscripts -Wcomment -Wformat -Winit-self -Wmain -Wmissing-braces -Wno-pragmas -Wparentheses -Wreturn-type -Wsequence-point -Wstrict-aliasing -Wswitch -Wtrigraphs -Wuninitialized -Wunknown-pragmas -Wunused-label -Wunused-variable -Wunused-value -Wpointer-sign -Wimplicit -pthread -fdiagnostics-color=auto -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -O3 -mfpmath=sse -march=core2 -g -fPIC -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc-hmac-sha1   190394.66k   254118.74k   376011.77k   447414.27k   476818.24k

Multibuffer:

% ./openssl speed  -mb -evp aes-128-cbc-hmac-sha1
./openssl: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory
s3-gamma-s3ws-j4-r2-62019.pdx2% export LD_LIBRARY_PATH=/home/soco/libcrypto-root/lib
s3-gamma-s3ws-j4-r2-62019.pdx2% ./openssl speed  -mb -evp aes-128-cbc-hmac-sha1
Doing aes-128-cbc-hmac-sha1 for 3s on 8192 size blocks: 181172 evp's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 16384 size blocks: 117396 evp's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 32768 size blocks: 68677 evp's in 2.99s
Doing aes-128-cbc-hmac-sha1 for 3s on 65536 size blocks: 37280 evp's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 131072 size blocks: 19418 evp's in 2.99s
The 'numbers' are in 1000s of bytes per second processed.
type                       8192 bytes  16384 bytes  32768 bytes  65536 bytes 131072 bytes
aes-128-cbc-hmac-sha1      496374.92k   641138.69k   752644.79k   814394.03k   851222.77k
%

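Speedup happens at larger block sizes. At 8192 bytes, the only size both runs cover, multibuffer is only about 4% faster (496374.92k vs. 476818.24k). Caveat is that the default openssl speed run didn't test aes-128-cbc-hmac-sha1 with block sizes above 8192 bytes, so there is no stitched baseline for the larger sizes, but the assumption is multiblock scales better as block size increases.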
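
For reference, the throughput column in both runs can be reproduced from the "Doing ..." lines as operations × block size ÷ elapsed seconds, reported in units of 1000 bytes per second. The helper below is a small sketch of that arithmetic; the function names are made up for illustration and are not part of openssl or s2n.

```c
#include <stddef.h>

/* Throughput in 1000s of bytes per second, matching the units that
 * `openssl speed` prints in its summary table. */
static double sketch_throughput_k(double ops, size_t block_bytes, double seconds)
{
    return ops * (double) block_bytes / seconds / 1000.0;
}

/* Ratio of multibuffer throughput to the default (stitched) throughput. */
static double sketch_speedup(double multibuffer_k, double default_k)
{
    return multibuffer_k / default_k;
}
```

Plugging in the 8192-byte rows (174034 operations in 2.99s for the default run, 181172 in 2.99s for the multibuffer run) gives the two table values quoted for that size, and their ratio is the only like-for-like comparison the data supports.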

zaherd commented 1 year ago

AES-GCM is favored over AES-CBC, so we won't invest in improving CBC mode. Won't implement.