cisco / libsrtp

Library for SRTP (Secure Realtime Transport Protocol)
Other
1.19k stars 469 forks source link

Performance issue with OpennSSL 3 and libsrtp #645

Open murillo128 opened 1 year ago

murillo128 commented 1 year ago

I think it is an OpenSSL issue, but crossposting it here:

https://github.com/openssl/openssl/issues/20625

murillo128 commented 1 year ago

I am building libsrtp using nodejs bundled OpenSSL and I have seen a performance performance issue with node 18 & 19 which bundles OpenSSL 3.0 both on AES GCM and AES CTR modes compared to node 16 which bundles OpenSSL 1.1 using the following benchmark:

https://gist.github.com/Sean-Der/7a42bd70edfe1324ccc6ab399d653c0e

The performance hit is quite significative, with AES GCM being more than 2x slower and AES CTR being 1.5x slower (depending on the server types I have tested with):

Server 1:

node v16.19.0 - openssl 1.1.1s+quic
gcm test done in 847746ms
ctr test done in 2209247ms

node v18.15.0 - openssl 3.0.8+quic
gcm test done in 2042822
ctr test done in 2837856
Server 2:

node v16.19.0  - openssl 1.1.1s+quic
gcm test done in 2210976
ctr test done in 3035863

node v18.15.0  - openssl 3.0.8+quic
gcm test done in 4259098
ctr test done in 4081153
pabuhler commented 1 year ago

Hi, I will try to reproduce with your test and see. It looks like it is encrypting a very small packet so the idea is mostly to measure the overhead of setting up encryption, rather than actually performing encryption, does that sound correct?

murillo128 commented 1 year ago

i can try later with bigger payloads

murillo128 commented 1 year ago

Tried with different payload sizes and still seeing the performance regression on OpenSSL 3

Node 16
0    payload size gcm done in 886714
0    payload size ctr done in 2421151
400  payload size gcm done in 1494584
400  payload size ctr done in 4381984
1200 payload size gcm done in 2326866
1200 payload size ctr done in 8568198

Node 18
0    payload size gcm done in 2007877
0    payload size ctr done in 3287084
400  payload size gcm done in 2705423
400  payload size ctr done in 5485942
1200 payload size gcm done in 3549461
1200 payload size ctr done in 9669171
murillo128 commented 1 year ago

Seems it is highly unlikely that openssl performance issues are solved anytime soon. Would you be prone to consider a patch supporting asm gcm functions directly? (like porting the code from go: https://go.dev/src/crypto/aes/)

pabuhler commented 1 year ago

@murillo128 do you mean as "built in support" like the existing AES implementation. In general I would prefer not to have to maintain and support such code but it is possible. Would need to talk with others. OpenSSL is not the only crypto back end we support, have you looked at the alternatives? You could also provide you own backend. There is also the possibility of replacing the current implementation at run time, see https://github.com/cisco/libsrtp/blob/fab73a32d27160e14662c23823d231e245b543e9/crypto/include/cipher.h#L241 , I am not sure if any one actually uses but I think it works.

murillo128 commented 1 year ago

I agree that maintaining the code is a burden, that's why i preferred to be in libsrtp itself hehe..

the srtp_replace_cipher_type seems very helpful and would allow me to implement the hw optimized gcm version for the chipsets i care to optimize, but keeping openssl backend for everything else. Thank you for the tip!

mildsunrise commented 10 months ago

Follow up on this, in case it's useful to someone: we ended up extracting code from OpenSSL into a custom AES-GCM backend for libsrtp.

Our conclusion is that OpenSSL is designed for lengthy encryption ops, and not just with regards to the whole OSSL_PARAM API introduced in v3. The particular asm backend we were interested in was also made substantially slower for small messages in order to reduce its state size, so it could fit in the existing GCM128_CONTEXT structure. We reverted this tradeoff before generating the assembly to extract into our codebase.

We considered upstreaming the backend, but given it only works for AVX-512 capable processors we decided not to.

pabuhler commented 10 months ago

Hi, yes AVX-512 only is probably too specific. I guess this task needs some follow up to see if there have been any recent improvements in OpenSSL. Will leave the issue open and hopefully loop back around soon and rerun some test.