envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.75k stars 4.75k forks source link

possible optimization: repetitive private key reads #20045

Open howardjohn opened 2 years ago

howardjohn commented 2 years ago

Title: possible optimization: repetitive private key reads

Description: When investigating https://github.com/envoyproxy/envoy/issues/19774, I found another area that may be improved (possibly). Currently, for each cluster/listener, we do a full boringssl processing of the same key/cert pair (I think). This ends up being fairly expensive at scale. With a large number of clusters, startup time is decreased from 5.5s to 4.5s in my tests when using 2048 bit RSA keys vs ECDSA keys (since they are cheaper to process).

RSA: 2022-18-02_09-39-12

ECDSA: 2022-18-02_09-39-03

I know very little about boringssl or the lifecycles here, but my naive thought is that it could be read once and shared among each cluster

cc @lambdai

daixiang0 commented 2 years ago

Sounds possible if a cluster use only one Envoy, then all repetitive keys in the Envoy config could be read once.

BTW, is RSA still widely used even its poor performance?

lambdai commented 2 years ago

While a SSL_CTX is relative hard to share across clusters, my hunch is that private key and tls certificate are supposed to immutable and shared.

There are definitely lots of issues need to be resolved if we'd like to share cert chain among various TlsContext 'd like to hear some high level feedback from @PiotrSikora @lizan if cert_chain (aka X509) is shareable.

PiotrSikora commented 2 years ago

This can be definitely done. Most of the work is migration from X509 to BoringSSL's CRYPTO_BUFFER (already done in https://github.com/envoyproxy/envoy/pull/10621, but that PR never landed) and then enabling CRYPTO_BUFFER_POOL.

lambdai commented 2 years ago

Thank you @PiotrSikora

Though the goal is slightly different, #10621 is a great source to understand the underlying load/verification.

By reading it, I can confirm lots of TODOs if I want to go with the shared X509, such as the same underlying cert chain bytes can be valid in ssl_ctx_A but invalid in ssl_ctx_B.

I don't understand if CRYPTO_BUFFER can save CPU cycles or memory footprint. If there are, what are they?

PiotrSikora commented 2 years ago

Though the goal is slightly different, #10621 is a great source to understand the underlying load/verification.

The stated goal is different, but 90%+ of the code is the same.

I don't understand if CRYPTO_BUFFER can save CPU cycles or memory footprint. If there are, what are they?

I suspect BoringSSL doesn't process new certificates if they already exist in the CRYPTO_BUFFER_POOL, so most likely both CPU and memory, but you'd have to verify that yourself.