Open tvernum opened 6 years ago
Pinging @elastic/es-security
> Long running processes that use a shared client (such as X-Pack monitoring's HTTP exporter) tend not to cause this problem because they will utilize HTTP keep-alive and/or SSL session resumption to minimize the number of SSL sessions in use.
This may not be entirely true when you have many clients (say mobile devices) connecting: when they attempt SSL session resumption concurrently, it can cause performance bottlenecks. During the client hello, session resumption tries to locate the earlier session in the cache, and the way the cache has been implemented is what causes the bottleneck. For reference: JDK-8202086, and the source where the cache is accessed to locate the session.
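If anyone wants to observe the client-side caching behaviour discussed above, here is a minimal sketch (the host and port are placeholders; under TLS 1.3 resumption uses tickets and the session-id comparison is less meaningful, so treat this purely as an illustration):

```java
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
import java.util.Arrays;

public class ResumptionCheck {
    public static void main(String[] args) throws Exception {
        String host = "localhost"; // placeholder TLS endpoint
        int port = 9200;           // placeholder port

        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();

        byte[] firstId;
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake();
            firstId = socket.getSession().getId();
        }

        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake();
            // If the second handshake found the first session in the client-side
            // session cache, the session id is reused.
            boolean resumed = Arrays.equals(firstId, socket.getSession().getId());
            System.out.println("resumed = " + resumed);
        }
    }
}
```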
Once we expose these settings it's up to the deployment to configure them appropriately.
Why do we need settings for this? Isn’t it enough to add sane default values to jvm.options? I also favor this because what do we do if a user sets these via system properties and we’ve also exposed settings?
We don't have to expose it as configuration, but I think there's some value in it:
As far as I can tell (but I haven't done any direct testing yet) there's a cache per SSLSessionContext, so I think it's possible to set different configurations per xpack context. That is, the cache for transport could be sized relative to your node count, and http could be sized differently. I don't know that we want to do that - maybe we do - it seems redundant to have a very large cache for transport, but simplicity might win.
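As a strawman for what per-context configuration could look like, here is a sketch; buildContext is a hypothetical stand-in for however we construct each SSLContext, and the numbers are made up rather than recommendations:

```java
import javax.net.ssl.SSLContext;

public class PerContextSessionCaches {

    // Hypothetical stand-in for however we build the SSLContext for each profile.
    static SSLContext buildContext() throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, null, null); // default key/trust material, just for the sketch
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        SSLContext transport = buildContext();
        SSLContext http = buildContext();

        // Transport is node-to-node, so a small cache sized relative to the
        // node count would be plenty.
        transport.getServerSessionContext().setSessionCacheSize(100);

        // HTTP may see many short-lived external clients, so a larger cache
        // and a shorter timeout (in seconds) might make more sense.
        http.getServerSessionContext().setSessionCacheSize(5000);
        http.getServerSessionContext().setSessionTimeout(3600);
    }
}
```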
We used to have shield.ssl.session.cache_size and shield.ssl.session.cache_timeout settings, but these also caused problems and were ultimately removed for 5.0. From the breaking changes docs:

shield.ssl.session.cache_size:: The value for the SSL session cache size should be configured through the use of the system property javax.net.ssl.sessionCacheSize.

shield.ssl.session.cache_timeout:: The SSL session cache timeout being set incorrectly can negatively impact performance and was not widely used, so it has been removed and the default value of the JDK is used.
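For completeness, that system property is just a plain JVM property, so the replacement for the old setting looks like this (1000 is only an illustrative value; on the JDKs I have looked at the property is read when the context's session caches are created, so it has to be set before the SSLContext is built):

```java
import javax.net.ssl.SSLContext;

public class SessionCacheSizeProperty {
    public static void main(String[] args) throws Exception {
        // Same effect as starting the JVM with -Djavax.net.ssl.sessionCacheSize=1000,
        // provided it is set before any SSLContext is created.
        System.setProperty("javax.net.ssl.sessionCacheSize", "1000");

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, null, null);

        // Should report 1000 for a JSSE provider that honours the property.
        System.out.println(ctx.getServerSessionContext().getSessionCacheSize());
    }
}
```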
Given that they have caused problems in the past, I would make sure we consider this when making our decision.
> As far as I can tell (but I haven't done any direct testing yet) there's a cache per SSLSessionContext, so I think it's possible to set different configurations per xpack context.
There is a cache per context, but I think it would be a very advanced configuration to start sizing each cache independently.
> I also favor this because what do we do if a user sets these via system properties and we’ve also exposed settings?
If we do add these settings back, the order of defining the value should be (IMO):
In terms of what I think we should do, I suggest that we start by adding a sane default in jvm.options and see how that works out. It is easier to add things later than take them away.
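To make that concrete, the jvm.options entry could be as simple as the following (the value is only an example and would need benchmarking before we commit to anything):

```
# bound the JSSE TLS session caches rather than relying on the JDK's unbounded default
-Djavax.net.ssl.sessionCacheSize=1000
```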
We discussed this in the security team meeting and agreed to proceed by adding a sensible default for cache size into jvm.options.
Pinging @elastic/es-security (Team:Security)
javax.net.ssl.SSLSessionContext has two methods to control caching: setSessionTimeout and setSessionCacheSize.

In sun.security.ssl.SSLSessionContextImpl the defaults are:

- session timeout: 24 hours (86400 seconds)
- cache size: 0, i.e. unlimited (entries are held via SoftReference)

In theory the use of soft references should limit the memory impact of this cache. In practice it has at least 3 significant problems; among them, the expungeExpiredEntries / emptyQueue methods have to walk the whole cache and clear (probably) every entry.

These problems are more likely to occur if an ES server is subjected to many standalone SSL sessions, such as cyclic monitoring with short lived processes (e.g. using curl with cron).
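For anyone who wants to check what their own JVM is doing, the effective values can be printed with the standard API (the numbers reported depend on the JDK version and on any system properties that are set):

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

public class PrintSessionCacheDefaults {
    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getDefault();
        SSLSessionContext[] contexts = {
            ctx.getServerSessionContext(),
            ctx.getClientSessionContext()
        };
        for (SSLSessionContext sessions : contexts) {
            // A cache size of 0 means "no limit"; the timeout is reported in seconds.
            System.out.println("cache size = " + sessions.getSessionCacheSize()
                + ", timeout = " + sessions.getSessionTimeout() + "s");
        }
    }
}
```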
Long running processes that use a shared client (such as X-Pack monitoring's HTTP exporter) tend not to cause this problem because they will utilise http-keep-alive and/or SSL session resumption to minimise the number of SSL sessions in use.
X-Pack should not rely on the JVM defaults for these cache settings as they have proven to be problematic in some cases.
I propose: