elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.25k stars 24.86k forks source link

Add configuration for SSL Session cache #32762

Open tvernum opened 6 years ago

tvernum commented 6 years ago

javax.net.ssl.SSLSessionContext has two methods to control caching:

In sun.security.ssl.SSLSessionContextImpl defaults are:

In theory the use of soft references should limit the memory impacts of this cache. In practice it has at least 3 signficant problems

  1. it can trigger frequent GC pauses.
  2. it suffers from terrible performance behaviours immediately after a GC, when the very large cache is suddenly filled with collected objects, and the expungeExpiredEntries/emptyQueue methods have to walk the whole cache and clear (probably) every entry.
  3. the SoftReferences are all equal value candidates for GC regardless of whether they were 1s old or 23.5h old, so if a GC is triggered all SSL sessions are likely to be expunged, even if though we know that some sessions are more valuable (more recent) than others.

These problems are more likely to occur if an ES server is subjected to many, standalone SSL sessions, such as cyclic monitoring with short lived processes (e.g. using curl with cron).

Long running processes that use a shared client (such as X-Pack monitoring's HTTP exporter) tend not to cause this problem because they will utilise http-keep-alive and/or SSL session resumption to minimise the number of SSL sessions in use.

X-Pack should not rely on the JVM defaults for these cache settings as they have proven to be problematic in some cases.

I propose:

elasticmachine commented 6 years ago

Pinging @elastic/es-security

bizybot commented 6 years ago

Long running processes that use a shared client (such as X-Pack monitoring's HTTP exporter) tend not to cause this problem because they will utilize HTTP-keep-alive and/or SSL session resumption to minimize the number of SSL sessions in use.

This may not be entirely true when you have many clients (say mobile devices) connecting and when they are doing SSL session resumption concurrently, it can cause performance bottlenecks. As for SSL session resumption, it tries to locate the earlier session from the cache during client hello and the way cache has been implemented causes this performance bottleneck. For reference: JDK-8202086 and source where it accesses to locate session.

Once we expose these settings it's up to the deployment to configure them appropriately.

jasontedor commented 6 years ago

Why do we need settings for this? Isn’t it enough to add sane default values to jvm.options? I also favor this because what do we do if a user sets these via system properties and we’ve also exposed settings?

tvernum commented 6 years ago

We don't have to expose it as configuration, but I think there's some value in it:

  1. There isn't a system property for the cache expiry ("timeout"), so we need to configure it in code, or we don't configure it at all. And if we don't expose a configuration then it's not possible to change it from whatever default we decide on.
  2. As far as I can tell (but I haven't done any direct testing yet) there's a cache per SSLSessionContext, so I think it's possible to set different configurations per xpack context. That is the cache for transport could be sized relative to your node count, and http could be sized differently. I don't know that we want to do that - maybe we do, it seems redundant to have a very large cache for transport, but simplicity might win.
kovyrin commented 6 years ago

/follow

jaymode commented 6 years ago

We used to have shield.ssl.session.cache_size and shield.ssl.session.cache_timeout settings but this also caused problems and was ultimately removed for 5.0. From the breaking changes docs:

shield.ssl.session.cache_size:: The value for the SSL session cache size should be configured through the use of the system property javax.net.ssl.sessionCacheSize. shield.ssl.session.cache_timeout:: The SSL session cache timeout being set incorrectly can negatively impact performance and was not widely used, so it has been removed and the default value of the JDK is used.

Given that they have caused problems in the past, I would make sure we consider this when making our decision.

As far as I can tell (but I haven't done any direct testing yet) there's a cache per SSLSessionContext, so I think it's possible to set different configurations per xpack context.

There is a cache per context, but I think it would be a very advanced configuration to start sizing each cache independently.

I also favor this because what do we do if a user sets these via system properties and we’ve also exposed settings?

If we do add these settings back, the order of defining the value should be (IMO):

  1. setting value
  2. system prop
  3. default value that we choose

In terms of what I think we should do, I suggest that we start by adding a sane default in jvm.options and see how that works out. It is easier to add things later than take them away.

tvernum commented 5 years ago

We discussed this in the security team meeting and agree to proceed by adding a sensible default for cache size into jvm options.

elasticsearchmachine commented 10 months ago

Pinging @elastic/es-security (Team:Security)