elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Investigate Server --> Client communication channel support #98881

Open mshustov opened 3 years ago

mshustov commented 3 years ago

Update 23 July 2023

We seem to have consensus that SSE is the easiest to deploy in our users' environments and provides enough power for our use cases. The biggest risk is that several plugins open SSE connections simultaneously, leaving only a few connections for other API requests. With the push to remove bfetch we'll be encouraging more users to switch to HTTP/2, but adoption might take time.

If we need to maintain several SSE connections in the short to medium term, we might need to "multiplex" SSE "streams" over a single SSE connection and expose this as a core service. The purpose of this issue is to document and align around a short to medium term plan.
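One way the multiplexing could work (a minimal sketch, not an agreed design; all names here are hypothetical) is to tag each logical stream with a channel id using the SSE `event:` field, so a single connection can carry messages for several plugins:

```typescript
// Hypothetical sketch: multiplex several logical "streams" over one SSE
// connection by tagging each message with a channel id. The SSE wire format
// uses `event:` and `data:` lines, with a blank line terminating each frame.

interface ChannelMessage {
  channel: string; // e.g. 'licensing', 'notifications'
  payload: unknown;
}

// Server side: serialize one logical channel's message into an SSE frame.
function toSseFrame(msg: ChannelMessage): string {
  return `event: ${msg.channel}\ndata: ${JSON.stringify(msg.payload)}\n\n`;
}

// Client side, each plugin would subscribe per channel on the shared
// connection, roughly:
//   const es = new EventSource('/api/events');
//   es.addEventListener('licensing', (e) => handle(JSON.parse(e.data)));
```

Because `EventSource` dispatches named events per `event:` field, consumers never see each other's traffic even though they share one HTTP connection.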


There is a growing number of cases where the Kibana server wants to inform the browser about an event that occurred in the system. Since the Kibana server doesn't provide this functionality out-of-the-box, Kibana plugins have to work around the limitation with patterns like long-polling and manual request/response batching (the bfetch plugin).

There are at least two potential candidates to implement server-client communication:

We should evaluate risks before introducing one in the Core:

cc @streamich @lizozom

elasticmachine commented 3 years ago

Pinging @elastic/kibana-core (Team:Core)

joshdover commented 3 years ago

There is a growing number of cases where the Kibana server wants to inform the browser about an event that occurred in the system.

Any related issues or known use cases we can link to here?

lizozom commented 3 years ago

@joshdover here's one

mshustov commented 3 years ago

@joshdover the upcoming Notification service, and the licensing plugin notifying the client side about license status updates. IIRC the Core team also faced this problem while working on SO tagging or global search. @pgayvallet do you remember the use case?

joshdover commented 3 years ago

I think we need to be quite careful about introducing a new networking protocol into Kibana. Our customers deploy Kibana behind a number of different proxies and other systems and not all are configured to support HTTP/2 and/or WebSockets at this time.

HTTP/2

One major hurdle to introducing HTTP/2 support is the requirement to use TLS. Though not actually required by the HTTP/2 spec, all major browser vendors only allow HTTP/2 connections over TLS.

I suspect that the interactive setup mode project (https://github.com/elastic/kibana/issues/89287) may move us closer to being able to require TLS, however we'd still need a long grace period before we could require that all customers enable TLS. We also don't have a fool-proof way to detect how many customers are using TLS since termination could be happening at the load balancer, rather than at Kibana itself.

The connection limit problem really becomes an issue for users who have multiple Kibana tabs open since this cap is enforced across all tabs. It may be interesting to see if we can workaround this issue with a SharedWorker that uses a single dedicated connection shared across multiple tabs, using SSE under-the-hood. It definitely feels like we're trying to implement HTTP/2 over HTTP/1.1 though and I'm not optimistic it will work out. For example, workers must copy all data that is passed to windows or other workers which may be non-trivial overhead.
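The SharedWorker idea above could be sketched roughly as follows (hypothetical, untested in Kibana): the worker owns the single SSE connection and fans each message out to every connected tab. The `TabFanout` name and structure are illustrative only; note the copy cost mentioned above, since each `postMessage` clones the data for its tab.

```typescript
// Hypothetical sketch of sharing one SSE connection across tabs via a
// SharedWorker. The worker holds the EventSource and broadcasts each
// message to every connected tab port.

type Port = { postMessage(data: unknown): void };

class TabFanout {
  private ports = new Set<Port>();

  connect(port: Port): void { this.ports.add(port); }
  disconnect(port: Port): void { this.ports.delete(port); }

  // Called for each SSE message; every tab receives its own copy of the
  // data (structured clone), which is the overhead discussed above.
  broadcast(data: unknown): number {
    for (const port of this.ports) port.postMessage(data);
    return this.ports.size;
  }
}

// Inside a real SharedWorker script this would be wired up roughly as:
//   const fanout = new TabFanout();
//   self.onconnect = (e) => fanout.connect(e.ports[0]);
//   new EventSource('/api/events').onmessage = (e) => fanout.broadcast(e.data);
```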

I think we really need to consider leveraging HTTP/2 so that mechanisms like bfetch aren't necessary anymore. It may mean a less optimal experience for customers without HTTP/2 support in their stack, but they should be able to fallback gracefully. We may even be able to detect this client side and use bfetch as a fallback during the transition period to requiring TLS. We can then start to notify them in the UI when they're using HTTP/1 and start pushing them to reconfigure their stack to support HTTP/2 for performance improvements.
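The client-side detection mentioned above could plausibly build on the Navigation Timing Level 2 API, which reports the negotiated protocol via `nextHopProtocol`. A minimal sketch (the `isHttp2` helper is hypothetical, not an existing Kibana API):

```typescript
// Possible client-side HTTP/2 detection (assumes the browser supports
// Navigation Timing Level 2). `nextHopProtocol` reports 'h2' when the page
// was served over HTTP/2, so a fallback transport could be chosen.

function isHttp2(nextHopProtocol: string): boolean {
  // 'h2' = HTTP/2, 'h3' = HTTP/3; both provide multiplexed streams.
  return nextHopProtocol === 'h2' || nextHopProtocol === 'h3';
}

// In the browser (sketch):
//   const nav = performance.getEntriesByType('navigation')[0] as PerformanceNavigationTiming;
//   const transport = isHttp2(nav.nextHopProtocol) ? 'native' : 'bfetch';
```

Note that a TLS-terminating proxy speaking HTTP/1.1 to Kibana would still show `h2` to the browser, which is the detection we actually want here since the connection limit applies to the browser-facing hop.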

For general performance, my vote would be to start supporting HTTP/2 before exploring other, more specialized approaches like WebSockets. With HTTP/2, long-polling (or even just regular polling) would in theory be much less expensive and more performant, since it reuses a long-lived TCP connection that is already up to full speed. Header compression also helps.

I think trying to exhaust our options with HTTP/2 (and optionally, SSEs) would be wise before we look at WebSockets. It's a much more widely supported technology, has a built-in fallback to HTTP/1.1, and requires much less developer education to adopt and leverage. HTTP/2 would help Kibana's client-side performance across a wide range of touch points in the product, not least of which being initial page load time.

It's important that we continue to consider how to accommodate our users' deployment environments, but we've also seen that customers who frequently update the Elastic Stack are also more likely to be willing and able to upgrade related systems like load balancers and proxies. Typically, customers who do not upgrade the Stack frequently are the same ones using older proxy configurations that do not support HTTP/2. HTTP/2 is now 6 years old and widely supported.

The primary hurdle remaining is the TLS requirement, but I think we can document and notify our users to guide them towards a more performant Kibana (all while increasing the security of their Stack).

legrego commented 3 years ago

I suspect that the interactive setup mode project (#89287) may move us closer to being able to require TLS, however we'd still need a long grace period before we could require that all customers enable TLS.

++ interactive setup mode is a step in the right direction, but our initial scope of work excludes TLS setup for Kibana's web server. Once we have a setup mode, it'll be less work to add TLS, but the primary reason we removed it from the initial scope is browser trust: we either have to somehow provision certificates that all browsers will trust out-of-the-box (Let's Encrypt is not a silver bullet), -or- we teach our users to ignore browser security warnings when we present an untrusted certificate (😬)

We also don't have a fool-proof way to detect how many customers are using TLS since termination could be happening at the load balancer, rather than at Kibana itself.

This should be fairly easy to do with client-side telemetry, if that's a route we want to explore. We can't capture telemetry on older versions, but it would give us more than we have today.

joshdover commented 3 years ago

This should be fairly easy to do with client-side telemetry, if that's a route we want to explore. We can't capture telemetry on older versions, but it would give us more than we have today.

Great point, I've opened an issue: https://github.com/elastic/kibana/issues/99229

pgayvallet commented 3 years ago

but they should be able to fallback gracefully. We may even be able to detect this client side and use bfetch as a fallback during the transition period to requiring TLS

Imho the solution should be to have bfetch switch its transport implementation depending on the current capabilities. It currently only supports one transport, let's call it chunked-content. When we support HTTP/2, and if the instance's configuration / infra supports it, it should use SSE instead, falling back to the current chunked-content otherwise. That way, consumers of the bfetch plugin don't have to care about these implementation details.
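The transport-selection idea could be sketched like this (all names hypothetical; bfetch's actual internals differ): the plugin keeps its public API stable and picks the wire transport from detected capabilities.

```typescript
// Hypothetical sketch of capability-based transport selection for bfetch:
// consumers call the same API either way; only the wire format changes.

type TransportKind = 'sse' | 'chunked-content';

interface Capabilities {
  http2: boolean; // did the page load over HTTP/2 (or better)?
}

function selectTransport(caps: Capabilities): TransportKind {
  // Prefer SSE when the infra supports HTTP/2; otherwise keep the current
  // chunked-content transport so existing deployments are unaffected.
  return caps.http2 ? 'sse' : 'chunked-content';
}
```

The point of the indirection is exactly what the comment argues: consumers depend on `bfetch`'s API, not on whichever `TransportKind` the environment happens to support.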

pgayvallet commented 3 months ago

HTTP/2 support has been added, and we know SSE is the direction we want to go, so I'll consider the investigations done and close this.

pgayvallet commented 3 months ago

Closed too soon - we will use this for our experiments around SSE (@afharo you were right in the end!)

tsullivan commented 2 months ago

There is a growing number of cases when the Kibana server wants to inform the browser part about an event that occurred in the system. Since the Kibana server doesn't provide this functionality out-of-the-box, Kibana plugins have to work around this limitation by patterns like long-polling, manual request/response batching (bfetch plugin)

I opened a new issue to brain-dump and discuss why I think that long-polling and manual request/response batching will likely continue to be the best strategy for keeping application state in Kibana up-to-date: https://github.com/elastic/kibana/issues/189131. Basically, Elasticsearch doesn't support an event stream that subscribers can listen to (yet). That means we have to have polling happening somewhere, and it's probably least complex for that polling to happen in the browser client.
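The browser-side polling described above could look roughly like this (a minimal sketch, not Kibana's actual implementation; `pollOnce` and its signature are hypothetical). The fetch function is injected so the loop is testable without a network:

```typescript
// Hypothetical polling helper: fetch the current state, compare it with the
// last observed state, and notify a callback only when it changed.

type FetchFn = () => Promise<string>;

async function pollOnce(
  fetchFn: FetchFn,
  lastState: string | null,
  onChange: (state: string) => void
): Promise<string> {
  const state = await fetchFn();
  if (state !== lastState) onChange(state);
  return state;
}

// A real client would run this on a timer, e.g.:
//   let last: string | null = null;
//   setInterval(async () => {
//     last = await pollOnce(() => fetch('/api/state').then(r => r.text()), last, render);
//   }, 5000);
```

Keeping the change detection in the client is what makes this "least complex": the server stays a plain request/response endpoint with no subscriber bookkeeping.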