elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Support for serving static assets over a CDN #72880

Closed: tylersmalley closed this issue 3 months ago

tylersmalley commented 4 years ago

With the change to including the build number in the asset path, we can now support serving these assets through a CDN. This would drastically decrease the noise and load on the Kibana server, while hopefully improving performance.

This should be something we can launch with on Cloud.

elasticmachine commented 4 years ago

Pinging @elastic/kibana-operations (Team:Operations)

joshdover commented 4 years ago

Another thought on this: adding CDN support would allow us to take advantage of HTTP/2 for loading assets without requiring the Kibana server to support it or the customer to configure HTTPS (which most browsers require in order to use HTTP/2). Loading assets is the area where we believe HTTP/2 would help us the most, and since most CDN services support HTTPS and HTTP/2 out of the box, this could be one of the quickest paths to get there.

In addition, having a CDN should allow us to side-step the issue where browsers will not cache assets from an origin using a self-signed certificate. If the assets are loaded from an origin using a cert from a trusted CA, the browser should cache these assets.

If we explore this path, I think it would be beneficial to also explore the possibility of having a pre-configured CDN that Elastic hosts and controls for Basic+ users. We would need to explore the long-term costs of this (which may be hard to estimate!), but a fast-by-default experience sounds enticing. For air-gapped installations, we will need to support a fall-back to the existing mechanism. If there is a demand for it, we can also explore allowing air-gapped installations to configure their own CDN.

ppisljar commented 3 years ago

Another use case for this is reporting: currently we generate a report (CSV/PDF) and store it back into Elasticsearch, which is very inefficient and also limits the size of the reports we can generate. Having a CDN where we could put the reports would be great.

elasticmachine commented 2 years ago

Pinging @elastic/kibana-core (Team:Core)

tylersmalley commented 2 years ago

Realized this was tagged Ops. I think it makes the most sense for Core to implement (correct me if I'm wrong). We can own making the release manager changes to get the assets deployed to a bucket to be used on Cloud.

vadimkibana commented 2 years ago

I see the impact:low label; IMO, I would assign the highest impact to this.

mshustov commented 2 years ago

> I see impact:low label, IMO, I would assign the highest impact to this.

The impact:low description is "Long-term priority, unless it's a quick fix." The highest priorities are for tasks planned for the current or the following release. We don't plan this work yet.

lizozom commented 2 years ago

I do agree with @vadimkibana that the impact:low priority is misleading. But I guess that now as we're organizing it, we'll be able to prioritize this inside the performance project.

vadimkibana commented 2 years ago

> impact:low description is Long-term priority, unless it's a quick fix.

I'm not sure "impact" should be inferred from the priority or from when the task is scheduled.

Even if this task is not planned for in the near future or at all, IMO, it is still very high impact.

The description of the label is probably a bit misleading, we might want to change it.

mshustov commented 2 years ago

> Even if this task is not planned for in the near future or at all, IMO, it is still very high impact.

@vadimkibana @lizozom Do we have any numbers to justify impact:high? We can add the triage needed label, but since we don't have any plans to work on asset loading performance, we won't triage it any time soon.

vadimkibana commented 2 years ago

> @vadimkibana @lizozom Do we have any numbers to justify the impact: high?

@mshustov I'm not sure it is something that needs numbers to justify it. Serving static immutable files from an application server is just not something you do. IMO, you always want to have static files in blob storage, served efficiently through a CDN.

Kibana is a product that can be self-managed, so it initially made sense for Kibana to serve the files (as there is no other way), but with the cloud-first approach, I don't see a reason why in Elastic Cloud the static Kibana files would not be served from a CDN, especially since they are immutable (the same for all deployments).

Some justification numbers:

vadimkibana commented 2 years ago

Screenshot from a current v7.17 Elastic Cloud deployment showing what is downloaded on initial page load, for future reference:

[Screenshots: requests downloaded on initial page load, captured 2022-01-11]

stacey-gammon commented 2 years ago

re-added ops team since @tylersmalley mentioned them helping out with the deployment strategy side.

lizozom commented 2 years ago

We now have some production stats from APM about page load times (uncached sessions only) by geo, to help us justify the importance of this change: page load time is up to 2x higher in Europe, Asia and Oceania than in the US.

(all charts are obviously in ms, not seconds)

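USA vs. everywhere else:

[Chart: page load time, USA vs. everywhere else]

Breakdown by continent:

[Chart: page load time by continent]

Breakdown by country:

[Chart: page load time by country]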

mshustov commented 2 years ago

> We now have some production stats from APM about page load times (uncached sessions only) by geo,

@lizozom What does "uncached sessions" mean? Do we exclude all the cases where artifacts are loaded from the browser cache? If so, can we have full stats, including users with cached assets?

lizozom commented 2 years ago

Cached means that static assets were loaded from either the browser cache or the disk cache. Uncached means that static assets were loaded from the network.

Adding cached sessions didn't change the results much (there were just a few of them, specifically for the deployments we're monitoring):

[Chart: page load times including cached sessions]

pgayvallet commented 2 years ago

I'm sorry but I have to ask: what is expected from the Kibana side on this matter exactly?

I'm no CDN expert, but having used some of them on quite a few projects, CDN asset caching can usually be addressed without architecture changes, as it's mostly infrastructure-based, with caching policies/rules at the CDN level (at least for self-populating CDNs, and I doubt any CDN solution isn't self-populating in 2022).

As already said in the description, all our bundles' paths are prefixed with the build number (and now the plugin version), making it really easy to introduce a caching policy for basically /*/bundles/**.js on any CDN that Cloud may want to use.
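As a rough illustration of such a caching policy, here is a TypeScript sketch of a CDN-side rule table. The rule format, the `globToRegExp` and `cacheControlFor` helpers, and the one-year `immutable` Cache-Control value are assumptions for illustration only; real CDNs each have their own rule syntax. The point it shows is that a single wildcard rule on `/*/bundles/**.js` covers every bundle, because the build-number prefix changes whenever the content can.

```typescript
interface CacheRule {
  glob: string; // "*" matches one path segment, "**" matches anything
  cacheControl: string;
}

// Hypothetical rule set: cache every build-prefixed JS bundle for a year.
const CDN_RULES: CacheRule[] = [
  { glob: "/*/bundles/**.js", cacheControl: "public, max-age=31536000, immutable" },
];

// Converts a simple glob into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*" && glob.charAt(i + 1) === "*") {
      source += ".*"; // "**": any characters, including "/"
      i++;
    } else if (c === "*") {
      source += "[^/]*"; // "*": stays within a single path segment
    } else {
      source += c.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${source}$`);
}

// Returns the Cache-Control value the CDN should apply, or null to pass the request through to Kibana.
function cacheControlFor(path: string): string | null {
  for (const rule of CDN_RULES) {
    if (globToRegExp(rule.glob).test(path)) {
      return rule.cacheControl;
    }
  }
  return null;
}
```

Anything that does not match (for example the dynamically generated bootstrap script) falls through to the Kibana origin untouched.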

Other static assets, like fonts, aren't prefixed with it (yet), but we can also imagine a per-Kibana-version caching strategy, with each Kibana deployment on Cloud using the correct CDN configuration for its version.

So, I apologize in advance if the question is stupid, but what exactly is expected from us here? It looks like a Cloud problem to me, and I'm not sure why we're supposed to be the main actor here. Are we supposed to investigate which CDN solution Cloud may want to use?

pgayvallet commented 2 years ago

After taking a quick look at the 'static' assets we're currently serving from Kibana:

Which assets do we serve?

JavaScript bundles

This part is kinda ready:

We may eventually want a /bundles/${buildNum} prefix instead of the current ${buildNum}/bundles to ease path identification, but apart from that, we're all set.
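To make the difference between the two layouts concrete, here is a small sketch; `bundlePath` and `isBundleRequest` are hypothetical names, not Kibana functions. With `/bundles/${buildNum}` first, a proxy or CDN can recognize a bundle request with a plain prefix check instead of a wildcard on the first path segment.

```typescript
type BundleLayout = "current" | "proposed";

// Builds a bundle URL path under either layout:
//   current:  /{buildNum}/bundles/{bundleId}/{file}
//   proposed: /bundles/{buildNum}/{bundleId}/{file}
function bundlePath(layout: BundleLayout, buildNum: number, bundleId: string, file: string): string {
  return layout === "current"
    ? `/${buildNum}/bundles/${bundleId}/${file}`
    : `/bundles/${buildNum}/${bundleId}/${file}`;
}

// Only valid for the proposed layout: a fixed prefix identifies every bundle.
function isBundleRequest(path: string): boolean {
  return path.startsWith("/bundles/");
}
```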

bootstrap script

This one has (historically) been served from /bootstrap.js. Note that this script is stateful, as it is dynamically generated by the server at access time, and it requires access to some saved objects (SOs), on behalf of the user, to retrieve the theme-related uiSettings. For this reason, it uses ETag-based caching. It probably can't (and shouldn't) be handled by a potential CDN.
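For readers less familiar with ETag-based caching, here is a framework-agnostic sketch of the pattern: the server regenerates the script, derives a validator from its content, and answers `304 Not Modified` when the browser already holds that version. The FNV-1a hash, the `serveBootstrap` name, and the `no-cache` header are assumptions for illustration, not Kibana's actual implementation. It also shows why a CDN is a poor fit here: the body depends on the requesting user's theme settings, so the origin has to be consulted on every request anyway.

```typescript
interface ScriptResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body?: string;
}

// 32-bit FNV-1a hash of the script text, used only to derive a validator.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// `script` is the freshly generated bootstrap body for the requesting user;
// `ifNoneMatch` is the If-None-Match request header, if the browser sent one.
function serveBootstrap(script: string, ifNoneMatch?: string): ScriptResponse {
  const etag = `"${fnv1a32(script)}"`;
  // "no-cache" lets the browser store the script but forces revalidation on each use.
  const headers: Record<string, string> = { etag, "cache-control": "no-cache" };
  if (ifNoneMatch === etag) {
    return { status: 304, headers }; // browser copy is still valid: send no body
  }
  return { status: 200, headers, body: script };
}
```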

core static assets

We have two static folders exposed from core, containing css and font files.

https://github.com/elastic/kibana/blob/01ff431a4e0607313863fdd815b961b2785acb84/src/core/server/core_app/core_app.ts#L199-L204

These don't currently use a buildNum prefix, but having tested it, modifying the code to do so shouldn't be an issue.

Note that at the moment, the registerStaticDir core API delegates the serving to @hapi/inert, which uses ETag caching for the static files. If we change these assets' paths to be buildNum-prefixed, we're also going to need to adapt registerStaticDir to opt out of ETag, as we'll then be assured that an asset served from a unique path will never change (at least in production, as we're currently doing for JS bundles).
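The trade-off can be sketched as a header-selection function. The `buildNumPrefixed` option and `staticFileHeaders` function are hypothetical (they are not part of the current `registerStaticDir` API) and the header values are illustrative: a build-number-prefixed path can be marked immutable with no ETag, while an unprefixed path keeps ETag revalidation.

```typescript
// Hypothetical option; the real registerStaticDir signature may differ.
interface StaticDirCacheOptions {
  // true when the directory is mounted under a path that changes with every build
  buildNumPrefixed: boolean;
}

// Picks response headers for a static file served from such a directory.
function staticFileHeaders(options: StaticDirCacheOptions, etag: string): Record<string, string> {
  if (options.buildNumPrefixed) {
    // The path is unique per build, so the content never changes: no revalidation needed.
    return { "cache-control": "public, max-age=31536000, immutable" };
  }
  // Unprefixed path: the file may change across upgrades, so revalidate with the ETag.
  return { "cache-control": "no-cache", etag };
}
```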

plugins assets

Plugins can define a ${pluginPath}/public/assets folder, that will then be accessible via the /plugins/${pluginId}/assets path of the server.

These are the most problematic, because these paths are used directly in the plugins' code, e.g.:

https://github.com/elastic/kibana/blob/f80104df0e10d8a322a2ac8fef38c02e48f1798a/src/plugins/home/public/application/components/solutions_section/solution_panel.tsx#L26-L27

so changing the path to support a buildNum-based prefix would require changing every access to these files in the code.

If we want to do so, we should probably expose a utility function from core, something like pluginAsset(path, plugin=currentPlugin), that would use the correct path prefix (and would spare us from manually changing all the paths to plugin assets if we later want to change the prefix).
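A minimal sketch of what such a helper could look like. `createPluginAsset` and the `/{buildNum}/plugins/{pluginId}/assets` layout are assumptions (neither exists in Kibana as described here), and the asset file names are made up. The idea is that plugins call `pluginAsset("file.svg")` instead of hard-coding `/plugins/home/assets/file.svg`, so a later prefix change only touches this one function.

```typescript
// Hypothetical core utility: returns a pluginAsset() bound to the calling plugin.
function createPluginAsset(basePath: string, buildNum: number, currentPlugin: string) {
  const base = basePath.replace(/\/+$/, ""); // tolerate a trailing slash in the base path
  return (path: string, plugin: string = currentPlugin): string =>
    `${base}/${buildNum}/plugins/${plugin}/assets/${path.replace(/^\/+/, "")}`;
}

// Usage inside the home plugin:
const pluginAsset = createPluginAsset("/kibana", 42, "home");
const panelImage = pluginAsset("solutions_panel.svg"); // "/kibana/42/plugins/home/assets/solutions_panel.svg"
const otherImage = pluginAsset("logo.svg", "dashboard"); // "/kibana/42/plugins/dashboard/assets/logo.svg"
```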

Note: I couldn't remember what system is used to expose those /plugins/${pluginId}/assets folders on the server. I don't think we're doing that in the core_app service. Is that related to the optimizer somehow, @spalger?

Is there more?

Did I miss anything in that list?

What are we expecting from Kibana on this matter?

This is what I don't get exactly from this issue's initial description.

My first question is: are we focusing on Cloud here, or is the purpose of this issue to also allow our on-premise customers to use a CDN? Depending on the answer, the approaches can be quite different.

  1. Documentation on which assets can be cached

Most modern CDNs are self-populating. Once we've made the adjustment to serve most of our assets from a unique buildNum-based path, should we just document which paths are safe to allowlist in a CDN system, and maybe provide example rules for the most common ones?

  2. Active asset extraction

Are we expecting active asset extraction that generates the exact folder of assets that can be pushed to/served by a CDN? In that case, do we also want to support the 'old' CDN approach by allowing customers to change the paths of the assets accessed from the client side of Kibana? (e.g. allowing URLs like http://kibana/bundles/42/core/core.js to be replaced with http://my.cdn.org/bundles/42/core/core.js)
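The 'old' CDN approach amounts to rewriting asset URLs before they reach the browser. A sketch under assumptions: `toCdnUrl` and the optional `cdnBaseUrl` setting are hypothetical names, and only URLs on Kibana's own origin are rewritten, so installations without a CDN (such as air-gapped ones) keep loading assets from Kibana.

```typescript
// Rewrites a Kibana-served asset URL to the configured CDN, keeping path and query.
function toCdnUrl(assetUrl: string, kibanaOrigin: string, cdnBaseUrl?: string): string {
  if (!cdnBaseUrl) {
    return assetUrl; // no CDN configured: keep serving from Kibana
  }
  const url = new URL(assetUrl, kibanaOrigin); // also resolves relative URLs
  if (url.origin !== new URL(kibanaOrigin).origin) {
    return assetUrl; // not a Kibana asset: leave it alone
  }
  // Plain concatenation so that a CDN base path (e.g. https://cdn.example/kibana) is preserved.
  return cdnBaseUrl.replace(/\/+$/, "") + url.pathname + url.search;
}
```

Keeping the path unchanged means the CDN can be self-populating: it fetches `/bundles/42/core/core.js` from Kibana on the first miss and serves it from cache afterwards.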

@stacey-gammon (and other contributors of this issue) maybe you have more context on what exactly would be expected from us here?

mfinkle commented 2 years ago

I'd like to come up with a plan to leverage a CDN for cache-ready assets in Cloud. That would be our primary driver. If that process also yields some CDN-ready documentation and capabilities for self-managed, that would be nice.

@pgayvallet your comment above is a wonderful beginning to seriously thinking about how to get Kibana, running on Cloud, CDN-ready. Great content! @stacey-gammon and I can start reaching out to other teams to figure out what approach we'd like to take for utilizing a CDN.

lizozom commented 2 years ago

Thanks @pgayvallet. Good to see that we're getting close to supporting CDN asset serving (on cloud and potentially on prem).

I wonder, given the current state of things in Kibana, if we should get someone from the Cloud/Cloud infra teams involved. They would probably be able to shed some light on the challenges on their side.

@mfinkle Any idea who the appropriate teams/people would be?

mfinkle commented 2 years ago

Just an update: we have a good picture of the current state and a direction forward (thanks again @pgayvallet). We've reached out to the folks in SRE who work with Elastic's CDN to establish a connection. We have not prioritized this work yet. As more of our clusters collect APM data, we can build a cost-benefit case to justify creating a real plan and priority.

For now, I don't think we are ready to put the work in a project and on a roadmap.

lukeelmers commented 1 year ago

@jloleysens has a PR up adding some initial support for configuring Kibana to request static assets from another domain: https://github.com/elastic/kibana/pull/169408

jbudz commented 9 months ago

The CDN is ready for testing - see https://github.com/elastic/kibana/pull/173159#issuecomment-1877239437 for details.

tylersmalley commented 9 months ago

Yeah @jbudz!

pgayvallet commented 3 months ago

@jloleysens should we consider this as completed?

jloleysens commented 3 months ago

Even though there are refinements and improvements planned (like rolling out to Cloud) I believe we have achieved the original intent of this issue.