elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Enable ingesting/monitoring errors when Kibana "did not load properly" #184981

Open issue, opened by jloleysens 3 months ago

jloleysens commented 3 months ago

Kibana can fail to initialise on the client. The flow is:

  1. The client requests and loads our entrypoint HTML and the bootstrap.js (or bootstrap-anonymous.js) file, i.e. it successfully reaches Kibana at its origin
  2. At this point Kibana has not started on the client, and the bootstrap logic attempts to download hundreds of other assets as individual network requests (the waterfall starts)
  3. Loading any one of these assets fails
  4. The user sees an error message stating: "Elastic did not load properly. Check the server output for information."
  5. This is the end of the road: because Kibana never initialised, no error is reported. This is particularly problematic for supporting our serverless (managed) offering.
  6. If the issue is intermittent, the user can "fix" it by simply refreshing the page

There may be other failure scenarios, such as a failure to even load bootstrap.js. So we likely need to embed something in our initial HTML that can watch and report for these kinds of errors pre/during bootstrap time.
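
A minimal sketch of such a watcher, assuming it would be inlined in the initial HTML before any bootstrap script runs. Asset load failures (a `<script>` or `<link>` that fails) do not bubble, so the listener has to be registered in the capture phase. `installEarlyErrorWatcher`, `EarlyLoadError` and the `report` callback are hypothetical names, and what `report` does with the error is deliberately left open.

```typescript
// Hypothetical record of an early failure (names are illustrative only).
interface EarlyLoadError {
  kind: "resource" | "script-error";
  url?: string; // URL of the asset that failed to load, when known
  message?: string; // message of an uncaught script error, when known
}

// Minimal structural types so the sketch does not depend on the DOM lib.
// In a browser, `window` would be passed as the source.
interface ErrorEventLike {
  target?: unknown;
  message?: string;
}
interface ErrorEventSource {
  addEventListener(
    type: "error",
    listener: (event: ErrorEventLike) => void,
    useCapture?: boolean
  ): void;
}

// Returns the src/href of a failed <script> or <link>, if the target has one.
function failedAssetUrl(target: unknown): string | undefined {
  if (typeof target !== "object" || target === null) return undefined;
  const { src, href } = target as { src?: unknown; href?: unknown };
  if (typeof src === "string" && src !== "") return src;
  if (typeof href === "string" && href !== "") return href;
  return undefined;
}

// Registers a capture-phase "error" listener: asset load errors do not bubble,
// so a bubbling listener on window would never see them.
function installEarlyErrorWatcher(
  source: ErrorEventSource,
  report: (error: EarlyLoadError) => void
): void {
  source.addEventListener(
    "error",
    (event) => {
      const url = failedAssetUrl(event.target);
      report(
        url !== undefined
          ? { kind: "resource", url }
          : { kind: "script-error", message: event.message }
      );
    },
    true
  );
}
```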

Out of scope

elasticmachine commented 3 months ago

Pinging @elastic/kibana-core (Team:Core)

pgayvallet commented 3 months ago

For the record:

this is where the "Kibana failed to load" message is defined:

https://github.com/elastic/kibana/blob/3e44cca7e74e0b59f1afedcabb445b937e1bf730/packages/core/rendering/core-rendering-server-internal/src/views/template.tsx#L81-L87

and this is what triggers its display:

https://github.com/elastic/kibana/blob/8a298e4c8d1a9c75ce081e5c4c91174818e057d9/packages/core/rendering/core-rendering-server-internal/src/bootstrap/render_template.ts#L67-L80

pgayvallet commented 3 months ago

> So we likely need to embed something in our initial HTML that can watch and report for these kinds of errors pre/during bootstrap time

Yeah, I think the main question here is how we want to report those.

As shown in my previous comment, these failures happen (and are caught) extremely early in the bootstrap process (technically, even before what we call "bootstrap"); at that point Core is not loaded, APM is not loaded, nothing is.

The two options I can think of are:

  1. Given the injected metadata is already in the document, we could parse it to retrieve RUM's config and manually instantiate the agent to send the error (what if RUM is disabled, though, and do we care?).

  2. Another option could be to perform an HTTP request against a dedicated Kibana endpoint so the failure shows up in the server logs. Note, however, that the endpoint would have to be unauthenticated, so we would need to be extremely careful about which information we send from the browser and log on the server (to avoid it becoming a perfect attack vector).

  3. Can anyone think of any alternative?
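
For option 1, a minimal sketch of the parsing step, assuming the raw JSON is read from the `data` attribute of the `kbn-injected-metadata` element. The `apmConfig` key path is an assumption (not checked against Kibana's actual metadata shape) and `extractRumConfig` is a hypothetical helper. Returning `null` when RUM is disabled is just one possible answer to "what if RUM is disabled".

```typescript
// Subset of the RUM agent options this step needs. serviceName, serverUrl
// and active are @elastic/apm-rum init options; everything else is omitted.
interface RumConfig {
  serviceName: string;
  serverUrl: string;
  active?: boolean;
}

// ASSUMPTION: the RUM config sits under a hypothetical `apmConfig` key of
// the injected metadata JSON; the real key path must be checked in Kibana.
function extractRumConfig(rawMetadata: string | null): RumConfig | null {
  if (!rawMetadata) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawMetadata);
  } catch {
    return null; // Malformed metadata: nothing sensible to do this early.
  }
  const apm = (parsed as { apmConfig?: Partial<RumConfig> } | null)?.apmConfig;
  if (!apm || typeof apm.serviceName !== "string" || typeof apm.serverUrl !== "string") {
    return null; // No usable RUM config in the metadata.
  }
  if (apm.active === false) return null; // RUM disabled: report nothing.
  return { serviceName: apm.serviceName, serverUrl: apm.serverUrl, active: true };
}
```

In the browser, the input would be something like `document.querySelector("kbn-injected-metadata")?.getAttribute("data") ?? null`, and a non-null result would be handed to the RUM agent's `init()` before calling `captureError()` on the returned agent.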
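
For option 2, a minimal sketch of the browser-side sanitising such an unauthenticated endpoint would call for, assuming the browser only ever sends an asset path and a short message. The payload shape, `sanitizeReport` and the 200-character cap are hypothetical. Because anyone could call the endpoint directly, the server would still have to apply the same checks itself; browser-side sanitising only limits what well-behaved clients send.

```typescript
// Hypothetical payload for an unauthenticated "client failed to load" endpoint.
interface LoadFailureReport {
  assetPath: string; // path only: no origin, query string or fragment
  message: string; // single-line, truncated free text
}

// Arbitrary cap, chosen for illustration.
const MAX_MESSAGE_LENGTH = 200;

// Keeps only the pathname, dropping query and fragment (which may carry
// tokens). The base only resolves relative URLs; it never reaches the output.
function toSafePath(rawUrl: string): string {
  try {
    return new URL(rawUrl, "http://localhost").pathname;
  } catch {
    return "";
  }
}

function sanitizeReport(input: { url?: string; message?: string }): LoadFailureReport {
  // Collapse line breaks so one report cannot forge extra server log lines.
  const message = (input.message ?? "").replace(/[\r\n]+/g, " ").slice(0, MAX_MESSAGE_LENGTH);
  return {
    assetPath: input.url ? toSafePath(input.url) : "",
    message,
  };
}
```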

Also, FWIW, at the moment all the code generated by renderTemplate is a string template, which makes that part of the code very tedious and error-prone to modify (no linting, no code checks, nothing). So if we were to evolve this part of our code, we should probably find a way to integrate it better with our code quality tools (ideally moving away from string templating).
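
One possible direction, sketched under assumptions: write the inline logic as an ordinary TypeScript function so it is linted and type-checked, then embed its compiled source with `Function.prototype.toString()`. The function must not rely on imports or closed-over variables, since only its source text reaches the browser. `earlyLoader`, `renderInlineScript` and the `__kbnLoaderQueue` global are hypothetical.

```typescript
// Hypothetical config passed from the server to the inline script.
interface LoaderConfig {
  bundlePaths: string[];
}

// Ordinary, lintable function. It must be self-contained (no imports and no
// closed-over variables) because only its source text is shipped.
function earlyLoader(config: LoaderConfig): void {
  // Illustrative body: a real loader would add one script element per path.
  (globalThis as unknown as { __kbnLoaderQueue?: string[] }).__kbnLoaderQueue =
    config.bundlePaths.slice();
}

// JSON for embedding inside a script tag: "<" is escaped so that a value
// containing a closing script tag cannot end the tag early.
function toInlineJson(value: unknown): string {
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

// Builds the inline script body as an IIFE around the function's source.
function renderInlineScript(
  fn: (config: LoaderConfig) => void,
  config: LoaderConfig
): string {
  return `(${fn.toString()})(${toInlineJson(config)});`;
}
```

The trade-off is that the shipped code is whatever the TypeScript compiler emits for that one function, so its target must be something every supported browser can run.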

afharo commented 2 months ago

> 3. Can anyone think of any alternative?

Push it to our telemetry endpoints? TBH, I'd rather use APM for this (it feels like a better fit for that product), but those are public, unauthenticated endpoints available to us.

jloleysens commented 2 months ago

Yeah, I like the idea of surfacing these errors in APM. I was imagining we do some kind of very simple (best effort) fetch call that can only run before bootstrap is done.
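
A minimal sketch of such a best-effort reporter, assuming the transport is injected. `createPreBootstrapReporter`, `markBootstrapped` and the five-report cap are hypothetical; the point is that reporting becomes a no-op once bootstrap completes, and that any failure to report is swallowed rather than surfaced.

```typescript
// Transport for a serialized report (injected so the sketch stays testable).
type Send = (body: string) => Promise<unknown>;

interface PreBootstrapReporter {
  report(payload: object): void;
  markBootstrapped(): void;
}

// Hypothetical cap so one broken page load cannot flood the destination.
const MAX_PRE_BOOTSTRAP_REPORTS = 5;

function createPreBootstrapReporter(send: Send): PreBootstrapReporter {
  let bootstrapped = false;
  let sent = 0;
  return {
    report(payload) {
      // Only active before bootstrap completes, and only up to the cap.
      if (bootstrapped || sent >= MAX_PRE_BOOTSTRAP_REPORTS) return;
      sent++;
      try {
        // Best effort: a rejected request is swallowed, never surfaced.
        send(JSON.stringify(payload)).catch(() => undefined);
      } catch {
        // `send` threw synchronously; ignore that too.
      }
    },
    markBootstrapped() {
      // Once Core is up, regular error reporting takes over.
      bootstrapped = true;
    },
  };
}
```

In the browser, `send` could be `(body) => fetch(endpointUrl, { method: "POST", body, keepalive: true })`, where `endpointUrl` is a placeholder for whichever destination is chosen; `keepalive` lets the request outlive a page reload.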

So I guess the tricky/nasty parts are:

pgayvallet commented 2 months ago

> I was imagining we do some kind of very simple (best effort) fetch call that can only run before bootstrap is done.

Oh yeah... I overlooked that APM wouldn't be loaded, so we'd have to manually forge the request against the right endpoint. This is even scarier...
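
To make "manually forging the request" concrete, a rough sketch of building a body for APM Server's RUM intake endpoint (`/intake/v2/rum/events`), which accepts newline-delimited JSON: one metadata object followed by one object per event. The field set below is our best understanding of the minimum and is an assumption to verify against the APM Server events API spec; `buildRumIntakeBody`, `rumIntakeUrl` and reusing the `"rum-js"` agent name are illustrative choices. The id and clock are passed in so the function stays deterministic.

```typescript
// Hypothetical early error to forward to APM Server.
interface EarlyError {
  message: string;
  type: string;
}

// ASSUMPTION: shapes follow the intake v2 NDJSON format as we understand it;
// required fields must be checked against the APM Server spec.
function buildRumIntakeBody(
  serviceName: string,
  agentVersion: string,
  error: EarlyError,
  nowMs: number,
  id: string
): string {
  const metadata = {
    metadata: {
      service: {
        name: serviceName,
        agent: { name: "rum-js", version: agentVersion },
      },
    },
  };
  const event = {
    error: {
      id,
      timestamp: Math.round(nowMs * 1000), // intake timestamps are in microseconds
      exception: { message: error.message, type: error.type },
    },
  };
  // NDJSON: one JSON document per line, newline-terminated.
  return `${JSON.stringify(metadata)}\n${JSON.stringify(event)}\n`;
}

// Joins the APM server URL and the RUM intake path, tolerating a trailing slash.
function rumIntakeUrl(serverUrl: string): string {
  return `${serverUrl.replace(/\/+$/, "")}/intake/v2/rum/events`;
}
```

The request itself would be a POST with `Content-Type: application/x-ndjson`, which is exactly the kind of hand-maintained protocol detail that makes this option feel scary.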

> So I guess the tricky/nasty parts are:

Yeah, your list seems accurate.

Overall, as initially mentioned, this seems like a lot of effort and complexity for questionable gains. But if we want to do it, your approach seems like the right one to me.

afharo commented 2 months ago

> I was imagining we do some kind of very simple (best effort) fetch call that can only run before bootstrap is done.

> Oh yeah... I overlooked that APM wouldn't be loaded, so we'd have to manually forge the request against the right endpoint. This is even scarier...

Should we load APM from a CDN? (the old-but-trustworthy route vs. the "new hipster" bundling strategy) 😜 https://www.elastic.co/guide/en/apm/agent/rum-js/current/install-the-agent.html#_synchronous_blocking_pattern
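
For reference, a small sketch of what rendering the linked synchronous blocking pattern into the page template could look like: the agent bundle is loaded with a plain script tag, then `elasticApm.init(...)` runs inline before any bootstrap asset is requested. `renderCdnRumSnippet` and the example URLs are hypothetical; `elasticApm` is the global exposed by the agent's UMD bundle in that pattern.

```typescript
interface CdnRumOptions {
  agentSrc: string; // URL of the agent's UMD bundle on the chosen CDN
  serviceName: string;
  serverUrl: string;
}

// Escapes a value for use inside a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Renders the blocking pattern: load the agent, then initialise it inline.
function renderCdnRumSnippet(options: CdnRumOptions): string {
  // Escape "<" so config values cannot close the inline script tag.
  const initConfig = JSON.stringify({
    serviceName: options.serviceName,
    serverUrl: options.serverUrl,
  }).replace(/</g, "\\u003c");
  return [
    `<script src="${escapeAttribute(options.agentSrc)}" crossorigin></script>`,
    `<script>elasticApm.init(${initConfig})</script>`,
  ].join("\n");
}
```

The catch is the usual one for this pattern: the page now blocks on a third-party host before Kibana can even start loading.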

pgayvallet commented 2 months ago

🙈