elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.64k stars 8.23k forks source link

[Metrics UI] /metadata should throw error if no features are found but data exists #161384

Open miltonhultgren opened 1 year ago

miltonhultgren commented 1 year ago

Based on a recent support escalation, we found a situation where a customer had been dropping a key field (event.dataset) which the Metrics UI uses to decide which metrics to show in the Node Details page, causing the UI to simply break but without any errors.

The flow looks like this:

  1. UI resolves source config, and checks if there is any data
  2. Assuming there is data, it proceeds to ask the /metadata endpoint for which features are "active" (for what are we collecting metrics data)
  3. The list of features is a terms aggregation over event.dataset to inform us of which metricsets are being collected and thus which metrics we should have available
  4. The list of features is used in the follow up request when asking for the specific metrics to be aggregated to be displayed on the page

In the case where event.dataset is absent, no features are reported (though in this case all the metric fields still existed). /metadata returns a 200 OK with an empty features list, which is translated to a request for no metrics, which also returns a 200 OK and the UI is simply empty.

I can see two ways to solve this, re-work the feature check to search for individual metric fields instead of the metricset. This is probably a lot of work, and might break some of the domain logic the Metrics UI tries to encapsulate. The other way is to treat no features as an error (assuming there is data), which should be a much smaller effort.

AC

elasticmachine commented 1 year ago

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

smith commented 1 year ago

If Elastic were to formulate a policy around this my suggestion would be, "Elastic Observability does not under any circumstances support dropping the event.dataset field from events."

I added this to an internal epic where we're researching how best to respond to missing data, since we should do something other than displaying a blank screen.

The flexibility of Elasticsearch makes using it a hostile environment (by that I mean we can't rely on any field really being there, even if it's in the mapping, etc.) and we have to have defensive code that responds gracefully to unforeseen circumstances. A blank screen is not acceptable but "some of these documents have no event.dataset and this UI is not going to work" is fine.

elasticmachine commented 1 year ago

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

botelastic[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.