elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Improve the status page UI to help with troubleshooting #107873

Closed: afharo closed this 2 years ago

afharo commented 3 years ago

As the number of plugins in Kibana keeps increasing, the current status page is growing very large, making it harder to troubleshoot what may be causing a Yellow/Red status in Kibana.

As an example, see the following use case:

Kibana is in Red status. However, the plugins at the top of the list all seem fine. (screenshot)

Scrolling down, we find a bunch of red plugins (mixed with some green ones): (screenshot)

The first one, plugin:actions@8.0.0, claims "[3] services are unavailable. See the status page for more information." Well, that's not very helpful: I'm already on the status page, and I can't see any more information about why this plugin is not working.

Still, I keep reading:

  1. alerting has a very similar message. Still not helpful.
  2. canvas: The message says "[reporting]: [2] services are unavailable. See the status page for more information." OK... Does canvas depend on reporting, and is reporting out of order? Either way, the final message still doesn't help.
  3. Thankfully, cloud shows "[security]: [taskManager]: Task Manager is unavailable See the status page for more information". So cloud depends on security, which depends on taskManager, and taskManager is the one that is down.
  4. Finally, I can now look at taskManager and attempt to fix the issue.
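The trail above (cloud -> security -> taskManager) is something the page could follow on the user's behalf. A minimal sketch of the idea, assuming each status entry exposes its level and the names of the services it depends on (the `PluginNode` shape and the `findRootCauses` helper are hypothetical, not Kibana's API): a plugin that is not available even though all of its dependencies are available is a likely root cause, while the rest are only failing downstream of it. Any dependency edges in the example beyond the ones named in the issue are made up.

```typescript
// Hypothetical shapes for illustration; not Kibana's actual types.
type Level = "available" | "degraded" | "unavailable";

interface PluginNode {
  level: Level;
  dependsOn: string[];
}

/**
 * Returns the plugins that are not available even though every dependency
 * they declare is available. Everything else that is failing is assumed to be
 * failing only because something upstream failed.
 * Dependencies missing from the map are treated as available.
 */
function findRootCauses(plugins: Record<string, PluginNode>): string[] {
  return Object.entries(plugins)
    .filter(([, node]) => node.level !== "available")
    .filter(([, node]) =>
      node.dependsOn.every(
        (dep) => (plugins[dep]?.level ?? "available") === "available",
      ),
    )
    .map(([name]) => name)
    .sort();
}

// Example graph loosely modeled on the scenario above (edges partly invented).
const example: Record<string, PluginNode> = {
  taskManager: { level: "unavailable", dependsOn: [] },
  security: { level: "unavailable", dependsOn: ["taskManager"] },
  cloud: { level: "unavailable", dependsOn: ["security"] },
  reporting: { level: "unavailable", dependsOn: ["security"] },
  canvas: { level: "unavailable", dependsOn: ["reporting"] },
  data: { level: "available", dependsOn: [] },
};

console.log(findRootCauses(example)); // → [ 'taskManager' ]
```

Only direct dependencies are inspected, so the check stays linear in the number of edges and is unaffected by dependency cycles.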

👆 To replicate this scenario:

  1. Start an ES cluster
  2. Start Kibana
  3. Stop and delete the ES cluster
  4. Start a new ES cluster with the same connection endpoints
  5. Check Kibana's status

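I think we could explore introducing a better UX. Some suggestions:

  1. Highlight issues by sorting by status
  2. Leverage the new output format from GET /api/status
  3. Focus on the issues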

What do you think?

elasticmachine commented 3 years ago

Pinging @elastic/kibana-design (Team:Kibana-Design)

elasticmachine commented 3 years ago

Pinging @elastic/kibana-core (Team:Core)

joshdover commented 3 years ago

This is all great feedback and I totally agree we need to expose this data in a more useful way.

> Highlight issues by sorting by status

Yeah, we should just do this, maybe even at the API level?

> Leverage the new output format from GET /api/status

+1. Probably the simplest change we could make is to leverage the "expanding rows" functionality of EuiBasicTable to expose the detailed information and documentation links.
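A rough sketch of what the expanded rows could contain, assuming each service status carries a `summary` and, optionally, `detail`, `documentationUrl` and `meta` (modeled loosely on the new `GET /api/status` output; `ServiceStatusLike`, `buildExpandedRowMap` and the `affectedServices` meta key are hypothetical). As far as we know, EuiBasicTable takes an `itemIdToExpandedRowMap` prop keyed by row id; in the real UI the values would be React nodes rather than the plain strings used here.

```typescript
// Assumed shape, modeled loosely on the per-service fields of the v8 status API.
interface ServiceStatusLike {
  level: "available" | "degraded" | "unavailable" | "critical";
  summary: string;
  detail?: string;
  documentationUrl?: string;
  meta?: Record<string, unknown>;
}

/** Builds the text shown underneath a row when it is expanded. */
function expandedRowText(status: ServiceStatusLike): string {
  const lines = [`Summary: ${status.summary}`];
  if (status.detail) lines.push(`Detail: ${status.detail}`);
  if (status.documentationUrl) lines.push(`Docs: ${status.documentationUrl}`);
  if (status.meta && Object.keys(status.meta).length > 0) {
    lines.push(`Meta: ${JSON.stringify(status.meta, null, 2)}`);
  }
  return lines.join("\n");
}

/**
 * Maps row id -> expanded content. Only rows that are not available are
 * pre-expanded, so their details are visible without extra clicks.
 */
function buildExpandedRowMap(
  statuses: Record<string, ServiceStatusLike>,
): Record<string, string> {
  const map: Record<string, string> = {};
  for (const [id, status] of Object.entries(statuses)) {
    if (status.level !== "available") {
      map[id] = expandedRowText(status);
    }
  }
  return map;
}

const rows = buildExpandedRowMap({
  actions: {
    level: "unavailable",
    summary: "[3] services are unavailable",
    meta: { affectedServices: ["taskManager"] }, // hypothetical meta key
  },
  data: { level: "available", summary: "data is available" },
});

console.log(rows["actions"]);
```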

> Focus on the issues

This one sounds like a larger lift and may not be strictly necessary if we do the first two items.

afharo commented 3 years ago

> > Highlight issues by sorting by status

> Yeah, we should just do this, maybe even at the API level?

We currently return an object where each key is a service. I don't think we can guarantee that the keys are sorted, can we?
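On the key-ordering question: JavaScript does preserve insertion order for non-integer string keys, and `JSON.stringify` follows that order, but the JSON specification treats objects as unordered, so clients shouldn't rely on it. Returning (or deriving on the client) an array makes the order explicit. A minimal sketch, assuming the four v8 service levels and a hypothetical `toSortedEntries` helper; ties are broken alphabetically so the output is deterministic:

```typescript
type Level = "available" | "degraded" | "unavailable" | "critical";

// Lower rank = more severe, so problems sort to the top (red -> yellow -> green).
const SEVERITY_RANK: Record<Level, number> = {
  critical: 0,
  unavailable: 1,
  degraded: 2,
  available: 3,
};

interface StatusEntry {
  id: string;
  level: Level;
  summary: string;
}

/** Converts a keyed status object into an array ordered by severity, then id. */
function toSortedEntries(
  statuses: Record<string, { level: Level; summary: string }>,
): StatusEntry[] {
  return Object.entries(statuses)
    .map(([id, s]) => ({ id, level: s.level, summary: s.summary }))
    .sort(
      (a, b) =>
        SEVERITY_RANK[a.level] - SEVERITY_RANK[b.level] ||
        a.id.localeCompare(b.id),
    );
}

const sorted = toSortedEntries({
  data: { level: "available", summary: "data is available" },
  taskManager: { level: "unavailable", summary: "Task Manager is unavailable" },
  alerting: { level: "degraded", summary: "alerting is degraded" },
});

console.log(sorted.map((e) => e.id)); // → [ 'taskManager', 'alerting', 'data' ]
```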

For the rest of @joshdover's comments: I totally agree!

ryankeairns commented 3 years ago

+1 to the sorting and expandable rows.

lukeelmers commented 3 years ago

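For now, we decided to keep this issue scoped to the two main areas identified in the description:

  1. Sorting plugins by status so we can more easily surface problematic ones
  2. Grouping plugins by status in an expandable row so there isn't an overwhelming number of plugins in the list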

If we determine a larger redesign of the status page is needed in the future, we will treat that as a separate effort.

pgayvallet commented 2 years ago

> Sorting plugins by status so we can more easily surface problematic ones

This one obviously makes sense.

> Grouping plugins by status in an expandable row so there isn't an overwhelming number of plugins in the list

After playing a bit with this, I'm not 100% sure this is the right design. Once plugins are sorted by status in red -> yellow -> green order, I feel grouping is unnecessary: the user already has quick access to the failing services, and grouping forces additional clicks to retrieve the information (e.g. which plugins are red).

I feel expandable rows should be used on a per-plugin basis, to show additional information such as the whole ServiceStatus.meta content.

We could add a summary line above the table, such as "X plugins are red, Y are yellow, Z are green", as an additional quick visual indication for the user.
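The summary line is a simple count. A minimal sketch, assuming that `available` maps to green, `degraded` to yellow, and `unavailable`/`critical` to red (that mapping and the `summaryLine` helper are assumptions for illustration):

```typescript
type Level = "available" | "degraded" | "unavailable" | "critical";
type Color = "red" | "yellow" | "green";

// Assumed mapping from service levels to the colors shown on the page.
const LEVEL_COLOR: Record<Level, Color> = {
  available: "green",
  degraded: "yellow",
  unavailable: "red",
  critical: "red",
};

/** Counts plugins per color and renders the proposed summary line. */
function summaryLine(levels: Level[]): string {
  const counts: Record<Color, number> = { red: 0, yellow: 0, green: 0 };
  for (const level of levels) {
    counts[LEVEL_COLOR[level]] += 1;
  }
  return `${counts.red} plugins are red, ${counts.yellow} are yellow, ${counts.green} are green`;
}

console.log(
  summaryLine(["unavailable", "critical", "degraded", "available", "available"]),
); // → 2 plugins are red, 1 are yellow, 2 are green
```

The wording mirrors the template above, so a count of one reads "1 plugins are red"; a real implementation would want proper pluralization.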

Also, given that the HTTP status code of the /status endpoint depends only on core status, I think it would make sense to move core services' statuses into their own section instead of putting them in the plugins table.
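One way to build that separate section, assuming core entries can be recognized by a `core:` id prefix analogous to the `plugin:actions@8.0.0` ids shown earlier (the prefix convention, the `splitCoreFromPlugins` helper, and the sample ids are assumptions):

```typescript
interface StatusRow {
  id: string; // e.g. "plugin:actions@8.0.0"
  level: string;
}

interface Sections {
  core: StatusRow[];
  plugins: StatusRow[];
}

/** Splits status rows into a core section and a plugins section by id prefix. */
function splitCoreFromPlugins(rows: StatusRow[]): Sections {
  const sections: Sections = { core: [], plugins: [] };
  for (const row of rows) {
    if (row.id.startsWith("core:")) {
      sections.core.push(row);
    } else {
      sections.plugins.push(row);
    }
  }
  return sections;
}

const { core, plugins } = splitCoreFromPlugins([
  { id: "core:elasticsearch@8.0.0", level: "red" }, // hypothetical id
  { id: "plugin:actions@8.0.0", level: "red" },
  { id: "plugin:data@8.0.0", level: "green" },
]);

console.log(core.length, plugins.length); // → 1 2
```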

I will take a quick shot at it.