elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Infra UI / Hosts] Reduce the number of options for Hosts limit button group #161170

Closed formgeist closed 1 year ago

formgeist commented 1 year ago

Kibana version:

main / 8.10.0

Description:

Reduce the current number of options in the host limit:

Current Options

[Screenshot: CleanShot 2023-07-04 at 13 36 38@2x]

Solution

Reduce to '50', '100' and '500' (where '100' is the default)
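A minimal sketch of what the reduced option set could look like in TypeScript. The names `HOST_LIMIT_OPTIONS`, `DEFAULT_HOST_LIMIT` and `isSupportedHostLimit` are hypothetical and are not taken from the Kibana codebase.

```typescript
// Hypothetical constants; the names are illustrative, not Kibana's actual identifiers.
const HOST_LIMIT_OPTIONS = [50, 100, 500] as const;
type HostLimit = (typeof HOST_LIMIT_OPTIONS)[number];
const DEFAULT_HOST_LIMIT: HostLimit = 100;

// Type guard: true only for values the button group would offer.
function isSupportedHostLimit(value: number): value is HostLimit {
  return HOST_LIMIT_OPTIONS.some((option) => option === value);
}
```

With this shape, the previously offered '10' and '20' values would simply fail the `isSupportedHostLimit` check.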

elasticmachine commented 1 year ago

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

formgeist commented 1 year ago

@roshan-elastic Curious to hear your thoughts on this?

roshan-elastic commented 1 year ago

Hey @formgeist - thanks for raising this.

I don't have a strong preference here - we weren't sure how many options they might need so I'm sure this can be optimised.

I've checked the data and we don't have much to understand what users are doing but we do plan to introduce an 'all' option:

Doing this will add one more so we'll probably need to either consolidate these or change to a drop-down if we need all of them.

smith commented 1 year ago

Or we could make 100 or 500 the default and remove this control. I'm not convinced it gives us, the user, or the user's cluster any benefit.

roshan-elastic commented 1 year ago

I'm not convinced it gives us, the user, or the user's cluster any benefit.

Do you mean having a host-limit at all or do you mean allowing more/less options?

I think we could potentially remove some of the granularity here but something which has been mentioned a few times by users is the need to 'break the glass' (i.e. give users control over what is returned).

The 'all' option has tested well, so it is in the backlog. If we do this, we can probably remove some of the granularity we have.

I'll see if I can get some data to help make a decision on the options.

smith commented 1 year ago

Do you mean having a host-limit at all or do you mean allowing more/less options?

I mean being able to control the limit at all. I'm ok with the Rows per page control at the bottom with regular items.

This is what APM does:

https://github.com/elastic/kibana/blob/6b65e909356a8f9e9f29db28ac8c41edae9959e1/x-pack/plugins/apm/server/routes/services/get_services/get_services_items.ts#L23

No user option. 1000.

I'm wondering why we're hearing people feel they need to "break the glass". That seems like we aren't guiding them toward the power of search and filtering.

Discover bangs you over the head with it if you go up to 500:

[Screenshot: CleanShot 2023-07-06 at 22 32 02@2x]

I love it. It could be improved by autofocusing the search bar, making the message large, red and blinking at the top, and lowering your credit score one point. Also, Discover is showing documents returned from a search, while we're (as of right now) doing aggregations, which take longer.

So that's just my opinion on the way we do limits as a whole. We have an open issue for the "All" option and we can figure out what to do when we get to that.

The issue description says we should reduce the number of options or turn it into a select. My vote would be to reduce the number of options, ideally to zero but anything would be fine.

roshan-elastic commented 1 year ago

You make a good point @smith. I completely agree that we want to get to a point where users can get to the things they want using the power of search - we'll win by leveraging what we're good at...i.e. search.

In short, I don't think we have a user-friendly way for users to find what they need by filtering alone (especially with metrics) because of the amount of knowledge of Elastic syntax/ECS needed to do something simple like finding the most CPU-bound hosts. My hope is that users 'will' filter down to fewer hosts than the host limit (and then sort), but if a user just wants to see the most CPU-bound hosts across 10,000 hosts - they can't do this without filtering (which I think is very difficult for some things that you would think are easy).

For example, to get the most CPU-bound hosts, you would need to open the tooltip for 'CPU Usage', check the formula and then somehow add the formula (average(system.cpu.user.pct) + average(system.cpu.system.pct)) / max(system.cpu.cores) > X% to universal search to filter for the hosts which are CPU-bound. Even then, I'm not sure how you would filter for the top X CPU-bound hosts...can you return a 'top X' or do you have to guess the % that the top ones are?
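To make the formula concrete, here is a rough TypeScript sketch that applies it to per-host values that have already been aggregated, then keeps the top X. The `HostCpuSample` shape, its field names and both function names are hypothetical; this only illustrates the arithmetic and is not how the Hosts view or universal search evaluates it.

```typescript
// Hypothetical per-host aggregates mirroring the fields used in the formula.
interface HostCpuSample {
  name: string;
  avgUserPct: number; // average(system.cpu.user.pct)
  avgSystemPct: number; // average(system.cpu.system.pct)
  maxCores: number; // max(system.cpu.cores)
}

// (average(user.pct) + average(system.pct)) / max(cores)
function cpuUsage(host: HostCpuSample): number {
  return (host.avgUserPct + host.avgSystemPct) / host.maxCores;
}

// Sort a copy by CPU usage, highest first, and keep the first n hosts.
function topCpuBoundHosts(hosts: HostCpuSample[], n: number): HostCpuSample[] {
  return hosts
    .slice()
    .sort((a, b) => cpuUsage(b) - cpuUsage(a))
    .slice(0, n);
}

// Made-up sample data for illustration only.
const sample: HostCpuSample[] = [
  { name: "host-a", avgUserPct: 1.2, avgSystemPct: 0.4, maxCores: 4 },
  { name: "host-b", avgUserPct: 3.0, avgSystemPct: 0.6, maxCores: 4 },
  { name: "host-c", avgUserPct: 0.5, avgSystemPct: 0.1, maxCores: 2 },
];
```

Calling `topCpuBoundHosts(sample, 2)` ranks `host-b` ahead of `host-a`, without the user having to guess a percentage threshold.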

I believe that our best option for the moment is to apply a host limit (to make this work for enterprise - we know inventory doesn't work for large workloads) but provide a break-the-glass 'all' option so that users who need to 'sort' to find the hosts they're looking for aren't blocked from doing that (this option tested well with the users we spoke with).

Medium/long-term though, I'd love to hear suggestions on how we can offer easy-to-use filtering so that users who don't understand (or don't want to understand) Elastic concepts/ECS can feel confident they're finding the hosts which need attention (e.g. see the most CPU-bound hosts).

My hope was that we could write some kind of values against host (like 'health') that users can easily filter by with one click but I don't have any good ideas right now...

The issue description says we should reduce the number of options or turn it into a select. My vote would be to reduce the number of options, ideally to zero but anything would be fine.

I think the issue is more about UX, ensuring we show the right UX for the feature - I don't think it's asking to remove the feature altogether.

For Hosts, we need more user feedback so I'd really like to get the 'all' option added - probably remove some granularity and hear from some users what they say to help us understand their preferences a bit more.

roshan-elastic commented 1 year ago

I've just dug into the telemetry in this and got some data:

Note: 100 is the default limit

[Image]

Given that only 10% of queries use a limit smaller than the default, I'd say the '10', '20' and '50' limits aren't needed - people seem happy with the default.

Additionally, only 2.66% of queries are using the 500 limit. However, given conversations with users about lacking the ability to 'break the glass' - I still think we need an 'All' option (esp for larger enterprise).

I'd propose we have two options:

100 and All (where 'All' has a tooltip).

@formgeist @kkurstak - UX-wise, do you think it's easier just to have a 'Yes' or 'No' option for a host limit or should we show '100' and 'All'?

The considerations here are:

kkurstak commented 1 year ago

@roshan-elastic I understand the 100 / All [i] option, it makes sense - great to have supporting stats on this. I'm not sure I understand the "Yes/No" option - could you explain again or point me to the right comment?

roshan-elastic commented 1 year ago

@kkurstak as discussed, let's not do 'yes/no' - we can do '100|all'

neptunian commented 1 year ago

I commented in https://github.com/elastic/obs-infraobs-team/issues/1064 about the "all" option and why I don't think it's a good idea.

kkurstak commented 1 year ago

@neptunian thanks for pointing to that comment. I'd like to understand if we have any other options here -

Are we wanting to add this mainly to fix the sorting issue where they can properly sort all their hosts by metric? Not sure this would be the best way to go about it.

How could we deal with the sorting problem then? What was apparent from the conversations held with users in the last months was that it was really difficult for them to understand what's being listed in the hosts table. It also made little sense to show 100 "random" hosts and sort them by, for instance, CPU. What was really needed was to show the 100 hosts with the highest CPU of all. How could we achieve that without the "All" option?
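To illustrate the sorting concern with made-up numbers: sorting an arbitrary subset of hosts is not the same as taking the highest-CPU hosts across all of them. The sketch below is plain TypeScript with hypothetical names and data, and it does not reflect how the Hosts view actually queries Elasticsearch.

```typescript
interface HostCpu {
  name: string;
  cpu: number;
}

const byCpuDesc = (a: HostCpu, b: HostCpu): number => b.cpu - a.cpu;

// Limit first, then sort: only ranks whichever hosts happened to be returned.
function limitThenSort(hosts: HostCpu[], limit: number): HostCpu[] {
  return hosts.slice(0, limit).sort(byCpuDesc);
}

// Sort first, then limit: the true top N by CPU across every host.
function sortThenLimit(hosts: HostCpu[], limit: number): HostCpu[] {
  return hosts.slice().sort(byCpuDesc).slice(0, limit);
}

// Made-up data for illustration only.
const allHosts: HostCpu[] = [
  { name: "host-1", cpu: 0.1 },
  { name: "host-2", cpu: 0.35 },
  { name: "host-3", cpu: 0.95 },
  { name: "host-4", cpu: 0.8 },
];
```

With a limit of 2, `limitThenSort` returns `host-2` and `host-1`, while `sortThenLimit` returns `host-3` and `host-4`, which is the list users were actually asking for.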

If the UI doesn't really support "all" from both a performance and UX perspective, I'm not sure we should allow it. Also I think we would need to use our "old" composite API request to support this, so it would increase maintainability on our end if we need to have two different requests here.

What does this mean? Would it be an expensive page to maintain? Would it be slow? Would it be difficult to manage in the future?

The assumption is that users will stick mainly to the "100" option. I think giving them the "All" option, with all its problems (that we partially explain with the tooltip I guess), might give us the chance to see if people want to work on their entire infrastructure or not, which is a great benefit on its own - I think this was already mentioned. Intuitively, creating a visualisation of the users' infrastructure without enabling the full picture feels incomplete.

roshan-elastic commented 1 year ago

Thanks for your thoughts on this @kkurstak - I was thinking the same so I did some digging. I got some pretty useful stats from our telemetry across all users (I've cc'd you in the comment).

In short, around 1% of all searches return hosts > 500 (our current limit). I agree it's an issue but looking at the frequency, it looks like we can best focus on other things.

roshan-elastic commented 1 year ago

@formgeist - updated this issue to have '50', '100' and '500'

jennypavlova commented 1 year ago

@formgeist @kkurstak Is that the correct new default view of the limit or do you want to change the UI as well?

[Screenshot 2023-08-02 at 17 53 53]

@roshan-elastic Should we think about the case where an old URL is used with a limit set to 10 or 20 and reset it to 100, or should we just set the new limits, since this case is unlikely?

roshan-elastic commented 1 year ago

Hey @jennypavlova, good thinking - I hadn't considered this.

I'm OK with no host limits working aside from those that we support in the UI (i.e. 50/100/500), so if we needed to force an old query string with a limit of '20' to '100' then I'm OK with that. Most users seem to leave it as 100 anyhow:

Last 7 days

[Image]

jennypavlova commented 1 year ago

@roshan-elastic Thanks for the reply. In that case, maybe forcing it is not needed if 10 (or 20) is selected: we will still return the old results and the user can change that in the menu (the user will only see the new options, without the previous selection, and can always pick one of the new options if more hosts are needed). This will only happen when old URLs are used; with new ones, 100 will be the default. 👍
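A rough sketch of the behaviour described above, assuming the limit is carried as a plain `limit` query parameter (an assumption for illustration; the real URL state format may differ). All names here are hypothetical. Legacy values from old URLs are still honoured, unknown or missing values fall back to 100, and the menu only ever lists the new options.

```typescript
const CURRENT_OPTIONS: readonly number[] = [50, 100, 500];
// Removed from the UI, but still accepted when they arrive via an old URL.
const LEGACY_OPTIONS: readonly number[] = [10, 20];
const DEFAULT_LIMIT = 100;

// Read the limit from a query string; the "limit" parameter name is an assumption.
function resolveLimit(queryString: string): number {
  const raw = new URLSearchParams(queryString).get("limit");
  if (raw === null) return DEFAULT_LIMIT;
  const parsed = Number(raw);
  const accepted = CURRENT_OPTIONS.concat(LEGACY_OPTIONS);
  return accepted.indexOf(parsed) !== -1 ? parsed : DEFAULT_LIMIT;
}

// The menu lists only the current options, so a legacy value shows no selection.
function selectedOption(limit: number): number | undefined {
  return CURRENT_OPTIONS.indexOf(limit) !== -1 ? limit : undefined;
}
```

The alternative discussed earlier (forcing an old '20' to '100') would amount to dropping `LEGACY_OPTIONS` from the accepted list.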

roshan-elastic commented 1 year ago

Sounds good @jennypavlova - thanks for thinking this through!