Closed tonyghiani closed 1 year ago
Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)
@neptunian Talking with @miltonhultgren about giving better feedback to the user when setting the index pattern in the metrics settings, we thought it could be useful to tell them the current match of the typed wildcard. I think with these enhancements we can help users to understand why the metric pages don't show data and easily identify what they need to update in order to make it work.
Any opinion or suggestion about the current definition?
Sounds good, though since the Infra UI doesn't support Data Views yet we'd have to just test that the index pattern is valid. It's valid for the user to enter an index pattern for which no matching Data View exists. We could potentially wait until we support Data Views to make this improvement, but instead of using callouts, I think we'd have a similar experience to Logs, which is to select a Data View that already exists or create one in Management, else use the existing index pattern option. In that case we'd end up back where we are now, testing to make sure that the index pattern is valid and having these nice callouts. Will let @roshan-elastic decide on the priority, as https://github.com/elastic/kibana/pull/152840 might be good for now given other priorities.
LGTM, I'd definitely prefer to first standardise the settings and allow the Data View selection as happening in Logs settings, so that we can then bring this enhancement to both settings pages 👍
Hey @tonyghiani @neptunian - I like this issue.
My thoughts at the moment:
- Love the idea of giving feedback on matching sources in here...sounds like we'd need to support data views to allow for that?
Could wait for that but could also check the index pattern itself is valid (at least one index exists and no remote failures).
- I have no idea on how many users get impacted by this current lack of feedback to help prioritise fixing this without data views being available. I have some thoughts/Qs...
How many users use CCS in this app might be a helpful indicator
- Do we have telemetry for this to track how many users are saving invalid patterns? I don't think we do...
Not that I'm aware of
- If so, is it more work to add in this telemetry than just actually fixing the issue (i.e. adding in UI to tell the user that it doesn't match anything...do they want to save)?
We don't really handle the cases properly for the user at the moment which you would need to do to handle the case for correct telemetry data. So I guess the answer would be that improving the UX (the fix) would be faster.
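For illustration, the check mentioned above (at least one index exists and no remote failures) could look roughly like the sketch below. `PatternResolution` and `validateIndexPattern` are hypothetical names, not existing Kibana APIs, and the sketch assumes the lookup of the typed pattern has already been done elsewhere.

```typescript
// Hypothetical shape: the outcome of looking up the typed index pattern.
interface PatternResolution {
  matchedIndices: string[]; // indices, data streams, or aliases matching the pattern
  remoteErrors: string[]; // one message per remote cluster that could not be reached
}

type PatternStatus =
  | { kind: "ok"; matchCount: number }
  | { kind: "no_matches" }
  | { kind: "remote_failure"; errors: string[] };

// Remote failures are reported first, because an unreachable remote
// also makes the list of matches unreliable.
function validateIndexPattern(resolution: PatternResolution): PatternStatus {
  if (resolution.remoteErrors.length > 0) {
    return { kind: "remote_failure", errors: resolution.remoteErrors };
  }
  if (resolution.matchedIndices.length === 0) {
    return { kind: "no_matches" };
  }
  return { kind: "ok", matchCount: resolution.matchedIndices.length };
}
```

Each `kind` could then map to one callout in the settings page, while still letting the user save the pattern.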
Cheers @neptunian - that helps.
Do we have any idea of how much effort this might be? If it's something really quick and can be picked up easily then it could probably go higher up the backlog.
Let me ask @lucabelluccini if he's heard any feedback on this.
@lucabelluccini - Wondering if you've heard of any support tickets complaining about invalid index patterns being accepted (leading to no data showing in Infra UI, e.g. inventory)?
Trying to figure out where this should be prioritised (e.g. is anyone actually being affected)?
Thank you for reaching out.
In general, users using "Elastic defaults" should find their data. Users using CCS will for sure edit the settings in order to search on remote clusters.
This problem CAN be common to all the Kibana Apps (APM, Metrics, Logs, Uptime, etc...). It would be great to find a common path.
If we force users to use already existing Data Views:
- Data View covering metricbeat-*,metrics-* which users can edit (possible since 8.3)
- Data Views on upgrade, as we always allowed "raw" index patterns
I think for a good UX:
- Settings
I've seen users wondering why they do not see the data.
We assume everyone is aware of our standard naming (metrics- in the Datastream world and metricbeat- for the "previous" world).
For a non-expert Kibana user, it might be tricky to know they have to go in Settings.
Note: even a Data View can have 0 indices. You create it when the index / data stream exists, then you delete the index. The Data View exists but has no backing index.
Let's assume a user installed Metricbeat 8.5 properly, with all the Index Templates. All good.
Let's assume the user forgets to install the index templates in Metricbeat 8.6. The data will be indexed, visible in Discover, but they'll see nothing in the Metrics UI.
Easy to simulate:
DELETE metricbeat-broken

PUT metricbeat-broken
{
  "mappings": {
    "properties": {
      "host.hostname": {
        "type": "text"
      },
      "host.name": {
        "type": "text"
      },
      "event.dataset": {
        "type": "text"
      },
      "@timestamp": {
        "type": "date"
      }
    }
  }
}

POST metricbeat-broken/_doc/1
{
  "host.hostname": "hello",
  "host.name": "hello",
  "event.dataset": "hello",
  "@timestamp": "2023-03-10T16:30:40.604Z"
}
The hello host will never show up.
We're NOW resilient to this (in the "past" we were showing an empty page IIRC), but it would be nice to warn the user that there have been some failures in the query, as the data they see might be incomplete without them being aware of it!
While on the browser we do not see any error in the response of the API /api/metrics/snapshot, behind the scenes Kibana server probably got:
{
  "took": 40,
  "timed_out": false,
  "_shards": {
    "total": 22,
    "successful": 21,
    "skipped": 0,
    "failed": 1,
    "failures": [
      {
        "shard": 0,
        "index": "metricbeat-broken",
        "node": "oRaZjI9WQKSURzBAmr54KA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on [host.hostname] in [metricbeat-broken]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [host.hostname] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ]
  }, ...
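As a rough sketch of how such a partial failure could be turned into a user-facing warning, the helper below only reads the `_shards` section shown above. `describeShardFailures` and the message wording are hypothetical, not something that exists in Kibana today.

```typescript
// Subset of the search response used here, mirroring the payload above.
interface ShardFailure {
  shard: number;
  index: string;
  node: string;
  reason: { type: string; reason: string };
}

interface SearchResponseLike {
  _shards: {
    total: number;
    successful: number;
    skipped: number;
    failed: number;
    failures?: ShardFailure[];
  };
}

// Hypothetical helper: returns a short warning when some shards failed,
// or null when the response is complete.
function describeShardFailures(response: SearchResponseLike): string | null {
  const { failed, total, failures = [] } = response._shards;
  if (failed === 0) {
    return null;
  }
  // De-duplicate index names, since several shards of one index may fail.
  const indices = failures
    .map((failure) => failure.index)
    .filter((name, position, all) => all.indexOf(name) === position);
  return `${failed} of ${total} shards failed (${indices.join(", ")}); results may be incomplete.`;
}
```

The full `reason` text could be offered behind a "copy error" action rather than shown inline, since it is long and mostly useful to support.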
For troubleshooting purposes it wouldn't be bad to show a toast message with a simple message, but with the option to copy the actual error or at least, return the error off-band from the API (it's in the response payload but it's not shown). If we return the error off-band, if a support ticket is raised, the support can ask for the HAR file and would spot the error.
The self-diagnostic checks the "important fields" necessary for the correct behavior of the app have the correct types (host.hostname, etc... must be aggregatable).
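A minimal sketch of such a self-diagnostic, assuming the input has the shape of the `fields` section of an Elasticsearch field capabilities response (field name, then type, then an `aggregatable` flag). The list of required fields is taken from the example mapping in this comment and is an assumption, as is the function name.

```typescript
// One entry per field type, as reported by the field capabilities API.
interface FieldCapability {
  type: string;
  aggregatable: boolean;
}

// Field name -> type name -> capability.
type FieldCaps = Record<string, Record<string, FieldCapability>>;

// Assumed list of "important fields", based on the example in this thread.
const REQUIRED_FIELDS = ["host.hostname", "host.name", "event.dataset"];

// Returns the required fields that are missing or have at least one
// non-aggregatable mapping across the matched indices.
function findProblemFields(
  fields: FieldCaps,
  required: string[] = REQUIRED_FIELDS
): string[] {
  return required.filter((name) => {
    const byType = fields[name];
    if (!byType) {
      return true; // field is not mapped anywhere
    }
    return Object.keys(byType).some((typeName) => {
      const capability = byType[typeName];
      return capability !== undefined && !capability.aggregatable;
    });
  });
}
```

With the `metricbeat-broken` index above, all three fields would be flagged because `text` fields are not aggregatable by default.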
Example https://github.com/elastic/kibana/issues/144268 (I think we should look into notifying the user the fields must have the correct type, not adapt the application to sub-optimal types and make them aware "easily")
Please keep in mind some users can configure CCS with skip_unavailable: true. It would be nice to behave like in the case (2): warn the user the data might be temporarily incomplete.
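For the CCS case, a rough sketch could rely on the `_clusters` summary that Elasticsearch includes in cross-cluster search responses. The helper name and message are hypothetical, and the sketch assumes the Kibana server still has access to that section of the response.

```typescript
// Summary of clusters involved in a cross-cluster search.
interface ClusterSummary {
  total: number;
  successful: number;
  skipped: number;
}

interface CcsResponseLike {
  _clusters?: ClusterSummary; // absent when the search was local only
}

// Hypothetical helper: warns when remotes were skipped, e.g. because
// they were unavailable and configured with skip_unavailable: true.
function skippedClustersWarning(response: CcsResponseLike): string | null {
  const clusters = response._clusters;
  if (!clusters || clusters.skipped === 0) {
    return null;
  }
  return `${clusters.skipped} of ${clusters.total} clusters were skipped; data may be temporarily incomplete.`;
}
```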
I'll reach out to @roshan-elastic for some additional information.
Thanks for this @lucabelluccini - very insightful!
@neptunian - Based on the above, I don't think we have a great understanding of how often users add invalid/non-matching index patterns, so I think we'll have to assume that it's an edge case.
We should resolve this, but unless it is really quick to resolve, I think we should roll it into the wider work of supporting data views rather than prioritise it above other things.
Any thoughts?
Update: We've agreed with @tonyghiani about the designs and the required features, this is ready to go.
Update: designs reviewed by the UX writing team, copy updated.
Hey @tonyghiani @kkurstak - this looks great!
@tonyghiani Do you have a sense of how much effort this would take?
I'm supportive of this work - it's good UX and makes sense. What I'm not 100% sure on is how highly this can be prioritised given the above comment (in short, not many users complaining and more effort than it is worth to implement telemetry on it).
If this is a relatively small issue, we can prioritise it higher. If it's a big chunk of work, it'll have to go further down unless we know it's got a higher user impact.
Hey Roshan, it should be relatively quick since the API is already prepared to tell us what caused the failure, so we should only need to update the visual representation of the issue.
Cool @tonyghiani - looks like this is one of the few in the ready column and only one thing above it.
Feel free to pick it up after this one:
📓 Summary
While working on a fix for [Metrics UI] Improve error handling for missing remotes #144882, we realized users are able to set any value on the index pattern field for the metrics settings.
As long as they can still set any wildcard they wish, it would be valuable to give them feedback about the current matches for the typed wildcard, similar to how it is done while creating a new Data View in the Stack Management settings. Also, in case they still want to set an index pattern that has no matching Data View, showing a callout message that informs them about this can clarify the current settings status.
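To illustrate the kind of feedback meant here, the sketch below lists which known index names match a typed pattern. It is a simplified, client-side approximation: it handles only comma-separated patterns with `*` wildcards, ignores exclusions and remote cluster prefixes, and both function names are hypothetical. A real implementation would more likely ask Elasticsearch to resolve the pattern.

```typescript
// Turns a single wildcard pattern into an anchored regular expression,
// escaping every regex metacharacter except "*".
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Returns the index names matched by at least one of the
// comma-separated patterns typed by the user.
function matchIndices(indexPattern: string, indexNames: string[]): string[] {
  const matchers = indexPattern
    .split(",")
    .map((pattern) => pattern.trim())
    .filter((pattern) => pattern.length > 0)
    .map((pattern) => wildcardToRegExp(pattern));
  return indexNames.filter((name) => matchers.some((matcher) => matcher.test(name)));
}
```

An empty result would be the trigger for the "no matching sources" callout, while a non-empty one could be shown as a match count or list next to the field.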
✔️ Acceptance criteria
🎨 Figma designs
https://www.figma.com/file/ihPlxAqIhilt65347eUDYm/Infra-Obs---Tasks?node-id=18%3A174332&t=Xk9grR7MphZHC71G-1
Notes
Copy pending approval from UX writing team ✅