Closed jasonrhodes closed 2 years ago
Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)
Do you imagine that this component will self manage the data fetching? (which should be fine, given this is not to be shared cross domain and is likely simpler/less configurable than t-grid)
It might still pay off to write the table itself as a component that only takes data, loading states and loading callbacks and keep the data fetching separate in one or more hooks. This doesn't mean we can't also provide an all-in-one component for simple use, but makes the parts more extensible.
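A rough sketch of that split, to make the idea concrete. All names here (`NodeRow`, `NodeTableProps`, `renderNodeTable`, `createNodeLoader`) are hypothetical and not taken from the Kibana codebase, and plain functions stand in for the React component and hook so the example runs on its own:

```typescript
// Hypothetical names throughout; none of these come from the Kibana codebase.
type NodeType = 'host' | 'container' | 'pod';

interface NodeRow {
  name: string;
  cpuUsage: number | null; // null when the metric is unavailable
}

// Props for a presentational table: data, a loading flag and a callback only.
interface NodeTableProps {
  rows: NodeRow[];
  isLoading: boolean;
  onReload: () => void;
}

// Stand-in for the presentational component: a pure function of its props.
function renderNodeTable(props: NodeTableProps): string {
  if (props.isLoading) return 'Loading...';
  if (props.rows.length === 0) return 'No nodes found';
  return props.rows
    .map((row) => `${row.name}: ${row.cpuUsage === null ? 'N/A' : row.cpuUsage.toFixed(2)}`)
    .join('\n');
}

// Stand-in for the data-fetching hook: it owns the loading state and the fetch
// call, and knows nothing about how the rows are rendered.
type FetchNodes = (nodeType: NodeType) => Promise<NodeRow[]>;

function createNodeLoader(nodeType: NodeType, fetchNodes: FetchNodes) {
  let rows: NodeRow[] = [];
  let isLoading = false;

  const load = async (): Promise<void> => {
    isLoading = true;
    try {
      rows = await fetchNodes(nodeType);
    } finally {
      isLoading = false;
    }
  };

  // Snapshot of the current state in the shape the table expects.
  const getProps = (): NodeTableProps => ({
    rows,
    isLoading,
    onReload: () => void load(),
  });

  return { load, getProps };
}
```

An all-in-one component for simple use would then just wire `createNodeLoader(...).getProps()` into `renderNodeTable`, while other callers could swap in their own loader.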
+1 on what @weltenwort describes -- if that warrants two tickets, please split them!
I see three parts to this:
One way of breaking it down and parallelizing it could be to start solving the dependency problem and create mockups for the UI components in parallel. And once that is done go on to the implementation steps of both the components and the APIs.
I've created #117344 to track the circular dependency avoidance investigation. It's mainly a placeholder for now until I manage to write down some specifics.
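Do we want to make the remainder of this smaller by narrowing it to the "metrics-in-apm" case for now? And can we clarify some details like
* Where do the data come from? Is the user expected to ingest them in parallel into `metrics-*` indices or will the APM server put them somewhere into special APM indices?
* Which node types do we want to support initially and eventually?
* What metrics exactly should these columns display?
* How to handle the case when only some metrics are available?
* Are the node ids clickable? What happens if the user clicks one of them?
@formgeist @katefarrar @alex-fedotyev @danielkhan Very thoughtful questions about the scope of what we need to accomplish in this.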
I will take a stab at responding to these questions here and y'all can review and provide input on the suggestions.
* Where do the data come from? Is the user expected to ingest them in parallel into `metrics-*` indices or will the APM server put them somewhere into special APM indices?
They are ingested into `metrics-*` data streams via the agent, not by the APM server.
* Which node types do we want to support initially and eventually?
Pardon my ignorance, but what is a node type in this context? If by node_type you meant VM host, K8s pod or docker/CRI container, then it is all of them, right from the beginning. If node_type is something else, then please let me know what that means.
* What metrics exactly should these columns display?
My first instinct is that we can begin with just embedding the tabular view of hosts, pods and containers as is. That is, APM will pass the list of hosts, containers and pods to the inventory view embeddable, and the view will be filtered for those values in the respective APM->infrastructure tabs. Once we get past that, we can add additional metrics.
* How to handle the case when only some metrics are available?
* Are the node ids clickable? What happens if the user clicks one of them?
The embeddable should bring the entire experience of the inventory view to the APM>infrastructure tab. That is, we show the list of hosts, and users can click on one to see the enhanced host details.
Additional questions that are worth looking into are:
Thanks for clarifying some of the points.
If by node_type you meant, VM host, K8s pod or docker/CRI container then it is all of them, right from the beginning.
yes, it's `host`, `container` or `pod`
My first instinct is that we can begin with just embedding the tabular view of hosts, pods, containers as is.. that is, APM will pass the list of hosts, containers and pods to the inventory view embeddable and the view will be filtered for those values in the APM->infrastructure tabs respectively.
so it would only contain the name column and no metrics initially?
The embeddable should bring the entire experience of inventory view to APM>infrastructure tab.. that is, we show the list of hosts and then users can click on one and it will show the enhanced host details..
That's good to keep in mind as a goal, but can you imagine an acceptable smaller step that we can take first? It would help if we could come up with a sequence of additions that can be tackled incrementally.
so it would only contain the name column and no metrics initially?
That is my take, but I'd like to align with Alex, Casper, Daniel and Kate before we make a final call on whether the MVP should contain it or not. Alex and Casper have done a lot of prior thinking on it, so I'd like to make sure we make the right UX call here in alignment with APM.
That's good to keep in mind as a goal, but can you imagine an acceptable smaller step that we can take first? It would help if we could come up with a sequence of additions that can be tackled incrementally.
Definitely worth exploring the incremental and yet acceptable steps we could take to ship something sooner. I am operating under the assumption that the enhanced host details flyout would just work independent of which Kibana UI it is used in. If that isn't the case, then we'd need a smaller incremental step, such as showing the tabular inventory view and linking to the inventory page for the enhanced host details. Not a great UX, but a step in the right direction.
Happy to hear if you have thoughts on additional incremental steps we could take.
I am operating under assumption that the enhanced host details fly-out would just work independent of which kibana UI it is used in.
It could certainly be made to work, but not with zero effort. Here are some variations I could think of that have different complexities:
I'm not too familiar with the metrics UI code, so there might be additional options.
`default` source configuration (which means the user is responsible for setting up the metrics ui to show the ingested metrics).
@formgeist do you have the recording from our meeting end of last week?
@jasonrhodes I've sent it to you in a DM. It's also in the meeting invite description 👍
I just talked to @katefarrar and here is what we think the MVP requirements are:
Note: It's okay to split these two bullets into 2 separate tasks, (1) create the component with a storybook UI (like we have for the log stream), and (2) embed in APM.
Stretch goal:
[1] Metrics UI node detail page
[2] Metrics explorer using information we have in the planned shared component
View all in Metrics Explorer [2]. We already have all of the following information for a given instance of this shared component
@jasonrhodes Do you mean to add a general link from each table list to view all the instances in one single Metrics Explorer view?
@formgeist yeah, something like that. I'm not sure how we'd position or design that link exactly, but the Metrics Explorer would be able to show exactly what the table shows, but in graph format (basically "sparklines" but full graphs and not sparklines), so it feels like a wasted opportunity to not link to it. But I don't want that to block the rest. Thoughts?
View all in Metrics Explorer [2]. We already have all of the following information for a given instance of this shared component
@jasonrhodes @katefarrar and I discussed this in a sync yesterday, and agreed that we could pursue an option like this for the MVP. @katefarrar will create a mock that shows where and how this works.
We had a few ideas on whether it should just simply display all the nodes, or we should limit the selection to the top 10 - but that means we'd need to differentiate between top 10 CPU or memory metrics. The metrics explorer is built to display lots of metrics charts for each node and paginate if the count explodes. I reckon for the first iteration we can show an option to display all the nodes and we can iterate on whether we want to supply more specific options for top 10 or individual nodes.
Here is an idea for how we could link to the Metrics Explorer:
@alex-fedotyev @formgeist @jasonrhodes curious to hear any feedback you have. thanks!
@katefarrar I think the design makes sense in this way, but we also have to be mindful of not giving the user too many navigation options around the same area without having a clear direction of what the user should be doing in these views. I know that we've also discussed offering the option to filter by the node individually, so that's a 3rd option. @alex-fedotyev thoughts on adding this option to visualize all the nodes in the Metrics Explorer UI?
@katefarrar - I missed replying on this.
@alex-fedotyev that sounds good for the MVP. Thanks!
Added elastic/kibana#131308 as a follow-up item.
There are a lot of follow-up issues attached to this epic. Should we categorize them so that we have some sense of "done" for the current round of development and move the rest into the backlog for future work? Or maybe leave this as an ongoing Epic but create new ones to represent the subsets of work we want to focus on?
Here is my vote for what things we should do now and in which order:
Issue | URL | Effort |
---|---|---|
Calculate uptime correctly | https://github.com/elastic/kibana/issues/133119 | Small |
Use correct field for Pod/Container CPU usage | https://github.com/elastic/kibana/issues/133122 | Small |
Scale percentage values | https://github.com/elastic/kibana/issues/133124 | Small |
Show percentage memory usage | https://github.com/elastic/kibana/issues/133123 | Small |
Truncate name column | https://github.com/elastic/kibana/issues/130642 | Small |
Add module filters | https://github.com/elastic/kibana/issues/131308 | Small/Medium |
Support Docker only environments | https://github.com/elastic/kibana/issues/133125 | Small/Medium |
Add empty states | https://github.com/elastic/kibana/issues/127742 | Medium |
Verify that linked node details pages work | https://github.com/elastic/kibana/issues/128639 | Medium |
I think these things can be left for later, or need product input:
Issue | URL | Effort |
---|---|---|
Change filter interface | https://github.com/elastic/kibana/issues/132128 | Small |
Data accuracy communication | https://github.com/elastic/kibana/issues/128643 | Medium |
Use terms agg | https://github.com/elastic/kibana/issues/128645 | Large |
Add aggregation charts | https://github.com/elastic/observability-design/issues/141 | Large |
Adding "telemetry" | https://github.com/elastic/kibana/issues/128642 | Unknown |
Migrate to new docs platform | https://github.com/elastic/kibana/issues/127862 | Unknown |
@miltonhultgren this looks good -- let's get those top ones pulled directly into Ready on the current cycle board if they aren't already.
I think we can leave "data accuracy communication" up to APM — they can mention it on their tab rather than it being a decision we make at the shared component level, what do you think?
Same goes for telemetry, I think. We may want to have our own top-level telemetry but for now, we can leave that up to APM if they want specific telemetry added to their own tab.
Lastly, I'd like to prioritize changing the interface but I agree it's not important enough to pull it in quite yet.
Thanks, @miltonhultgren !
@jasonrhodes About data accuracy: It's mainly because we're using the composite agg, so we don't get any full set sorting on Elasticsearch.
Even with a terms agg it would still possibly be inaccurate, but within the usual norms for Kibana (which we can easily explain as "this is how ES works").
So perhaps more important than that is swapping to a terms agg which we can do behind the scenes.
I updated the Epic description and our board.
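For readers less familiar with the composite versus terms trade-off mentioned in this comment, here is a minimal sketch of the two aggregation request bodies. The field names (`host.name`, `system.cpu.total.norm.pct`) and the helper names are assumptions for illustration only and are not necessarily what the component queries:

```typescript
// Field names are assumptions for illustration only.
const NODE_FIELD = 'host.name';
const CPU_FIELD = 'system.cpu.total.norm.pct';

// Composite aggregation: pages through every bucket via `after`, but buckets
// come back in key order, so they cannot be sorted by the metric value.
function compositeNodesAgg(size: number, afterKey?: Record<string, string>) {
  return {
    nodes: {
      composite: {
        size,
        sources: [{ node: { terms: { field: NODE_FIELD } } }],
        ...(afterKey ? { after: afterKey } : {}),
      },
      aggs: { cpu: { avg: { field: CPU_FIELD } } },
    },
  };
}

// Terms aggregation: Elasticsearch orders buckets by the `cpu` sub-aggregation
// and returns the top `size`; results can still be approximate across shards.
function termsNodesAgg(size: number) {
  return {
    nodes: {
      terms: { field: NODE_FIELD, size, order: { cpu: 'desc' } },
      aggs: { cpu: { avg: { field: CPU_FIELD } } },
    },
  };
}
```

With the composite variant, any sorting by CPU has to happen on the page of buckets already fetched, which is where the accuracy concern comes from.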
@miltonhultgren oh sorry, yes I know exactly what it refers to. I just don't think the component should make the decision for the user of the component about whether to add some disclaimer to the UI about this possible discrepancy. I think it's likely a better idea to mention the situation in the component docs and then let the users of the component make their own decision about how to message about this if they want to, using their own messaging next to the shared component.
Does that make sense?
This is shipping as beta in 8.5. Closing this issue. We can prioritize any follow-up issues as needed.
Summary
A React component exists that allows a caller to request a given "node type" along with a time range and a KQL filter and receive a table containing a list of matching entities with associated (pre-defined per node type) metrics.
Example: something similar to this table (only what's below the graphs shown here, for now)
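To illustrate the contract described in this summary, here is a hedged sketch of what a request to the component might carry. The type and function names, the shape of `timerange`, and the metric lists per node type are all placeholders chosen for illustration; the actual pre-defined metrics are decided in the tickets below:

```typescript
type NodeType = 'host' | 'container' | 'pod';

// Hypothetical request shape: node type, time range and KQL filter.
interface NodeMetricsTableRequest {
  nodeType: NodeType;
  timerange: { from: number; to: number }; // epoch milliseconds (assumed unit)
  kuery: string; // KQL filter, e.g. 'host.name: "host-a"'
}

// Placeholder metric lists; the real per-node-type metrics are not defined here.
const METRIC_COLUMNS: Record<NodeType, readonly string[]> = {
  host: ['cpu', 'memory', 'uptime'],
  container: ['cpu', 'memory'],
  pod: ['cpu', 'memory', 'uptime'],
};

// The caller picks the node type; the columns follow from it.
function columnsFor(request: NodeMetricsTableRequest): string[] {
  if (request.timerange.from >= request.timerange.to) {
    throw new Error('timerange.from must be before timerange.to');
  }
  return ['name', ...METRIC_COLUMNS[request.nodeType]];
}
```

The point of the sketch is that the caller never chooses columns directly: it only supplies the node type, time range and filter.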
Tickets
MVP component for use in APM
Follow-up issues for Infra Monitoring UI
Possible later stage considerations: