Open nehaduggal opened 4 years ago
Pinging @elastic/observability-design (design)
Here is a suggestion for how we could design it by leveraging existing observability UI components:
This design would let us reuse a familiar design where it is relevant (service instances are similar to infrastructure).
Linking from the service view to the infra metrics would help SREs understand service performance across the farm(s) and how it relates to the performance of the infrastructure that hosts it, especially during incidents.
@sorantis brought up a couple of interesting points about the proposal above:
cc @lreuven
Added design issue: https://github.com/elastic/apm/issues/301
I'm bringing this back up as an opportunity to implement an updated metrics experience in the near-term which adds service instance level breakdown ability and adds the additional metrics that are listed for each agent below. I imagine there are a few agents missing on the list since this issue was initially created.
With the switch to Elastic Charts, there should be no blockers on the visualization part. From a design perspective, there might be some guidance on the color palettes and how the visualizations should be put together and laid out. Additionally, I imagine there should be a suggested layout for the overview/list of instances similar to the Java JVM metrics experience.
Overall I think the UI team should be able to pick this up in https://github.com/elastic/kibana/issues/63573 and ask for guidance in implementation from either design or agents.
Long-term, the service instance metrics experience will be explored and designed in #301 in partnership with @alex-fedotyev.
Thoughts? @nehaduggal @sqren @alex-fedotyev
Node
Ruby
Python
Go
PHP
.NET
I would rather have us reconcile the new workflows that are being designed with the current UI we have for metrics instead of tackling this multiple times. Once we have the UI, we can work on onboarding metrics from all other agents.
Summary of the problem
Most of the APM agents collect runtime metrics data which is available for customers to visualize via the apm-contrib dashboards. Java agent is the only agent that surfaces the runtime performance data on a JVMs tab for each instance of the service that is reporting. We should have a similar page for all the other agents to surface the metrics that we collect in the curated UI.
List known (technical) restrictions and requirements
For the JVM page specifically, we chose a tabular approach that shows individual instances, rather than a chart with one line per instance, because the number of reporting instances can be large. This assumption probably holds for all other agents. We should surface the runtime performance data captured by the agents in the APM app in a way that suits each language ecosystem.
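To make the tabular approach concrete, here is a rough sketch of how per-instance rows could be derived from raw metric samples. It is only an illustration: the `MetricSample` and `InstanceRow` shapes, their field names, and the `summarizeByInstance` helper are all hypothetical and do not reflect the actual APM UI code or the agents' metric schema.

```typescript
// Hypothetical shape of one runtime metric sample reported by an agent.
// Field names are assumptions for illustration, not the real agent schema.
interface MetricSample {
  instanceId: string; // assumed identifier, e.g. a host name or container ID
  timestamp: number; // epoch milliseconds
  cpuPercent: number; // 0..100
  memoryBytes: number;
}

// One row of the per-instance table (hypothetical).
interface InstanceRow {
  instanceId: string;
  avgCpuPercent: number;
  maxMemoryBytes: number;
  sampleCount: number;
}

// Collapse many samples into one summary row per instance, so the UI can
// render a sortable table instead of one chart line per instance.
function summarizeByInstance(samples: MetricSample[]): InstanceRow[] {
  const acc = new Map<string, { cpuSum: number; maxMem: number; count: number }>();

  for (const s of samples) {
    const entry = acc.get(s.instanceId) ?? { cpuSum: 0, maxMem: 0, count: 0 };
    entry.cpuSum += s.cpuPercent;
    entry.maxMem = Math.max(entry.maxMem, s.memoryBytes);
    entry.count += 1;
    acc.set(s.instanceId, entry);
  }

  const rows: InstanceRow[] = [];
  acc.forEach((e, instanceId) => {
    rows.push({
      instanceId,
      avgCpuPercent: e.cpuSum / e.count,
      maxMemoryBytes: e.maxMem,
      sampleCount: e.count,
    });
  });

  // Busiest instances first, so outliers surface at the top of the table.
  return rows.sort((a, b) => b.avgCpuPercent - a.avgCpuPercent);
}
```

The output size grows with the number of instances rather than the number of samples, which is why a table stays readable when a line chart with one series per instance would not. Which columns to show would likely differ per language ecosystem.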
References