elastic / apm

Elastic Application Performance Monitoring - resources and general issue tracking for Elastic APM.
https://www.elastic.co/apm
Apache License 2.0

[APM] Runtime metrics for all agents in the APM App #224

Open nehaduggal opened 4 years ago

nehaduggal commented 4 years ago

Summary of the problem

Most of the APM agents collect runtime metrics, which customers can visualize via the apm-contrib dashboards. The Java agent is the only agent whose runtime performance data is surfaced on a JVMs tab, for each instance of the service that is reporting. We should have a similar page for all the other agents to surface the metrics that we collect in the curated UI.

List known (technical) restrictions and requirements

For the JVM page specifically, we chose a tabular approach that shows individual instances, rather than a chart with a separate line for each instance, because the number of reporting instances can be large. This assumption is probably true for all other agents. We should be able to surface the runtime performance data captured by the agents in the APM App in a way that suits each language ecosystem.

References

elasticmachine commented 4 years ago

Pinging @elastic/observability-design (design)

alex-fedotyev commented 4 years ago

Here is a suggestion for how we could design it by leveraging existing observability UI components:

This would let us reuse a familiar design where it is relevant (service instances are similar to infrastructure).

Linking from the service view to the infra metrics would help SREs understand service performance across the farm(s) and how it relates to the performance of the infrastructure that hosts it, especially when issues occur.

Test - Service Infrastructure

alex-fedotyev commented 4 years ago

@sorantis brought up a couple of interesting points about the proposal above:

graphaelli commented 4 years ago

cc @lreuven

alex-fedotyev commented 4 years ago

Added design issue: https://github.com/elastic/apm/issues/301

formgeist commented 3 years ago

I'm bringing this back up as an opportunity to implement an updated metrics experience in the near term, one that adds a service-instance-level breakdown and the additional metrics listed for each agent below. I imagine a few agents are missing from the list, since it dates from when this issue was initially created.

With the switch to Elastic Charts, there should be no blockers on the visualization part. From a design perspective, there might be some guidance on the color palettes and how the visualizations should be put together and laid out. Additionally, I imagine there should be a suggested layout for the overview/list of instances similar to the Java JVM metrics experience.

Overall I think the UI team should be able to pick this up in https://github.com/elastic/kibana/issues/63573 and ask for guidance in implementation from either design or agents.

The long-term service instance metrics experience will be explored and designed in #301 in partnership with @alex-fedotyev.

Thoughts? @nehaduggal @sqren @alex-fedotyev


Node

Ruby

Python

Go

PHP

.NET

nehaduggal commented 3 years ago

I would rather have us reconcile the new workflows that are being designed with the current UI we have for metrics, instead of tackling this multiple times. Once we have the UI, we can work on onboarding metrics from all the other agents.