elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[APM] proposal: fine-grained otel supported metrics detection #180885

Open SylvainJuge opened 3 months ago

SylvainJuge commented 3 months ago

In APM Service metrics in Java, we currently provide two variants:

To decide which variant to display and which metrics to query, the has_otel_process_metrics request is made; it returns a boolean based on whether some known OTel metrics are present.


The problem here is that the OTel metrics are a moving target:

We should be able to provide a dedicated "portable dashboard" for any of the following configurations, and possibly more in the future:

Each variant would only be displayed when there are matching metrics, and the dashboard selection process would be implemented in a single, easy-to-update function that takes the list of available metrics as input. This function would also provide a heuristic to select which variant takes priority when there is a mix (for example, agents with and without stable JVM metrics in Java).
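A minimal sketch of what such a selection function could look like, assuming the list of available metric names has already been fetched. The variant identifiers and the rule table are hypothetical and only illustrate the idea; the metric names are examples of what each variant might key on, not the actual detection logic in Kibana.

```typescript
// Hypothetical variant identifiers; the real dashboard names are not defined in this issue.
type DashboardVariant = 'otel_stable_jvm' | 'otel_legacy_jvm' | 'elastic_agent';

interface VariantRule {
  variant: DashboardVariant;
  // Metric names that must all be present for this variant to match (illustrative values).
  requiredMetrics: string[];
}

// Ordered by priority: when several variants match, the first one in this list wins.
const VARIANT_RULES: VariantRule[] = [
  { variant: 'otel_stable_jvm', requiredMetrics: ['jvm.memory.used'] },
  { variant: 'otel_legacy_jvm', requiredMetrics: ['process.runtime.jvm.memory.usage'] },
  { variant: 'elastic_agent', requiredMetrics: ['jvm.memory.heap.used'] },
];

// Returns every variant whose required metrics are all available, in priority order.
function matchingVariants(availableMetrics: string[]): DashboardVariant[] {
  const available = new Set(availableMetrics);
  return VARIANT_RULES.filter((rule) =>
    rule.requiredMetrics.every((metric) => available.has(metric))
  ).map((rule) => rule.variant);
}

// Returns the highest-priority matching variant, or undefined when nothing matches.
function selectVariant(availableMetrics: string[]): DashboardVariant | undefined {
  return matchingVariants(availableMetrics)[0];
}
```

Keeping the priority heuristic as the order of a single table means that supporting a new metrics flavor only requires adding one entry.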

This approach would be relevant for both the Java agent and all of the other agents, which will also have this "moving target" problem as the metric definitions evolve.


In addition to the names of the metrics, the way the data is structured might also change, so we should also query and include known labels that provide a breakdown for a given metric in order to select the appropriate dashboard variant.
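As a rough illustration, detection could match on a metric name together with the attribute keys observed for it. The types, the function name, and the input shape below are assumptions made for this sketch, and the attribute keys are illustrative rather than taken from an existing Kibana query.

```typescript
// Assumed input shape: metric name -> set of attribute (label) keys seen on that metric.
type ObservedMetrics = Map<string, Set<string>>;

// A metric "signature": a name plus the breakdown attributes a dashboard relies on.
interface MetricSignature {
  name: string;
  attributes: string[];
}

// True when the metric exists and carries every attribute key in the signature.
function hasSignature(observed: ObservedMetrics, signature: MetricSignature): boolean {
  const attributeKeys = observed.get(signature.name);
  if (attributeKeys === undefined) {
    return false;
  }
  return signature.attributes.every((key) => attributeKeys.has(key));
}

// Illustrative signatures for a memory metric broken down by memory type.
const stableHeapSignature: MetricSignature = {
  name: 'jvm.memory.used',
  attributes: ['jvm.memory.type'],
};
const legacyHeapSignature: MetricSignature = {
  name: 'process.runtime.jvm.memory.usage',
  attributes: ['type'],
};
```

A metric that is present without the expected breakdown attribute would not match, so a dashboard that needs the breakdown is not selected for data that cannot feed it.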

For example, when taking the "used heap memory" metric, we have different ways to represent it:

elasticmachine commented 3 months ago

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

SylvainJuge commented 3 months ago

After discussing this with @AlexanderWert today, we think that this proposal might be a bit too complex. There might be a simpler alternative in the (hopefully not too far away) future: querying schema_url to determine which semconv version is in use, and using a dedicated dashboard per version.
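A sketch of the schema_url idea, assuming the value follows the usual OpenTelemetry form ending in `/schemas/<major>.<minor>.<patch>`. The function names, the dashboard ids, and the version thresholds passed by the caller are hypothetical.

```typescript
type SemVer = [number, number, number];

// Extracts the semconv version from a schema URL such as ".../schemas/1.21.0".
function semconvVersion(schemaUrl: string): SemVer | undefined {
  const match = /\/schemas\/(\d+)\.(\d+)\.(\d+)\/?$/.exec(schemaUrl);
  if (match === null) {
    return undefined;
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative when a < b, zero when equal, positive when a > b.
function compareVersions(a: SemVer, b: SemVer): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) {
      return a[i] - b[i];
    }
  }
  return 0;
}

interface VersionedDashboard {
  minVersion: SemVer; // first semconv version this dashboard applies to (hypothetical)
  id: string; // hypothetical dashboard identifier
}

// Picks the dashboard with the highest minVersion that does not exceed the detected version.
function dashboardForSchema(
  schemaUrl: string,
  dashboards: VersionedDashboard[]
): string | undefined {
  const version = semconvVersion(schemaUrl);
  if (version === undefined) {
    return undefined;
  }
  let best: VersionedDashboard | undefined;
  for (const candidate of dashboards) {
    const applies = compareVersions(candidate.minVersion, version) <= 0;
    if (applies && (best === undefined || compareVersions(candidate.minVersion, best.minVersion) > 0)) {
      best = candidate;
    }
  }
  return best === undefined ? undefined : best.id;
}
```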

While the current structure of the metrics could be detected by the metric name and the presence of some known attributes, there would still be cases where breaking changes could happen, for example when the metric name remains the same but the data type changes.

So, the short-term option will likely be to drop support for 1.x agents that do not use stable JVM metrics (which had only ever been partially working), and to focus on #174445 using the stable definitions, relying on https://github.com/elastic/apm-data/issues/264 to ensure that the dashboards stay relevant in the future.