elastic / apm

Elastic Application Performance Monitoring - resources and general issue tracking for Elastic APM.
https://www.elastic.co/apm
Apache License 2.0

Baggage in APM Agents #757

Open AlexanderWert opened 1 year ago

AlexanderWert commented 1 year ago

Description

Problem

The baggage concept allows users to propagate additional (custom) context downstream together with the tracing context. For example, when there are services

A -> B -> C

service A could propagate information to B and C that would otherwise be available only within the scope of A (e.g. a user ID, a device.id in the case of mobile, or a product ID). Such additional context can be extremely valuable when troubleshooting performance and reliability issues.
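To make the propagation concrete, here is a minimal sketch of how such key/value pairs could travel in a single HTTP header. It assumes the W3C Baggage wire format (a `baggage` header holding comma-separated `key=value` members with percent-encoded values); the issue text above does not prescribe a format, and the helper names `encode_baggage` and `parse_baggage` are hypothetical, not part of any Elastic APM agent API.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict[str, str]) -> str:
    """Serialize key/value pairs into a single baggage header value."""
    # Percent-encode values so that commas, spaces, etc. survive transport.
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def parse_baggage(header: str) -> dict[str, str]:
    """Parse a baggage header value into a dict, ignoring member properties."""
    entries: dict[str, str] = {}
    for member in header.split(","):
        # Drop optional ";property" metadata and keep only the key=value part.
        pair = member.split(";", 1)[0].strip()
        if "=" not in pair:
            continue  # skip empty or malformed members
        key, value = pair.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


# Service A attaches its local context to the outgoing request headers.
outgoing_headers = {
    "baggage": encode_baggage({"device.id": "abc 123", "product.id": "42"})
}

# Service B parses the header; it can forward the same header unchanged to C.
received = parse_baggage(outgoing_headers["baggage"])
```

In this sketch B never needs to understand the entries it forwards, which is what lets C see context that originated in A.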

Example: Assume that in the scheme above, A is a mobile app that propagates a device.id (unique, but not attributable to a person, so not PII) through the baggage, and C is a backend service on the downstream invocation path of A. Assume further that C has an increased error rate for the error group XYZ. With the baggage implementation, the device.id could easily be added as a label to all errors. Then, just by counting the number of unique device.id values on XYZ, an SRE could answer the question of how many users (i.e. mobile devices) are affected by that error.
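The counting step in this example could look roughly like the sketch below. The error documents, their field names (`error_group`, `labels`) and the helper `affected_devices` are hypothetical and chosen only for illustration; they are not the actual Elastic APM error schema, and in practice this would more likely be a unique-count aggregation in the backend.

```python
# Hypothetical error documents from service C, each already enriched with
# the device.id taken from the incoming baggage and stored as a label.
errors = [
    {"service": "C", "error_group": "XYZ", "labels": {"device.id": "d-1"}},
    {"service": "C", "error_group": "XYZ", "labels": {"device.id": "d-2"}},
    {"service": "C", "error_group": "XYZ", "labels": {"device.id": "d-1"}},
    {"service": "C", "error_group": "ABC", "labels": {"device.id": "d-3"}},
    {"service": "C", "error_group": "XYZ", "labels": {}},  # no baggage received
]


def affected_devices(error_docs: list[dict], error_group: str) -> set[str]:
    """Return the set of unique device.id labels seen on one error group."""
    return {
        doc["labels"]["device.id"]
        for doc in error_docs
        if doc["error_group"] == error_group and "device.id" in doc["labels"]
    }


devices = affected_devices(errors, "XYZ")
affected_count = len(devices)
```

Errors that arrived without baggage are skipped here, so the count is a lower bound on the number of affected devices.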

Solution

Spec Issue

Agent Issues

GeorgeGkinis commented 5 months ago

I see no plan for support in the RUM agent?

Currently baggage seems to get dropped? https://www.elastic.co/guide/en/apm/agent/rum-js/4.x/opentracing.html#opentracing-baggage

We have a use case where we want to know which user-facing services are impacted by error.culprits in other services further down the line. Being able to propagate data from our frontend (e.g. which button was clicked) downstream could solve our use case: the documents with error.culprit filled in would include the impacted client-facing products/services, allowing us to prioritise which issues to resolve first.