getsentry / relay

Sentry event forwarding and ingestion service.
https://docs.sentry.io/product/relay/
Other
321 stars 91 forks source link

Map Valid Span Op names instead of default #3637

Closed smeubank closed 1 month ago

smeubank commented 5 months ago

Description:

Spans have a standardized schem to set specifc key value paris in the JSON. They help process the events and populate the Sentry UI with structure information. SDKs are the first line to collect telemetry and populate the span schema correctly. Then during ingest relay can attempt to make corrections and improvements which scale then to make improvement across all SDKs.

In the case of span.op it the highest order of categorization for the type of span it is, db or http for example. In cases where there is not a valid value default is being set by relay (i believe). This is not ideal, and there are situations where according to the span orgin, we do in face have some form of a valid more precise span.op.

image

image

In this example we see the default span op in the UI, while in the JSON we see the sentry.origin is auto.db.otel.prisma

Suggestion:

Expand on the mapping in relay to set more valuable span.op, as with the example above

Based on the logic for trace origin <type>.<category>.<integration-name>.<integration-part> develop spec, relay can map this to more specific span op values (dev spec), instead of default.

Further info:

While default doesn't provide developers much value in the UI, it also can break features which rely on them. As is the case with many of our new Insights Modules.

iambriccardo commented 5 months ago

Thanks for the issue. I see the problem, are you proposing to build a smarter way of determining a default span op? If so, do you know any reliable ways of doing so based on the payload's contents?

I see that you proposed an idea of using sentry.origin but this would mean that we would need to build a hardcoded mapping somewhere that defines the associations.

jjbayer commented 5 months ago

In cases where there is not a valid value default is being set by relay (i believe). This is not ideal, and there are situations where according to the span orgin, we do in face have some form of a valid more precise span.op.

@smeubank in what cases are SDKs able to construct a valid span.origin, but not a valid span.op?

smeubank commented 5 months ago

@iambriccardo and @jjbayer i don't know this specifics, the issue in JS where we get a ton of TXNs in sentry, we now us OTel for node, and OTel doesn't map any op for spans, so we should, we are mapping origin for now, and i would agree that the SDKs could do more to map this logic, but this would be a fix that requires upgrading

So Ideally we improve this on SDKs but we don't have to rely that each SDK fixes this, and we can centrally through ingestion fix

cc @AbhiPrasad and @mydea to correct me on OTel, and if there are other situations where we cannot reliably map this in the SDK

there are 2 questions i am trying to answer above to be clear

  1. what is a reliable way to do this mapping? my suggestion is span origin, but i am open to other suggestions
  2. in what situations can SDKs not provide a valid span.op? - otel provides no span op
mydea commented 5 months ago

So we try to set an op manually for all things we instrument out of the box, but it is not always easy and we certainly miss things. Also, any OTEL instrumentation a user adds, will never have an op set by the SDK - or an origin, for the matter.

IMHO, if the op === default, we should try to infer the op in relay if we can - this file could be a base for what can be done: https://github.com/getsentry/sentry-javascript/blob/develop/packages/opentelemetry/src/utils/parseSpanDescription.ts

jjbayer commented 5 months ago

OK, I see the purpose of trying to fix the span.op on Relay side. My suggestion is, the ingest team creates a stub mapping function in Relay that SDK teams can later contribute to. We could even define this as a global config configured in sentry, so we do not depend on external Relays being up-to-date.

jjbayer commented 5 months ago

@phacops offered a different perspective on this: If otel spans will never have an op, then it might not make sense to build a product around it. I.e. we could solve this on the product side rather than make SDKs or Relay fill the gaps.