getsentry / relay

Sentry event forwarding and ingestion service.
https://docs.sentry.io/product/relay/

[Epic] Extrapolation for extracted metrics #3724

Open jjbayer opened 2 weeks ago

jjbayer commented 2 weeks ago

SDKs apply a client-side sampling rate to transactions / spans, so any metrics we extract from them are inherently sampled, even with server-side sampling disabled. To represent the sampling rate, take the client-side sampling rate from the envelope header and attach its inverse as a weight to any metric bucket extracted from the transaction / span payload (weight := 1.0 / dsc.sample_rate). When merging two buckets, add their weights together.
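A minimal sketch of the weighting and merge rule described above, for illustration only. `CounterBucket`, `bucket_from_payload` and `merge` are hypothetical names, not Relay's actual types, and the sketch assumes a counter where each sampled payload contributes a value of 1.

```python
from dataclasses import dataclass


@dataclass
class CounterBucket:
    """Hypothetical counter bucket; not Relay's actual bucket type."""
    value: float   # sampled count, as seen by Relay
    weight: float  # sum of 1.0 / sample_rate over contributing payloads


def bucket_from_payload(sample_rate: float) -> CounterBucket:
    """Build a bucket for one payload, weighted by the inverse sample rate."""
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    return CounterBucket(value=1.0, weight=1.0 / sample_rate)


def merge(a: CounterBucket, b: CounterBucket) -> CounterBucket:
    """Merge two buckets by adding both their values and their weights."""
    return CounterBucket(value=a.value + b.value, weight=a.weight + b.weight)


# One payload sampled at 50%, one sampled at 25%.
merged = merge(bucket_from_payload(0.5), bucket_from_payload(0.25))
print(merged)
```

Under these assumptions the merged bucket holds a sampled count of 2 and a weight of 6, which can be read as an estimate of the unsampled count (2 + 4).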

Tech Spec

https://www.notion.so/sentry/Metrics-Extrapolation-520ab136cb7d48ebb77be9a538bdef91

Open question: data duplication

Ideally, storage supports these weights as well. If / as long as storage does not support weights, we could convert weights into repeated data in Relay.
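One way the "repeated data" fallback could look, as a rough sketch. `expand_weighted` is a hypothetical helper, and rounding the weight to a whole number of repeats is an assumption made here, not something the spec states.

```python
from typing import List


def expand_weighted(values: List[float], weight: float) -> List[float]:
    """Repeat each value round(weight) times to emulate a weight in storage.

    Assumption: non-integer weights are rounded, and every value is kept
    at least once. Both choices lose precision and are illustrative only.
    """
    repeats = max(1, round(weight))
    return [v for v in values for _ in range(repeats)]


# A distribution bucket sampled at 50% (weight 2.0).
expanded = expand_weighted([10.0, 20.0], 2.0)
print(expanded)
```

The cost of this approach grows linearly with the weight, which is the data duplication concern raised above.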

### Preparation
- [ ] https://github.com/getsentry/relay/issues/3738
### Custom Span Attributes
- [ ] https://github.com/getsentry/relay/pull/3753
- [ ] https://github.com/getsentry/sentry/pull/73289
- [ ] https://github.com/getsentry/sentry/issues/73185
- [ ] https://github.com/getsentry/sentry/issues/73623
- [ ] https://github.com/getsentry/team-ingest/issues/366
### Internal Test: Spans and Transactions
- [ ] Add a temporary setting for transaction / spans metrics
- [ ] https://github.com/getsentry/relay/issues/3778
- [ ] https://github.com/getsentry/sentry/issues/73186
- [ ] https://github.com/getsentry/team-ingest/issues/365
### Public: Spans and Transactions
- [ ] Rollout to SaaS + DE
- [ ] Update defaults for new projects (epoch)

iambriccardo commented 1 week ago

I am not an expert, but given two buckets b1 and b2 with weights w1 and w2, if you sum w1 + w2 during merge, you will not get the same extrapolation as if you were to compute it correctly as 1.0 / ((b1 * w1) + (b2 * w2) / (b1 + b2)).

I may be missing something though.