Open SimonSapin opened 1 year ago
FYI with #2908 and #2894, these sampling rates will become independent. The Apollo telemetry plugin will see all traces, then apply the field level instrumentation sampling
@SimonSapin is this still the case? I think we've done enough work here that this issue is likely no longer true.
In the telemetry plugin,
Conf::calculate_field_level_instrumentation_ratio
looks at two configuration keys: the "global rate"tracing.trace_config.sampler
and "field-level rate"apollo.field_level_instrumentation_sampler
. If both are specified as non-zero numbers, they are divided:https://github.com/apollographql/router/blob/1bbfc16327d954485268cb844564a4fcdaa98a0d/apollo-router/src/plugins/telemetry/config.rs#L614-L617
This result is meaningful as the probability to use to randomly select a request for field-level metrics, assuming it had already been randomly selected with the global rate. Combining probabilities like this effectively multiplies them, the division compensates for that.
However that’s not how it’s used. Instead the result
calculate_field_level_instrumentation_ratio
is used as a random probability inpopulate_context()
which is run unconditionally:https://github.com/apollographql/router/blob/1bbfc16327d954485268cb844564a4fcdaa98a0d/apollo-router/src/plugins/telemetry/mod.rs#L851-L853
The default global sampling rate is 100%, so division by 1 doesn’t affect the result. However if it is configured to something less, we might enable field-level metrics more frequently than desired.
Relatedly, field-level metrics contain estimated counts that use a weight to compensate for sampling. This weight is currently based on the result of the division above but probably shouldn’t.
Again this only makes a difference with non-default config. And I think the two bug might cancel each other, so that estimated numbers end up correct anyway.
https://github.com/apollographql/router/blob/1bbfc16327d954485268cb844564a4fcdaa98a0d/apollo-router/src/plugins/telemetry/mod.rs#L1253-L1258