Open tobz opened 1 month ago
One idea: metric remapping.
Conceptually, specific components in Saluki map to specific components in the core Agent. For example, the DogStatsD source in ADP is the `dogstatsd` component in the Datadog Agent, and the Datadog Metrics destination in ADP is the `defaultforwarder` component in the Datadog Agent. If we included the component type in internal metrics (e.g., metrics from the Datadog Metrics destination have a `component_type` tag with a value of `datadog_metrics`), we could conceivably use that to remap metrics to their Datadog Agent equivalent.
For example, `datadog.agent.transactions.errors` in the Datadog Agent is used to track "transaction errors", which occur when the default forwarder fails to send a request to the Datadog intake. The `error_type` tag indicates the specific type of error. Similarly, on the Saluki side, the Datadog Metrics destination emits a `component_errors_total` metric, with an `error_type` tag that has a value of `http_send`, when we fail to send a request.
Since we should expect to only have one Datadog Metrics destination running in ADP, we could conceivably map all instances of `component_errors_total`, where `component_type` is equal to `datadog_metrics`, to `agent.transactions.errors`, and potentially map the `error_type` tag as well.
We could likely do this pretty simply with a dedicated transform that remaps metric names, perhaps even one designed solely for remapping to Datadog Agent-equivalent metric names. The biggest downside, I think, is that we would have to maintain this mapping ourselves in the first place, rather than the names lining up by default.
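To make the remapping idea concrete, here is a minimal sketch of what the rule matching could look like. It is not Saluki's transform API: `Metric`, `RemapRule`, `agent_remap_rules`, and `remap_metric` are hypothetical names invented for illustration, and the single rule is the `component_errors_total` example from above. Tags are passed through unchanged, since the `error_type` value mapping is still an open question.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for an internal metric: a name plus string tags.
/// Hypothetical type for illustration; not Saluki's actual metric type.
#[derive(Clone, Debug, PartialEq)]
pub struct Metric {
    pub name: String,
    pub tags: BTreeMap<String, String>,
}

impl Metric {
    pub fn new(name: &str, tags: &[(&str, &str)]) -> Self {
        Metric {
            name: name.to_string(),
            tags: tags
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        }
    }
}

/// One remapping rule: a metric named `source_name` that carries a
/// `component_type` tag equal to `component_type` is renamed to `target_name`.
pub struct RemapRule {
    pub source_name: &'static str,
    pub component_type: &'static str,
    pub target_name: &'static str,
}

/// The hand-maintained mapping table. Only the example from this issue is
/// listed; a real table would need an entry per metric we want to line up.
pub fn agent_remap_rules() -> Vec<RemapRule> {
    vec![RemapRule {
        source_name: "component_errors_total",
        component_type: "datadog_metrics",
        target_name: "agent.transactions.errors",
    }]
}

/// Renames the metric if a rule matches its name and `component_type` tag.
/// Metrics without a matching rule are returned untouched.
pub fn remap_metric(rules: &[RemapRule], mut metric: Metric) -> Metric {
    let component_type = metric.tags.get("component_type").map(String::as_str);
    let target = rules
        .iter()
        .find(|r| r.source_name == metric.name && Some(r.component_type) == component_type)
        .map(|r| r.target_name);

    if let Some(name) = target {
        metric.name = name.to_string();
    }
    metric
}
```

Keying the rules on both the metric name and `component_type` is what lets a generic name like `component_errors_total` map to different Agent metrics depending on which component emitted it.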
Another idea: change all points where we register metrics to also register Datadog Agent-specific versions.
Essentially, we would emit duplicate metrics -- a generically-named one for "pure" Saluki usage, and a Datadog Agent-specific one -- and that way anything using Saluki that wasn't ADP could have the more generic/flexible metric names, and ADP could still emit the Datadog Agent-specific metric names to meet our goal of being drop-in compatible.
This, obviously, means emitting more telemetry than absolutely necessary. If we really didn't want to do that, we could also have a transform for filtering out the generically-named metrics, leaving only the Datadog Agent-specific ones. We could also, perhaps, try and do something where we have a toggle for emitting the Saluki or Datadog Agent version... but threading that state all through Saluki would be very ugly.
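As a rough sketch of the duplicate-emission idea, the snippet below registers a counter under both names and then shows the optional filter step that drops the generic one. Everything here is hypothetical: `Counters` is a toy in-memory store standing in for a real metrics recorder, and `DualCounter` and `retain_agent_only` are names made up for illustration, not anything that exists in Saluki today.

```rust
use std::collections::BTreeMap;

/// Toy in-memory counter store standing in for a real metrics recorder.
/// Hypothetical; used only to make the sketch self-contained.
#[derive(Default)]
pub struct Counters {
    values: BTreeMap<String, u64>,
}

impl Counters {
    pub fn increment(&mut self, name: &str, by: u64) {
        *self.values.entry(name.to_string()).or_insert(0) += by;
    }

    pub fn get(&self, name: &str) -> u64 {
        self.values.get(name).copied().unwrap_or(0)
    }

    /// Names of all counters currently stored, in sorted order.
    pub fn names(&self) -> Vec<&str> {
        self.values.keys().map(String::as_str).collect()
    }
}

/// A counter registered under two names: the generic Saluki name and the
/// Datadog Agent-specific name.
pub struct DualCounter {
    generic_name: &'static str,
    agent_name: &'static str,
}

impl DualCounter {
    pub const fn new(generic_name: &'static str, agent_name: &'static str) -> Self {
        DualCounter { generic_name, agent_name }
    }

    /// Every increment is applied to both names, so the two stay in lockstep.
    pub fn increment(&self, counters: &mut Counters, by: u64) {
        counters.increment(self.generic_name, by);
        counters.increment(self.agent_name, by);
    }
}

/// Optional filter step: drop the generically-named counters, leaving only
/// the Datadog Agent-specific ones.
pub fn retain_agent_only(counters: &mut Counters, duals: &[DualCounter]) {
    counters
        .values
        .retain(|name, _| !duals.iter().any(|d| d.generic_name == name.as_str()));
}
```

The call site only ever touches one `DualCounter`, so the duplication stays at the registration point and does not have to be threaded through the rest of the code as a toggle would.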
At a high level, both the Datadog Agent and Agent Data Plane/Saluki emit internal telemetry used for debugging performance issues and understanding their operational state. However, the naming differs substantially between the two, even for metrics that are functionally identical. This makes it challenging to use ADP, as it currently exists, as a drop-in replacement for DSD support in the core Agent.
The metric prefix we use when emitting internal metrics is configurable at the tippity top when initializing the metrics subsystem via `saluki_app::metrics::initialize_metrics`, so that's fine... but how do we line up individual metrics with their spiritual equivalent in the Datadog Agent? This is a problem we need to solve if we hope to have ADP replace DSD in the core Agent.