Netflix-Skunkworks / spectatord

A high-performance metrics daemon
Apache License 2.0

Tags may be duplicated when publishing #57

Open brharrington opened 2 years ago

brharrington commented 2 years ago

The publisher always adds the common tags to each measurement, but a measurement may already carry some of those tag keys directly. In that case, the published payload contains duplicate, and possibly conflicting, values for the same key.
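To make the failure mode concrete, here is a minimal sketch, assuming the publisher treats tags as a list of (key, value) pairs and appends the common tags without looking at the measurement's own keys. The function names `naive_publish_tags` and `duplicate_keys` and the tag values are hypothetical; this is not spectatord's actual code.

```python
# Hypothetical illustration of the reported bug: common tags are appended
# to every measurement without checking for keys that are already present.
from typing import Dict, List, Tuple

TagList = List[Tuple[str, str]]


def naive_publish_tags(common: Dict[str, str], measurement: Dict[str, str]) -> TagList:
    """Concatenate measurement tags and common tags as (key, value) pairs."""
    tags: TagList = list(measurement.items())
    # Bug being illustrated: no check for keys the measurement already has.
    tags.extend(common.items())
    return tags


def duplicate_keys(tags: TagList) -> List[str]:
    """Return keys that appear more than once, in first-seen order."""
    seen, dups = set(), []
    for key, _ in tags:
        if key in seen and key not in dups:
            dups.append(key)
        seen.add(key)
    return dups


if __name__ == "__main__":
    common = {"nf.app": "www", "nf.container": "main"}
    measurement = {"name": "server.numRequests", "nf.container": "sidecar"}
    tags = naive_publish_tags(common, measurement)
    print(tags)
    print(duplicate_keys(tags))  # ['nf.container']
```

The payload ends up with two `nf.container` entries carrying different values, which is exactly the conflicting case described above.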

copperlight commented 2 years ago

The aggregator expects metrics received from spectatord to be free of duplicate tags.

Historically, we have advised that if you want to customize the common tags for spectatord, you should unset the Netflix environment variables that supply those tags. We use this approach for the nf.container and nf.process tags when a single spectatord instance is shared in a multi-process and/or multi-container environment. Once these variables are unset, you can supply the tags yourself in the spectatord line protocol.
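A minimal sketch of supplying those tags per message, assuming the line protocol layout `<type>:<name>,<key>=<value>,...:<value>` (for example `c:server.numRequests,id=failed:1`). The helper `format_line` and the tag values are hypothetical, and the sketch does no validation or escaping of tag characters. Because the tags are held in a dict, each key can appear at most once in the message.

```python
# Hypothetical sketch: build a spectatord line protocol message from a name
# and a dict of user tags. A dict cannot hold the same key twice, so the
# message itself never carries duplicate tag keys.
from typing import Dict, Union


def format_line(metric_type: str, name: str, tags: Dict[str, str],
                value: Union[int, float]) -> str:
    """Return '<type>:<name>,<k>=<v>,...:<value>' (assumed message layout)."""
    parts = [name]
    # Sort keys so the output is deterministic.
    parts.extend(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric_type}:{','.join(parts)}:{value}"


if __name__ == "__main__":
    # With the environment variables for these tags unset, the process
    # provides its own nf.container and nf.process values on each message.
    tags = {"nf.container": "sidecar", "nf.process": "worker-1", "id": "failed"}
    print(format_line("c", "server.numRequests", tags, 1))
    # c:server.numRequests,id=failed,nf.container=sidecar,nf.process=worker-1:1
```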

The clients commonly used to send line protocol messages to spectatord deduplicate tags, because they track user tags in maps. However, spectatord has a bug: it does not check whether the received line protocol already contains any of the common tag keys, so it may send duplicate common tags in the payloads it publishes to the Atlas aggregator service.
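A minimal sketch of the missing check, assuming the received message tags should take precedence over the common tags when a key appears in both. That precedence rule is an assumption for illustration, not a decision recorded in this thread, and `merge_tags` is a hypothetical name rather than spectatord's code.

```python
# Hypothetical sketch of the missing check: before adding a common tag,
# skip it if the received line protocol message already carries that key.
# Letting the message tags win is an assumption made for illustration.
from typing import Dict, List, Tuple


def merge_tags(common: Dict[str, str], received: Dict[str, str]) -> List[Tuple[str, str]]:
    """Combine tags so that every key appears exactly once."""
    merged = dict(received)
    for key, value in common.items():
        # Only fill in common tags that the message did not already provide.
        merged.setdefault(key, value)
    return sorted(merged.items())


if __name__ == "__main__":
    common = {"nf.app": "www", "nf.container": "main"}
    received = {"name": "server.numRequests", "nf.container": "sidecar"}
    print(merge_tags(common, received))
    # [('name', 'server.numRequests'), ('nf.app', 'www'), ('nf.container', 'sidecar')]
```

Using `dict.setdefault` keeps the check to a single lookup per common tag; flipping the precedence would only require merging in the opposite order.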

After thinking this through, we believe the following strategy should be viable:

copperlight commented 2 years ago

Separately, the following pull request was merged into Spectator to handle the case of duplicated tags on the backend:

https://github.com/Netflix/spectator/pull/975

copperlight commented 2 years ago

The duplicate tags were caused by a custom Registry configuration passed to spectator-py that set common_tags to an empty map, with the common tags then set explicitly on the metrics themselves. When spectator-py was upgraded to the latest version, which uses spectatord, the tags were duplicated, because spectatord still added its own common tags to measurements that already carried them. The code base where this happened operated two separate Registries: one for standard metrics and one for metrics whose common tags were overridden.