Tags may be duplicated when publishing

brharrington commented 2 years ago

The publisher always adds the common tags to each measurement, but the measurement may already carry those same tag keys directly. In that case the published payload contains duplicate, and possibly conflicting, values for a given key.
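
To make the failure mode concrete, here is a minimal Python sketch. It is not the spectatord publisher code; the function names and sample tag values are hypothetical, and letting the measurement's own value win on conflict is an assumption made for illustration only.

```python
def naive_publish_tags(common_tags, measurement_tags):
    """Append common tags unconditionally, as described in this issue.

    Returns a list of (key, value) pairs, so a key may appear twice.
    """
    return list(measurement_tags.items()) + list(common_tags.items())


def merged_publish_tags(common_tags, measurement_tags):
    """Merge through a dict so that each key appears exactly once.

    Assumption for this sketch: the measurement's value takes precedence
    over the common tag value when both define the same key.
    """
    merged = dict(common_tags)
    merged.update(measurement_tags)
    return sorted(merged.items())


# Hypothetical sample data
common = {"nf.app": "myapp", "nf.node": "i-123"}
measurement = {"id": "failed", "nf.app": "override"}

duplicated = naive_publish_tags(common, measurement)
deduplicated = merged_publish_tags(common, measurement)
```

With the naive version, `nf.app` shows up twice with different values; the merged version emits each key once.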

copperlight commented 2 years ago

The aggregator expects metrics received from spectatord to have no duplicate tags.

Historically, we have advised that if you want to customize common tags for spectatord, you should unset the Netflix environment variables for those tags. We use this approach for the nf.container and nf.process tags when a spectatord instance is shared in a multi-process and/or multi-container environment. With those variables unset, you can then provide the tags yourself through the spectatord line protocol.

The clients which are commonly used to send line protocol messages to spectatord will deduplicate tags, as a result of using maps to track user tags. However, spectatord has a bug where it does not check the received line protocol for the presence of common tags and thus may duplicate common tags in the payloads that it sends to the Atlas aggregator service.
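
The missing check described above could look roughly like the following Python sketch. It assumes the line protocol shape `metric-type:name,key=value,...:value` (for example `c:server.numRequests,id=failed:1`) and that tag values contain no `:`; the helper names are hypothetical and this is not how spectatord itself is implemented.

```python
def parse_line(line):
    """Split a line protocol message into (type, name, tags, value).

    Tags are returned as a list of (key, value) pairs so that repeated
    keys in the input stay visible instead of being silently collapsed.
    """
    metric_type, rest = line.split(":", 1)
    meter_id, value = rest.rsplit(":", 1)
    name, *tag_parts = meter_id.split(",")
    tags = [tuple(part.split("=", 1)) for part in tag_parts]
    return metric_type, name, tags, value


def conflicting_keys(line, common_tags):
    """Return the tag keys in the message that are also common tags."""
    _, _, tags, _ = parse_line(line)
    return sorted({key for key, _ in tags if key in common_tags})


# Hypothetical common tags held by the daemon
COMMON_TAGS = {"nf.app": "myapp", "nf.cluster": "myapp-main"}
```

A message such as `c:server.numRequests,id=failed,nf.app=custom:1` would be flagged for `nf.app`, at which point the receiver could drop one of the two values rather than forwarding both.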

Thinking through this a bit, we think the following strategy should be viable:

The default spectatord that ships with the BaseOS should be as stable, predictable, and locked down as possible. All of the common tags (aside from nf.container and nf.process) should be set for this process and should not be modified, because many things on the BaseOS, such as atlas-system-agent and various BaseOS monitoring scripts, rely on common tags being provided by spectatord.
If Atlas metrics sent through spectatord require a custom set of common tags supplied through the line protocol, a second instance of spectatord should be established for that use case. This second instance should listen on a different UDP port and should not listen on the domain socket. We will provide documentation on how to supply this custom configuration.
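
From the client side, talking to such a second instance might look like the sketch below. It only uses the Python standard library; the port number 1235 is a hypothetical choice for the second instance, the helper names are made up for this example, and the line format is assumed to be `metric-type:name,key=value,...:value`.

```python
import socket

# Hypothetical port for the second spectatord instance; the default
# instance keeps its own port and its domain socket.
CUSTOM_SPECTATORD_PORT = 1235


def format_line(metric_type, name, tags, value):
    """Build a line protocol message from a dict of tags.

    Using a dict means each tag key can appear only once, which is the
    client-side deduplication mentioned earlier in this thread.
    """
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric_type}:{name}{tag_str}:{value}"


def send_metric(line, host="127.0.0.1", port=CUSTOM_SPECTATORD_PORT):
    """Send one line protocol message as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Example usage (requires something listening on the chosen port):
# tags = {"id": "failed", "nf.app": "custom-app"}
# send_metric(format_line("c", "server.numRequests", tags, 1))
```

Because the second instance would have its common tag environment variables unset, tags such as `nf.app` supplied here would be the only values for those keys.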

copperlight commented 2 years ago

Separately, the following pull request was merged into Spectator to handle the case of duplicated tags on the backend:

https://github.com/Netflix/spectator/pull/975

copperlight commented 2 years ago

The duplicate tags were caused by a custom Registry configuration passed to spectator-py that set common_tags to an empty map, with the common tags then set explicitly on the metrics. After spectator-py was upgraded to the latest version, which uses spectatord, the tags were duplicated. The code base where this happened operated two separate Registries: one for standard metrics and one for metrics whose common tags were overridden.

Netflix-Skunkworks / spectatord

Tags may be duplicated when publishing #57