canonical / prometheus-juju-exporter

GNU General Public License v3.0

Labelsets of some removed machines could remain in registry #26

Closed: agileshaw closed this issue 1 year ago

agileshaw commented 1 year ago

Currently, we track whether a machine has been removed by comparing two lists: _previously_cachedlabels, which contains the machines found in the previous collection job, and _currently_cachedlabels, which contains the machines found in the current collection job. Any machine found in the previous job but not in the current job is considered removed.
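For illustration, the comparison described above can be sketched roughly as follows. This is a minimal sketch and not the exporter's actual code: the function name `find_removed` and the `hostname` label are hypothetical, and labelsets are assumed to be plain dicts of string labels.

```python
def find_removed(previous, current):
    """Return labelsets present in the previous job but absent from the current one."""
    # Dicts are not hashable, so compare them through frozensets of their items.
    current_keys = {frozenset(labels.items()) for labels in current}
    return [
        labels for labels in previous
        if frozenset(labels.items()) not in current_keys
    ]


# Hypothetical labelsets from two consecutive collection jobs.
previous = [{"hostname": "juju-1"}, {"hostname": "juju-2"}]
current = [{"hostname": "juju-1"}]

removed = find_removed(previous, current)
print(removed)
```

Everything in `removed` is what the collector would then try to delete from the registry.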

This approach works fine for most scenarios, but we observed a subtle error in an edge case: if the removed machines are not successfully deleted from the registry at the time the collector records the count discrepancy, their labelsets remain in the registry and become unknown to the collector once _previously_cachedlabels is overwritten at the beginning of the next collection job.
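The edge case can be reproduced with a small simulation. This is only a sketch under assumptions: `FlakyRegistry` and `CachingCollector` are hypothetical stand-ins (not the exporter's classes or the Prometheus client's registry), machines are identified by a bare name, and a failed deletion is modelled as a swallowed `RuntimeError`.

```python
class FlakyRegistry:
    """Hypothetical stand-in for a metrics registry whose removals can fail."""

    def __init__(self):
        self.labelsets = set()
        self.fail_next_remove = False

    def add(self, name):
        self.labelsets.add(name)

    def remove(self, name):
        if self.fail_next_remove:
            self.fail_next_remove = False
            raise RuntimeError("removal failed")
        self.labelsets.discard(name)


class CachingCollector:
    """Tracks removals by comparing the previous and current machine lists."""

    def __init__(self, registry):
        self.registry = registry
        self.previous = []
        self.current = []

    def collect(self, machines):
        # The previous cache is overwritten at the start of every job.
        self.previous = self.current
        self.current = list(machines)
        for name in self.current:
            self.registry.add(name)
        for name in set(self.previous) - set(self.current):
            try:
                self.registry.remove(name)
            except RuntimeError:
                pass  # nothing remembers `name` after this job ends


registry = FlakyRegistry()
collector = CachingCollector(registry)

collector.collect(["m0", "m1"])
registry.fail_next_remove = True
collector.collect(["m0"])  # m1 was removed, but its deletion fails
collector.collect(["m0"])  # previous is now ["m0"], so m1 is never retried

stale = registry.labelsets - set(collector.current)
print(stale)
```

After the third job, `m1` is still in the registry even though neither cached list mentions it any more.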

One idea to prevent this error would be to stop caching machines in a _previously_cachedlabels list and instead always retrieve the labelsets from the registry and compare them to the current machine list. The feasibility of this approach depends on whether the Prometheus Python client provides a method to get all labelsets in the registry.
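A rough sketch of that idea, using a hypothetical in-memory `FakeRegistry` in place of the real client: `all_labelsets()` is an assumed helper and not a Prometheus client method. As far as I know, the Python client does expose a metric's samples (with their labels) through `collect()` and lets a labelset be deleted with `remove()`, so a similar lookup might be built on those, but that is not verified here.

```python
class FakeRegistry:
    """Hypothetical registry stand-in that can list its own labelsets."""

    def __init__(self):
        self._labelsets = set()
        self.fail_next_remove = False

    def add(self, name):
        self._labelsets.add(name)

    def remove(self, name):
        if self.fail_next_remove:
            self.fail_next_remove = False
            raise RuntimeError("removal failed")
        self._labelsets.discard(name)

    def all_labelsets(self):
        # Return a copy so callers can iterate while removing.
        return set(self._labelsets)


def reconcile(registry, current_machines):
    """Make the registry match the current machine list, with no cached state."""
    current = set(current_machines)
    for name in current:
        registry.add(name)
    # The registry itself is the source of truth for what was exported before.
    for name in registry.all_labelsets() - current:
        try:
            registry.remove(name)
        except RuntimeError:
            pass  # still in the registry, so the next job will retry it


registry = FakeRegistry()
reconcile(registry, ["m0", "m1"])

registry.fail_next_remove = True
reconcile(registry, ["m0"])  # deleting m1 fails this time
after_failed_job = registry.all_labelsets()

reconcile(registry, ["m0"])  # m1 is found again in the registry and removed
after_retry_job = registry.all_labelsets()

print(after_failed_job, after_retry_job)
```

Because stale entries are discovered from the registry on every job, a failed deletion is retried automatically instead of being forgotten.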

agileshaw commented 1 year ago

Closing this issue because its fix (https://github.com/canonical/prometheus-juju-exporter/pull/33) has been merged.