caas-team / sparrow

A monitoring tool to gather infrastructure network information
Apache License 2.0

A sparrow cannot register itself if the DNS name is the same but the scheme changed. #147

Open puffitos opened 3 weeks ago

puffitos commented 3 weeks ago

Issue

If a sparrow named test.sparrow.net changes its scheme from https to http or vice versa, it won't be able to register itself if a registration file for that DNS name already exists.

This also affects all other sparrows: they keep trying to reach a target that no longer exists under the registered scheme and will probably generate false alerts, which hurts most when the restarted instance was more important than others.

Details

A sparrow tries to register itself (using the GitLab target manager) by creating a file, named after its DNS name, via a POST request to the GitLab API.

If a sparrow doesn't unregister itself when shutting down (before changing its scheme), each subsequent registration attempt will fail, as the file already exists (this problem is specific to the GitLab API).
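
To make the failure mode concrete, here is a minimal sketch using an in-memory stand-in for the registration repository. It is not sparrow's actual code: `fakeRepo`, `register`, `errFileExists` and the `.json` file suffix are all assumptions made for illustration, and no real GitLab call is made. It only models the "create fails if the file exists" behaviour described above.

```go
package main

import (
	"errors"
	"fmt"
)

// errFileExists is a hypothetical error standing in for the API's
// "file already exists" response.
var errFileExists = errors.New("file already exists")

// fakeRepo stands in for the registration repository.
// Keys are file names, values are the registered URLs.
type fakeRepo map[string]string

// create mimics create-only semantics: it refuses to overwrite a file.
func (r fakeRepo) create(file, url string) error {
	if _, ok := r[file]; ok {
		return fmt.Errorf("create %q: %w", file, errFileExists)
	}
	r[file] = url
	return nil
}

// register creates the registration file named after the DNS name.
// The ".json" suffix is an assumption of this sketch.
func register(r fakeRepo, dnsName, scheme string) error {
	return r.create(dnsName+".json", scheme+"://"+dnsName)
}

func main() {
	repo := fakeRepo{}

	// First start with https: registration succeeds.
	fmt.Println(register(repo, "test.sparrow.net", "https"))

	// The instance is killed without unregistering and restarts with http.
	err := register(repo, "test.sparrow.net", "http")
	fmt.Println(errors.Is(err, errFileExists))

	// The stale https entry is still what other sparrows see.
	fmt.Println(repo["test.sparrow.net.json"])
}
```

The stale entry survives the failed attempt, which is why the other sparrows keep probing the old URL.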

The sparrow also can't discover itself, because when it gathers its targets it checks not only the filename but also whether its own URL matches. If the sparrow currently runs with http and was registered with https, it won't recognize that it's already registered.
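
The self-discovery check described above can be sketched roughly as follows. The `target` type, the function names and the `.json` suffix are hypothetical and not taken from sparrow's code base. The sketch contrasts the strict check (filename and full URL) with a looser one that would at least detect a stale entry for the same host under a different scheme.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// target is a hypothetical, simplified registration entry.
type target struct {
	File string // registration file name, e.g. "test.sparrow.net.json"
	URL  string // URL the instance registered with
}

// isRegistered mimics the strict check: the file name and the
// full URL (including the scheme) must both match.
func isRegistered(targets []target, name, ownURL string) bool {
	for _, t := range targets {
		if t.File == name+".json" && t.URL == ownURL {
			return true
		}
	}
	return false
}

// staleRegistration reports whether a file for this DNS name exists
// whose URL has the same host but a different scheme.
func staleRegistration(targets []target, name, ownURL string) (bool, error) {
	own, err := url.Parse(ownURL)
	if err != nil {
		return false, err
	}
	for _, t := range targets {
		if t.File != name+".json" {
			continue
		}
		reg, err := url.Parse(t.URL)
		if err != nil {
			return false, err
		}
		if strings.EqualFold(reg.Host, own.Host) && reg.Scheme != own.Scheme {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	targets := []target{{File: "test.sparrow.net.json", URL: "https://test.sparrow.net"}}

	// Running with http while registered with https: the strict check fails.
	fmt.Println(isRegistered(targets, "test.sparrow.net", "http://test.sparrow.net")) // false

	stale, err := staleRegistration(targets, "test.sparrow.net", "http://test.sparrow.net")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(stale) // true
}
```

Whether a scheme mismatch should be treated as "registered", "stale" or an error is a design decision this sketch deliberately leaves open.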

Suggested solution(s)

No matter what the solution may be down the road, we should first expose a metric (with 1/0 values) that shows whether a sparrow is registered, so the error can be promptly corrected manually by deleting the old registration file.
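
A rough, standard-library-only sketch of such a 1/0 metric is shown below. The metric name `sparrow_registered` and the `registrationGauge` type are assumptions for illustration. A real implementation would presumably register a gauge with whatever metrics library the project already uses, rather than rendering the text by hand.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// registrationGauge is a hypothetical 1/0 gauge tracking whether
// the instance is currently registered.
type registrationGauge struct {
	registered atomic.Bool
}

// Set records the outcome of the latest registration attempt.
func (g *registrationGauge) Set(ok bool) { g.registered.Store(ok) }

// Value returns 1 if registered, 0 otherwise.
func (g *registrationGauge) Value() int {
	if g.registered.Load() {
		return 1
	}
	return 0
}

// Render writes the gauge in a Prometheus-style text format.
// The metric name is an assumption of this sketch.
func (g *registrationGauge) Render() string {
	return fmt.Sprintf("# TYPE sparrow_registered gauge\nsparrow_registered %d\n", g.Value())
}

func main() {
	g := &registrationGauge{}

	// Registration failed, e.g. because a stale file exists.
	g.Set(false)
	fmt.Print(g.Render())

	// After the old file was deleted manually, registration succeeds.
	g.Set(true)
	fmt.Print(g.Render())
}
```

An alert on the value staying at 0 would point an operator at the stale registration file.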

Original Comment

From PR !145:

The scheme of a running sparrow instance cannot change on the fly. This suggests that the sparrow must be shut down first. If it is shut down gracefully, the sparrow cleans up after itself. If the sparrow is killed, some maintenance overhead is unfortunately needed: the registration repo must be edited to remove the old http/https target.

A false negative here isn't such a big deal, I think, because changing the scheme is an important switch: an HTTPS call to an HTTP target won't necessarily work (and vice versa).

_Originally posted by @puffitos in https://github.com/caas-team/sparrow/pull/145#discussion_r1628945250_