bobbydams / py-pinger

A monitoring tool written in Python using gevent and Flask
MIT License

Better name for the frequency attribute as it can lead to timing problems #2

Open nhoening opened 6 years ago

nhoening commented 6 years ago

I am running into a problem that might stem from a semantic misconception about the frequency attribute, which the pinger expects from the task information URL.

It seems to me that with frequency, the pinger is told that the monitored task runs every x minutes (in our case, ten).

However, once in a while the pinger checks and finds that the last task run was recorded ten minutes and a few seconds ago. It then complains that the task is outside of the acceptable range. The few seconds are probably network latency, or one forecasting job batch taking a bit longer than the batch ten minutes earlier did.

Example log entry: 2018-09-07 20:01:52,624 ERROR Error: BVP/staging is outside of the acceptable 10 minute range. Last Run 2018-09-07 19:51:16.359826+00:00 UTC with status OK

I think if the task runs every x minutes, the pinger must allow x + y minutes before it can safely conclude that there really is an out-of-range problem worth reporting.

For now, I have set the frequency in the pinger conf to 15 minutes in our environment.

Effectively, I propose improving the semantics of the frequency setting, which should improve the pinger overall. Either its name changes, say to ping-frequency (with the documentation adapted accordingly), or the pinger allows ten or twenty percent extra margin (e.g. frequency * 1.2).
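
To make the margin idea concrete, here is a minimal sketch of such a check, run against the timestamps from the log entry above. It is not the pinger's actual code: the function name `is_overdue` and the `margin` parameter are hypothetical, and I am assuming the log timestamp is in UTC.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_run, now, frequency_minutes, margin=0.2):
    """Return True if the last run is older than frequency plus a margin.

    Hypothetical helper, not part of py-pinger. `margin` is a fraction
    of the frequency, so 0.2 corresponds to frequency * 1.2.
    """
    allowed = timedelta(minutes=frequency_minutes * (1 + margin))
    return now - last_run > allowed


# Timestamps taken from the example log entry (log time assumed to be UTC).
last_run = datetime(2018, 9, 7, 19, 51, 16, 359826, tzinfo=timezone.utc)
checked_at = datetime(2018, 9, 7, 20, 1, 52, 624000, tzinfo=timezone.utc)

# Without a margin, the run is about 36 seconds past the 10 minute limit.
print(is_overdue(last_run, checked_at, 10, margin=0.0))

# With a 20 percent margin the limit becomes 12 minutes.
print(is_overdue(last_run, checked_at, 10))
```

With no margin the check flags the task, matching the error in the log; with the 20 percent margin the same run is considered healthy.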

nhoening commented 6 years ago

We could also simply extend the documentation to explain what is meant by "frequency" (the frequency that the pinger regards as healthy).

But that is a bit forced; the nicest way would be to give it a more descriptive name, like "complain-after". However, I realize this would probably mean that some production systems over at Softwear need to be updated.
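
One way to soften the rename could be to accept both keys for a while. This is only a sketch under assumptions: I am treating the task information as a plain dict, and both the helper name `complain_after_minutes` and the "complain-after" key are hypothetical until we agree on a name.

```python
def complain_after_minutes(task_info):
    """Read the threshold in minutes from a task information dict.

    Hypothetical helper: prefers the proposed "complain-after" key and
    falls back to the existing "frequency" key, so endpoints that have
    not been updated yet keep working.
    """
    if "complain-after" in task_info:
        return task_info["complain-after"]
    return task_info["frequency"]


# An endpoint that has not been updated still works ...
print(complain_after_minutes({"frequency": 10}))

# ... and an updated endpoint can state the threshold explicitly.
print(complain_after_minutes({"frequency": 10, "complain-after": 15}))
```

That way the production systems could migrate one at a time instead of all at once.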