geopython / GeoHealthCheck

Service Status and QoS Checker for OGC Web Services
https://geohealthcheck.org
MIT License

Collect e-mails before being sent #378

Open fsteggink opened 3 years ago

fsteggink commented 3 years ago

Currently, GHC sends an e-mail whenever a resource check fails, and also when the resource check is working again. This is the case when you've set GHC_NOTIFICATIONS_VERBOSITY to False. With this variable set to True, you'll probably get even more e-mails.

I was performing a test with a large number of resources (close to 1000) and a short test interval (every 5 minutes). The server being tested had some trouble coping with this test, so this was more of a load test than a monitoring test. As a result, GHC started sending a lot of e-mails, until a limit imposed by my provider was hit. This was made worse by the fact that earlier today a lot of tests failed because wrong credentials were being sent (HTTP 401 errors).

Would it be a good idea to have an option with which GHC can batch status reports and send them only after an interval has passed? For example, every minute, or every 5 minutes. Of course this means that mails notifying the administrator of failures are delivered with a delay, so setting this interval too high is not advisable. Perhaps the first mail could be sent immediately, and any other mails within this interval could be batched.
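
To make the idea concrete, here is a rough sketch in Python of "send the first mail immediately, batch the rest". This is not GHC code: `NotificationBatcher`, `send`, `interval_seconds` and `clock` are hypothetical names, and `send` stands in for whatever actually delivers the e-mail. It assumes something (for example the scheduler) calls `flush()` periodically.

```python
import time
from typing import Callable, List, Optional


class NotificationBatcher:
    """Send the first message at once; batch later ones per interval.

    Hypothetical helper, not part of GeoHealthCheck.
    """

    def __init__(self, send: Callable[[str], None], interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send                  # callable that delivers one mail body
        self._interval = interval_seconds  # minimum time between two mails
        self._clock = clock                # injectable so it can be tested
        self._pending: List[str] = []      # messages waiting for the next digest
        self._window_start: Optional[float] = None  # None: no batching window open

    def notify(self, message: str) -> None:
        """Send now if no window is open, otherwise queue for the digest."""
        self.flush()
        if self._window_start is None:
            self._send(message)
            self._window_start = self._clock()
        else:
            self._pending.append(message)

    def flush(self) -> None:
        """Send queued messages as one digest once the interval has passed."""
        if self._window_start is None:
            return
        now = self._clock()
        if now - self._window_start < self._interval:
            return
        if self._pending:
            self._send("\n".join(self._pending))
            self._pending.clear()
            self._window_start = now   # keep batching while mails keep coming
        else:
            self._window_start = None  # quiet period: next mail goes out at once
```

With this approach at most one mail goes out per interval, while an isolated failure is still reported without delay. The trade-off is that follow-up failures can be delayed by up to one interval.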

Regarding my test: first and foremost this was just a test, but it made me think about differentiating between the types of checks that should be done. Some of the checks that have been added are meant to check availability, whereas other checks are meant to do more in-depth checks on the content. The former checks could be performed at a higher interval than the latter. Nonetheless, I'm seeking a solution where I won't be hit by the limit imposed by my provider.