Problem: our Slack channel tells us that some services go off “often”, but it’s hard to get the bigger picture. How often do we get a Critical on GroßProductionßervice? Have we really been getting these warnings since the beginning of the weekend? And so on.
Quick solution: let’s emit a StatsD counter named cabot.<service>.<severity>.count
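To make the counter idea concrete, here is a minimal sketch of what emitting it could look like, using only the Python standard library and the plain StatsD line format (`<name>:<value>|c` over UDP). The host, port, helper names, and the name-cleaning rule (lowercasing, underscores for unsafe characters) are all assumptions for illustration, not something Cabot provides.

```python
import re
import socket

# Hypothetical defaults: a StatsD daemon on the conventional UDP port.
STATSD_HOST = "127.0.0.1"
STATSD_PORT = 8125


def metric_name(service: str, severity: str) -> str:
    """Build cabot.<service>.<severity>.count from raw names."""
    def clean(part: str) -> str:
        # Dots separate metric segments, so replace them (and anything else
        # outside [A-Za-z0-9_-]) with an underscore. Assumed convention.
        return re.sub(r"[^A-Za-z0-9_-]+", "_", part.strip()).lower()

    return f"cabot.{clean(service)}.{clean(severity)}.count"


def counter_packet(service: str, severity: str, value: int = 1) -> bytes:
    """Encode a StatsD counter increment as <name>:<value>|c."""
    return f"{metric_name(service, severity)}:{value}|c".encode("ascii")


def emit_alert(service: str, severity: str,
               host: str = STATSD_HOST, port: int = STATSD_PORT) -> None:
    """Send one counter increment over UDP, ignoring socket errors."""
    packet = counter_packet(service, severity)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet, (host, port))
    except OSError:
        # Metrics are best-effort: a failed send must not break alerting.
        pass
    finally:
        sock.close()
```

Calling `emit_alert("My Service", "CRITICAL")` from wherever the alert is dispatched would increment `cabot.my_service.critical.count`. Graphing that counter per service and severity is what answers the "how often" and "since when" questions.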