TwiN / gatus

⛑ Automated developer-oriented status page
https://gatus.io
Apache License 2.0

Support alerting provider overrides in alert configuration #96

Open TwiN opened 3 years ago

TwiN commented 3 years ago

Currently, users can notify specific groups of people for specific alerts by putting a group mention in the alert's description. This, however, does not work with all alerting providers.

For instance:

alerting:
  slack: 
    webhook-url: "https://hooks.slack.com/services/**********/**********/**********"

services:
  - name: relevant-to-managers
    url: "https://example-for-managers.org/"
    alerts:
      - type: slack
        enabled: true
        description: "<!subteam^SLACK_MANAGER_GROUP_ID>"
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-devs
    url: "https://example-for-devs.org/"
    alerts:
      - type: slack
        enabled: true
        description: "<!subteam^SLACK_DEV_GROUP_ID>"
    conditions:
      - "[STATUS] == 200"

(note the services[].alerts[].description, reference)

By tweaking the notification settings, each group could be notified only about the alerts that concern it.

That being said, I do understand that this is more of a hack than a viable solution. I think the root cause is that there may be a need to support multiple Slack webhooks rather than just one. Even if you could technically use custom to create a second Slack provider, having the ability to create N alerting providers would be better.

Across all alerting providers, there always appears to be only one parameter that may need to change in order to support multiple recipients.

As a result, we may be able to just add a single optional parameter to the service alert (as opposed to the alerting provider) to override that "key" parameter for a specific service:

alerting:
  slack: 
    webhook-url: "https://hooks.slack.com/services/**********/**********/webhook-for-managers"

services:
  - name: relevant-to-managers
    url: "https://example-for-managers.org/"
    alerts:
      - type: slack
        enabled: true
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-devs
    url: "https://example-for-devs.org/"
    alerts:
      - type: slack
        enabled: true
        override-target: "https://hooks.slack.com/services/**********/**********/webhook-for-devs"
    conditions:
      - "[STATUS] == 200"

As a result, the relevant-to-devs service would send to the overridden webhook https://hooks.slack.com/services/**********/**********/webhook-for-devs instead of the default "https://hooks.slack.com/services/**********/**********/webhook-for-managers".
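The lookup this proposal implies can be sketched roughly as follows. This is only an illustration in Python, not Gatus code (Gatus itself is written in Go): `resolve_target` is a hypothetical helper, the dictionaries stand in for the parsed YAML above, and the URLs are placeholders.

```python
# Hypothetical stand-in for the parsed YAML above; the URLs are placeholders.
alerting = {"slack": {"webhook-url": "https://example.org/hooks/managers"}}


def resolve_target(alerting: dict, alert: dict, key: str = "webhook-url") -> str:
    """Return the alert's override-target if set, else the provider-level value of `key`."""
    override = alert.get("override-target")
    if override:
        return override
    return alerting[alert["type"]][key]


managers_alert = {"type": "slack", "enabled": True}
devs_alert = {
    "type": "slack",
    "enabled": True,
    "override-target": "https://example.org/hooks/devs",
}

print(resolve_target(alerting, managers_alert))  # provider-level webhook
print(resolve_target(alerting, devs_alert))      # per-alert override
```

The `key` argument is where the per-provider difference would live: `webhook-url` for Slack, `to` for Twilio.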

It's a little hacky, especially when you compare the Slack alerting provider, which only has a single parameter (webhook-url), with the Twilio alerting provider, which has several parameters of which only one (to) is relevant for alerting a different target. Still, I think that if we want to avoid breaking changes, this is the best way to do it: it's transparent for those who don't need the feature, and while most users will likely never use it, the option is still available.

TwiN commented 3 years ago

ref: #84

zeylos commented 3 years ago

Hi there,

Wouldn't it be easier to allow a list of webhooks indexed by name under slack or mattermost? Something like this:

alerting:
  slack: 
    - name: manager
      webhook-url: "https://hooks.slack.com/services/**********/**********/webhook-for-managers"
    - name: devs
      webhook-url: "https://hooks.slack.com/services/**********/**********/webhook-for-devs"
services:
  - name: relevant-to-managers
    url: "https://example-for-managers.org/"
    alerts:
      - type: slack
        enabled: true
        target: manager
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-devs
    url: "https://example-for-devs.org/"
    alerts:
      - type: slack
        enabled: true
        target: devs
    conditions:
      - "[STATUS] == 200"

We could also add a default keyword on an alerting item, which would make the target key optional.
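To make the default idea concrete, here is a minimal Python sketch (not Gatus code). It assumes the `default` keyword would be a boolean flag on one list item; `pick_channel` and the placeholder URLs are made up for illustration.

```python
def pick_channel(channels: list, target=None) -> dict:
    """Pick a channel by name; with no target, use the one flagged as default."""
    if target is not None:
        for channel in channels:
            if channel["name"] == target:
                return channel
        raise KeyError(f"unknown alert target: {target!r}")
    for channel in channels:
        if channel.get("default"):
            return channel
    raise ValueError("no target given and no channel is marked as default")


# Hypothetical parsed form of alerting.slack, with placeholder URLs.
slack_channels = [
    {"name": "manager", "webhook-url": "https://example.org/hooks/managers", "default": True},
    {"name": "devs", "webhook-url": "https://example.org/hooks/devs"},
]

print(pick_channel(slack_channels, "devs")["webhook-url"])
print(pick_channel(slack_channels)["webhook-url"])  # falls back to the default item
```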

Btw just discovered your work, it rocks!! Good job

Zeylos

fixje commented 3 years ago

Across all alerting providers, there always appears to be only one parameter that may need to change in order to support multiple recipients: ... Custom: URL, but probably irrelevant

We have set up a web hook for Rocket.Chat using alerting.custom and want to target different channels/users. The channel is one property within the JSON request body. So this assumption is not true for our use case.

The most versatile approach would imho be something like what @zeylos suggested: have the ability to configure multiple alerting "channels" and reference them by name. For convenience, the reference in the services section could be left out if there is just one channel for the given alerts.type.

alerting:
  slack: 
    - name: manager
      webhook-url: "https://hooks.slack.com/services/**********/**********/webhook-for-managers"
    - name: devs
      webhook-url: "https://hooks.slack.com/services/**********/**********/webhook-for-devs"
  custom:
    - name: frontend-ops
      url: ...
    - name: backend-ops
      url: ...

services:
  - name: relevant-to-managers
    url: "https://example-for-managers.org/"
    alerts:
      - type: slack
        enabled: true
        target: manager
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-devs
    url: "https://example-for-devs.org/"
    alerts:
      - type: slack
        enabled: true
        target: devs
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-fe-ops
    url: "https://example-for-devs.org/"
    alerts:
      - type: custom
        enabled: true
        target: frontend-ops
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-be-ops
    url: "https://example-for-devs.org/"
    alerts:
      - type: custom
        enabled: true
        target: backend-ops
    conditions:
      - "[STATUS] == 200"

Great project btw!!
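The convenience rule above (leave out the reference when a provider type has exactly one channel) could look roughly like this. It is a Python sketch under assumed names, not Gatus code: `channel_for` is hypothetical and the dictionary mirrors the shape of the YAML in this comment with the URLs omitted.

```python
def channel_for(alerting: dict, alert: dict) -> dict:
    """Resolve an alert to a named channel of its provider type.

    The target may be omitted only when that type has exactly one channel.
    """
    channels = alerting[alert["type"]]
    target = alert.get("target")
    if target is None:
        if len(channels) == 1:
            return channels[0]
        raise ValueError(f"{alert['type']}: target is required with {len(channels)} channels")
    by_name = {channel["name"]: channel for channel in channels}
    return by_name[target]  # raises KeyError for an unknown name


# Hypothetical parsed config; only the names matter for the lookup.
alerting = {
    "slack": [{"name": "manager"}, {"name": "devs"}],
    "custom": [{"name": "frontend-ops"}],
}

print(channel_for(alerting, {"type": "slack", "target": "devs"})["name"])
print(channel_for(alerting, {"type": "custom"})["name"])  # single channel, no target needed
```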

TwiN commented 2 years ago

FYI: #181 added support for the integrations parameter in PagerDuty, which essentially enables group-specific alerts.

I've renamed them to overrides, as that PR introduced them as integrations and I wanted to make it generic enough so that a similar change could be ported to other alerting providers.

The idea is that porting this to other alerting providers would enable something like this:

alerting:
  slack:
    webhook-url: "https://hooks.slack.com/services/**********/**********/default-webhook"
    overrides:
      - group: "frontend"
        webhook-url: "https://hooks.slack.com/services/**********/**********/frontend-webhook"
      - group: "backend"
        webhook-url: "https://hooks.slack.com/services/**********/**********/backend-webhook"

endpoints:
  - name: "something-else"
    group: "misc"
    # ...

  - name: "home"
    group: "frontend"
    # ...

  - name: "api-health"
    group: "backend"
    # ...

While this doesn't offer the same flexibility as allowing an override at the service level, this should at least solve some use cases.
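As a rough illustration of how group-based overrides would resolve, here is a short Python sketch. It is not the actual Gatus implementation: `webhook_for_group` is a hypothetical helper, and the dictionary is a stand-in for the parsed alerting.slack section with placeholder URLs.

```python
def webhook_for_group(provider: dict, group) -> str:
    """Return the override webhook for the endpoint's group, else the provider default."""
    for override in provider.get("overrides", []):
        if override["group"] == group:
            return override["webhook-url"]
    return provider["webhook-url"]


# Hypothetical parsed form of alerting.slack, with placeholder URLs.
slack = {
    "webhook-url": "https://example.org/hooks/default",
    "overrides": [
        {"group": "frontend", "webhook-url": "https://example.org/hooks/frontend"},
        {"group": "backend", "webhook-url": "https://example.org/hooks/backend"},
    ],
}

for group in ("misc", "frontend", "backend"):
    print(group, "->", webhook_for_group(slack, group))
```

An endpoint in a group with no matching override, such as "misc" here, simply falls through to the default webhook.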

TwiN commented 1 year ago

Here's an example of something that might work better:

alerting:
  slack:
    webhook-url: "https://hooks.slack.com/services/aaaaaaaaaaaaaaaaaaaaaaaaaaa"

endpoints:
  - name: "example-with-no-override"
    alerts:
      - type: slack

  - name: "example-with-override"
    alerts:
      - type: slack
        config:
          webhook-url: "https://hooks.slack.com/services/bbbbbbbbbbbbbbbbbbbbbbbbbbbbb"

Basically, endpoints[].alerts[].config could be merged with alerting.<provider>.
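A minimal sketch of that merge, in Python rather than Go and with made-up names: `effective_config` is hypothetical, and a shallow merge where the per-alert value wins is an assumption, since the exact merge semantics are not spelled out here.

```python
def effective_config(alerting: dict, alert: dict) -> dict:
    """Shallow-merge the alert's config over its provider's config.

    Keys set under endpoints[].alerts[].config win; the provider-level
    dictionary itself is left unmodified.
    """
    merged = dict(alerting.get(alert["type"], {}))
    merged.update(alert.get("config", {}))
    return merged


# Hypothetical parsed config, with placeholder URLs.
alerting = {"slack": {"webhook-url": "https://example.org/hooks/aaa"}}
no_override = {"type": "slack"}
with_override = {"type": "slack", "config": {"webhook-url": "https://example.org/hooks/bbb"}}

print(effective_config(alerting, no_override))
print(effective_config(alerting, with_override))
```

Because the merge works on whole keys, a provider with several parameters could have any subset of them overridden per alert, which is what makes this more flexible than a single override-target.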