fluxcd / notification-controller

The GitOps Toolkit event forwarder and notification dispatcher
https://fluxcd.io
Apache License 2.0

Slack/Teams/PagerDuty messages not working #844

Closed braun1928 closed 1 month ago

braun1928 commented 3 months ago

Somehow, none of the providers above receive any messages from Flux. The Alerts were created similar to the examples in the docs:

apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: test
  namespace: flux-system
spec:
  summary: Testing notification
  providerRef:
    name: provider
  eventSources:
    - kind: GitRepository
      name: '*'
    - kind: Kustomization
      name: '*'

Providers too:

---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack-bot
  namespace: flux-system
spec:
  type: slack
  channel: alerts
  address: https://slack.com/api/chat.postMessage
  secretRef:
    name: slack-bot-token
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack-webhook
  namespace: flux-system
spec:
  type: slack
  channel: alerts
  secretRef:
    name: slack-webhook
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: pagerduty
  namespace: flux-system
spec:
  type: pagerduty
  channel: R...
  address: https://events.pagerduty.com
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: teams
  namespace: flux-system
spec:
  type: msteams
  secretRef:
    name: teams-webhook
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: generic
  namespace: flux-system
spec:
  type: generic
  address: https://...ngrok-free.app

I removed the severity so that any message would come through, but it still hasn't worked. The only provider that received messages was the generic one (I ran a Python HTTP server locally, exposed through ngrok). I even tried pointing the Slack provider's address at that ngrok URL, but nothing reached the service.
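A local receiver like the one described can be approximated with a few lines of standard-library Python. This is a minimal sketch, not the script actually used in this report: `EventHandler`, `received`, `make_server`, and port 8080 are illustrative assumptions. It accepts any POST, records the body, and replies 200, which is enough to inspect what a `generic` provider sends.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bodies of every POST seen so far (hypothetical helper for inspection).
received = []


class EventHandler(BaseHTTPRequestHandler):
    """Accepts any POST, stores the body, and answers 200 OK."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        received.append(body)
        try:
            # Pretty-print JSON payloads; fall back to the raw text otherwise.
            print(json.dumps(json.loads(body), indent=2))
        except json.JSONDecodeError:
            print(body)
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        # Silence the default per-request access log.
        pass


def make_server(port=8080):
    """Build the server without starting it; call serve_forever() to run."""
    return HTTPServer(("127.0.0.1", port), EventHandler)
```

Calling `make_server().serve_forever()` and exposing port 8080 through a tunnel gives an address that a `generic` provider can post to.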

I tested with wget from within the container and all endpoints were reachable, so neither a NetworkPolicy nor an EC2 Security Group is blocking anything (if one were, the ngrok endpoint would be blocked too).

Any ideas on what is going on?

Tested in two different EKS clusters, versions 1.2.2 and 1.3.0; the behaviour was the same in both.

comminutus commented 1 month ago

I have a similar issue. I'm trying to use the Slack provider with the incoming webhook configuration. It looks like events are getting dispatched correctly but I don't see anything inside Slack. For example, from the notification controller logs:

{"level":"info","ts":"2024-07-12T15:40:49.493Z","logger":"event-server","msg":"dispatching event","eventInvolvedObject":{"kind":"Kustomization","namespace":"default","name":"nfs-provisioner","uid":"d8281d48-e3ee-4654-8238-50adb30ce3ec","apiVersion":"kustomize.toolkit.fluxcd.io/v1","resourceVersion":"18642991"},"message":"Reconciliation finished in 194.613846ms, next run in 1h0m0s"}

stefanprodan commented 1 month ago

@comminutus that event is not meant to reach Slack; you only get notifications when something changes in the cluster.

comminutus commented 1 month ago

@stefanprodan that's strange, because my Alert looks like this:

apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: all-alerts
  namespace: default
spec:
  providerRef:
    name: slack
  eventSources:
    - kind: HelmRelease
      name: '*'
    - kind: Kustomization
      name: '*'

If I don't create this Alert in the cluster, the message I get from the notification controller is "discarding event, no alerts found for the involved object". When I add the Alert, I get "dispatching event".

If the event doesn't go to slack, then what does "dispatching event" mean?

stefanprodan commented 1 month ago

> If the event doesn't go to slack, then what does "dispatching event" mean?

That particular event is for Git commit status updates. It doesn't get routed to Slack, as that would create a massive amount of spam. Change something in your manifests in Git that will trigger a change in the cluster, e.g. bump the replicas of some Deployment, and the notification should show up in Slack.

comminutus commented 1 month ago

@stefanprodan OK, I deleted a Kustomization and reconciled the parent Kustomization. This appeared in the notification controller log:

{"level":"info","ts":"2024-07-13T11:57:56.183Z","logger":"event-server","msg":"dispatching event","eventInvolvedObject":{"kind":"Kustomization","namespace":"default","name":"fresh-rss","uid":"2c897664-8ead-49d3-b8d1-e77188e4e863","apiVersion":"kustomize.toolkit.fluxcd.io/v1","resourceVersion":"19518738"},"message":"Secret/default/fresh-rss created\nService/default/fresh-rss created\nDeployment/default/fresh-rss created\nPersistentVolumeClaim/default/fresh-rss created\nDatabase/default/fresh-rss created\nGrant/default/fresh-rss created\nUser/default/fresh-rss created\nIngress/default/fresh-rss created"}

I still don't get any message in Slack.
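To list which events the controller reports as dispatched, the JSON log lines can be filtered on their `msg` field. This is a minimal sketch based only on the log format shown in this thread; `dispatched_events` is a hypothetical helper and `SAMPLE` is the first log line quoted earlier. It shows what the event server logged, not whether the provider delivered anything.

```python
import json


def dispatched_events(lines):
    """Yield (kind, namespace, name, message) for each 'dispatching event' entry."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("msg") != "dispatching event":
            continue
        obj = entry.get("eventInvolvedObject", {})
        yield obj.get("kind"), obj.get("namespace"), obj.get("name"), entry.get("message")


# Log line copied from earlier in this thread (split only for readability).
SAMPLE = (
    r'{"level":"info","ts":"2024-07-12T15:40:49.493Z","logger":"event-server",'
    r'"msg":"dispatching event","eventInvolvedObject":{"kind":"Kustomization",'
    r'"namespace":"default","name":"nfs-provisioner",'
    r'"uid":"d8281d48-e3ee-4654-8238-50adb30ce3ec",'
    r'"apiVersion":"kustomize.toolkit.fluxcd.io/v1","resourceVersion":"18642991"},'
    r'"message":"Reconciliation finished in 194.613846ms, next run in 1h0m0s"}'
)

for kind, namespace, name, message in dispatched_events([SAMPLE]):
    print(f"{kind} {namespace}/{name}: {message}")
```

For real logs, the same function could be fed `sys.stdin` with the controller's log output piped in.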

Also, it looks like the only place in the code where "dispatching event" exists is here: https://github.com/fluxcd/notification-controller/blob/fa93b71722541a583b5085314ca8f11be7da5b81/internal/server/event_handlers.go#L76, which looks like it calls the dispatchNotification function on the EventServer. I don't see where it would be doing any filtering.

matheuscscp commented 1 month ago

@comminutus I just ran a test here, and I have a Slack Incoming Webhook provider working in production right now with the latest version of Flux. The configuration looks like this:

apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: alert
  namespace: flux-system
spec:
  eventSeverity: error
  eventSources:
  - kind: GitRepository
    name: '*'
  - kind: Kustomization
    name: '*'
  providerRef:
    name: slack
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  channel: flux-releases
  secretRef:
    name: slack-url
  type: slack
---
apiVersion: v1
kind: Secret
metadata:
  name: slack-url
  namespace: flux-system
type: Opaque
stringData:
  address: https://hooks.slack.com/services/xxxxxxx/xxxxxxx/xxxxxx

Do you have a URL that looks like the one above? If you send a correct payload to that URL does it work?
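One way to run that check outside the cluster is to post a payload to the webhook directly. This is a minimal sketch using only the Python standard library, assuming the Incoming Webhook URL stored in the Secret; `build_request` and `send_test_message` are hypothetical helper names, and the `{"text": ...}` body is the simplest payload Slack Incoming Webhooks accept.

```python
import json
import urllib.request


def build_request(webhook_url, text):
    """Build a POST request carrying a minimal Slack webhook payload."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(webhook_url, text="test message sent outside Flux"):
    """Send the payload and return (HTTP status, response body)."""
    request = build_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Calling `send_test_message` with the real URL raises `urllib.error.URLError` when the host cannot be resolved or reached, which helps separate a network or DNS problem from a Flux configuration problem.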

comminutus commented 1 month ago

@matheuscscp Thanks, I hadn't thought of just testing the webhook URL. When I tried it with curl, it couldn't resolve hooks.slack.com. Same with getent hosts ... . Then I realized NextDNS was blocking it 😬 👎. Sorry for the false alarm!
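For anyone hitting the same symptom, the DNS check can also be scripted when `curl` or `getent` are not at hand. This is a minimal sketch with the standard library; `can_resolve` is a hypothetical helper. It only tests name resolution, not whether the endpoint accepts the request, and a filter that answers with a placeholder address instead of failing would still pass it.

```python
import socket


def can_resolve(hostname, port=443):
    """Return True if the local resolver yields at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, port)) > 0
    except socket.gaierror:
        # Raised when the name does not resolve, e.g. when a DNS filter blocks it.
        return False
```

Running `can_resolve("hooks.slack.com")` from the same network as the cluster nodes is a quick first check before looking at the Alert and Provider manifests.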