Code2Life / nodess-apps

Published applications of the Node serverless framework (nodess, based on node-http-adapter)

Notify for alerts failed | unexpected status code 400 #1

Closed. s4kharitonov closed this issue 5 years ago.

s4kharitonov commented 5 years ago

Hello, I got these errors in the logs after sending a test alert through the pipeline:

curl -X POST http://your.alertmanager:9093/api/v1/alerts \
  -H 'Content-Type: application/json' \
  -d '[{
    "labels": {
      "alertname": "test_alert",
      "service": "my-service",
      "severity": "critical",
      "instance": "10.10.10.10"
    },
    "annotations": {
      "summary": "Test service is down!"
    },
    "generatorURL": "http://prometheus.cluster/graph"
  }]'
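For anyone reproducing this from a script instead of the shell, here is a rough Python equivalent of the curl call above, using only the standard library. It builds the same POST request with the same JSON body but does not send it; the Alertmanager address is the placeholder from the curl command, and the function name is hypothetical.

```python
import json
import urllib.request

# Placeholder address copied from the curl example; replace with your own.
ALERTMANAGER_URL = "http://your.alertmanager:9093/api/v1/alerts"


def build_test_alert_request(url: str = ALERTMANAGER_URL) -> urllib.request.Request:
    """Build (but do not send) the same test-alert POST as the curl command."""
    alerts = [
        {
            "labels": {
                "alertname": "test_alert",
                "service": "my-service",
                "severity": "critical",
                "instance": "10.10.10.10",
            },
            "annotations": {"summary": "Test service is down!"},
            "generatorURL": "http://prometheus.cluster/graph",
        }
    ]
    body = json.dumps(alerts).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage: urllib.request.urlopen(build_test_alert_request()) actually sends it.
```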

Oct 15 13:07:32 host alertmanager[2162]: level=error ts=2019-10-15T13:07:32.846Z caller=notify.go:372 component=dispatcher msg="Error on notify" err="cancelling notify retry for \"webhook\" due to unrecoverable error: unexpected status code 400: http://domain:8080" context_err=null
Oct 15 13:07:32 host alertmanager[2162]: level=error ts=2019-10-15T13:07:32.846Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="cancelling notify retry for \"webhook\" due to unrecoverable error: unexpected status code 400: http://domain:8080"
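The "unrecoverable error" wording in these log lines points at how Alertmanager's webhook notifier reacts to the receiver's HTTP status. As far as I understand its behavior around this release, a 2xx response counts as success, a 5xx response is treated as recoverable and retried, and any other code (such as this 400) cancels the retries. The function below is a simplified Python model of that rule, written for illustration; it is not Alertmanager's Go source.

```python
def classify_webhook_response(status_code: int) -> str:
    """Simplified model of how Alertmanager treats a webhook receiver's reply.

    Assumption: 2xx = success, 5xx = recoverable (retried),
    anything else (e.g. 400) = unrecoverable, retries are cancelled.
    """
    if status_code // 100 == 2:
        return "success"
    if status_code // 100 == 5:
        return "retry"    # recoverable: Alertmanager keeps retrying
    return "give_up"      # e.g. 400: "cancelling notify retry ... unrecoverable error"
```

Under this model, the fix on the receiver side is simply to answer a processed payload with a 2xx status.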

The messages do arrive in Zoom chat, but they repeat every 5 minutes. As far as I can tell, this matches the interval configured in the group_interval: 5m option (if I set it to 30 seconds, the messages repeat every 30 seconds).
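One plausible reading of the 5-minute repeats, assuming Alertmanager only records a notification as sent when the receiver answers with 2xx: the webhook posted the message to Zoom but replied 400, so every group_interval flush found no successful notification on record and sent the group again. The Python model below is a simplified illustration of that scheduling, not Alertmanager's actual dispatcher; it uses the group_wait, group_interval and repeat_interval semantics from the configuration in this thread for a single, unchanged alert group.

```python
from typing import List, Optional


def notification_times(
    duration_s: int,
    group_wait_s: int,
    group_interval_s: int,
    repeat_interval_s: int,
    webhook_succeeds: bool,
) -> List[int]:
    """Return the times (in seconds) at which one unchanged alert group is sent.

    Simplified model (assumption): the group is flushed at group_wait and then
    every group_interval. A flush sends only if no successful notification has
    been recorded yet, or if repeat_interval has elapsed since the last one.
    """
    sent: List[int] = []
    last_success: Optional[int] = None
    t = group_wait_s
    while t <= duration_s:
        due = last_success is None or t - last_success >= repeat_interval_s
        if due:
            sent.append(t)
            if webhook_succeeds:
                last_success = t  # a failed send (e.g. HTTP 400) is never recorded
        t += group_interval_s
    return sent


# One hour with group_wait=5s, group_interval=5m, repeat_interval=24h:
# a succeeding receiver is notified once, a failing one every 5 minutes.
```

In this model, shortening group_interval to 30 seconds makes the failing case repeat every 30 seconds, which is what the report describes.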

Could you take a look at this? Thanks in advance.

s4kharitonov commented 5 years ago

@Code2Life

Code2Life commented 5 years ago

The notification interval should be controlled by "repeat_interval" in the Alertmanager configuration. Could you double-check your "group_by" and "group_interval" settings?

The 400 status code issue seems to be a bug; I'm fixing it.
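The thread does not show what the actual fix was. As a hypothetical sketch only (not this plugin's code, which runs on Node), a receiver that parses Alertmanager's webhook payload, assuming the JSON shape documented for webhook_config (a top-level "status" and an "alerts" list whose entries carry "status", "labels" and "annotations"), and that answers a processed payload with 200 could look like this. The function name and the chat-message format are made up for illustration.

```python
import json
from typing import List, Tuple


def handle_alertmanager_webhook(body: bytes) -> Tuple[int, str]:
    """Hypothetical receiver logic: turn an Alertmanager webhook payload into chat text.

    Returns (HTTP status, message). Alertmanager only treats 2xx as success,
    so a payload that was processed must be answered with 200, not 400.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and bad UTF-8
        return 400, "invalid JSON"
    if not isinstance(payload, dict):
        return 400, "payload must be a JSON object"
    alerts = payload.get("alerts")
    if not isinstance(alerts, list):
        return 400, "missing 'alerts' list"

    lines: List[str] = []
    for alert in alerts:
        if not isinstance(alert, dict):
            continue  # skip malformed entries instead of failing the whole batch
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", payload.get("status", "unknown"))
        lines.append(
            f"[{status}] {labels.get('alertname', '?')} on "
            f"{labels.get('instance', '?')}: {annotations.get('summary', '')}"
        )
    return 200, "\n".join(lines)
```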

s4kharitonov commented 5 years ago

My Alertmanager configuration:

global:
  smtp_smarthost: '***'
  smtp_from: '***'
  smtp_auth_username: '***'
  smtp_auth_password: '***'

templates:
- '/etc/alertmanager/template/*.tmpl'

route:
  group_by: ['instance']
  group_wait: 5s
  group_interval: 5m
  repeat_interval: 24h
  receiver: webhook
  routes:
  - match:
      severity: critical
    receiver: webhook
  - match:
      severity: warning
    receiver: email

receivers:
- name: webhook
  webhook_configs:
  - url: http://domain:8080

- name: email
  email_configs:
  - to: '***'
    send_resolved: true
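To make the routing in this configuration concrete, here is a simplified Python model of how the tree picks a receiver: child routes are checked in order, the first whose match labels all equal the alert's labels wins (continue is not set, so matching stops there), and an alert matching no child falls back to the top-level receiver. This is an illustration of the YAML above, not Alertmanager's implementation; the names ROUTES, DEFAULT_RECEIVER and pick_receiver are hypothetical.

```python
from typing import Dict, List, Tuple

# Child routes from the configuration above, in order: (match labels, receiver).
ROUTES: List[Tuple[Dict[str, str], str]] = [
    ({"severity": "critical"}, "webhook"),
    ({"severity": "warning"}, "email"),
]
DEFAULT_RECEIVER = "webhook"  # receiver of the top-level route


def pick_receiver(labels: Dict[str, str]) -> str:
    """Simplified routing: first child route whose 'match' labels all equal
    the alert's labels wins; otherwise fall back to the root receiver."""
    for match, receiver in ROUTES:
        if all(labels.get(key) == value for key, value in match.items()):
            return receiver
    return DEFAULT_RECEIVER
```

With this model, the test alert (severity critical) goes to the webhook receiver, which is consistent with the errors above naming "webhook".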

For group_by I tried several values: job, alertname and instance. I also opened an issue with Alertmanager, in case it helps: https://github.com/prometheus/alertmanager/issues/2071
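For context on what group_by changes: each distinct combination of the group_by label values forms one alert group, and each group is notified on its own group_wait / group_interval / repeat_interval schedule. The simplified Python model below (an illustration, with a hypothetical function name) shows that switching between instance and alertname changes which alerts share a notification; it does not by itself change how often a group whose notification failed is sent again.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_alerts(
    alerts: List[Dict[str, str]], group_by: List[str]
) -> Dict[Tuple[str, ...], List[Dict[str, str]]]:
    """Group alert label sets by the values of the group_by labels.

    Simplified model: a missing label counts as an empty value, and each
    distinct key tuple becomes one group that is notified independently.
    """
    groups: Dict[Tuple[str, ...], List[Dict[str, str]]] = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```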

Code2Life commented 5 years ago

@s4kharitonov The Alertmanager 400 issue has been fixed. Could you please update the SERVER_VERSION env var to upgrade, and then restart the notification deployment? This may also help reduce the repeated notifications, though I'm not sure.

s4kharitonov commented 5 years ago

@Code2Life I have tested it and everything works fine. Thank you!

Code2Life commented 5 years ago

Thanks a lot for using this plugin and reporting these issues. Later, I'll contact the Zoom integration team about supporting Alertmanager natively.