alerta / prometheus-config

Prometheus config for Alerta
MIT License

Alertmanager (v0.19.0) can't send alerts. The response is always HTTP 400 #24

Closed mrcrch closed 4 years ago

mrcrch commented 4 years ago

Issue Summary: Alertmanager can't send alerts. The response is always HTTP 400.

Environment

To Reproduce

Additional context: Checking the Prometheus, Alertmanager and Pushgateway UIs, everything looks fine.

The Alertmanager log shows: component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="cancelling notify retry for \"webhook\" due to unrecoverable error: unexpected status code 400 from http://172.18.0.1:8080/api/webhooks/prometheus"

I even saved the posted data using Beeceptor and tried to reproduce the operation using curl, and got the same error: no alerts in Prometheus notification payload

satterly commented 4 years ago

Can you add the POST data to this issue please?

icy commented 4 years ago

I have the same issue, but with Alertmanager 0.17.0. I used the example provided by @mrcrch and got the following error:

{"code":400,"errors":null,"message":"no alerts in Prometheus notification payload","requestId":null,"status":"error"}
mrcrch commented 4 years ago

@satterly, the complete POST is below:

curl -XPOST http://localhost:8080/api/webhooks/prometheus -d '{
    "receiver": "alerta",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "sampleAlarm",
                "customer": "customer",
                "environment": "dev",
                "job": "sample",
                "severity": "danger"
            },
            "annotations": {
                "summary": "Sample Summery"
            },
            "startsAt": "2019-11-17T16:33:39.027696707Z",
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://835ec2c81ce5:9090/graph?g0.expr=sTransaction+%3D%3D+-1\u0026g0.tab=1",
            "fingerprint": "5fd1305436de607f"
        }
    ],
    "groupLabels": {},
    "commonLabels": {
        "alertname": "sampleAlarm",
        "customer": "customer",
        "environment": "dev",
        "job": "sample",
        "severity": "danger"
    },
    "commonAnnotations": {
        "summary": "Sample Summary"
    },
    "externalURL": "http://1131175230f0:9093",
    "version": "4",
    "groupKey": "{}:{}"
}'
satterly commented 4 years ago

Any idea in which version of Alertmanager this stopped working? Something has changed to do with the HTTP content type of the payload (I think), and it would be useful to know exactly what changed, and why.

mrcrch commented 4 years ago

I'll run some tests. This version (v0.19.0) was the only one I have used.

vykulakov commented 4 years ago

I use the same version of Alertmanager and don't have such problems.

So maybe there is a problem with the configuration or with other components like Prometheus, because I use an older version of Prometheus (2.13.0).

I may try to install the newer version of Prometheus to check whether I have any problems with it.

vykulakov commented 4 years ago

Do you know any fast way to dump Alertmanager requests with headers and payload before and after the upgrade?

satterly commented 4 years ago

Run netcat locally to listen on a port, then change your Alertmanager config to point to localhost:port and you should see the output.

For example, in one terminal...

$ nc -l 8888

... nc waits here for curl command below, and then prints the following...

GET / HTTP/1.1
Host: localhost:8888
User-Agent: curl/7.54.0
Accept: */*

Curl command that generates the above output from netcat...

$ curl http://localhost:8888
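If netcat isn't available, a rough equivalent can be sketched with Python's standard library (a hypothetical helper, not part of Alertmanager or Alerta): a tiny HTTP server that dumps each request's method, path, headers, and body, which is enough to check whether Alertmanager actually sends a Content-Type header. Here it listens on an ephemeral port and sends itself one sample JSON POST for demonstration; in real use you would pin the port and point the `webhook_configs` URL at it.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

captured = []  # (method, path, headers, body) for each request received

# Tiny request dumper in the spirit of `nc -l 8888`: point the Alertmanager
# webhook URL at this port and inspect what it actually sends.
class Dump(BaseHTTPRequestHandler):
    def _dump(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        captured.append((self.command, self.path, self.headers, body))
        print(self.command, self.path)
        print(self.headers, end="")          # headers end with a blank line
        print(body.decode(errors="replace"))
        self.send_response(200)
        self.end_headers()

    do_GET = do_POST = _dump

    def log_message(self, *args):  # suppress the default access log
        pass

server = HTTPServer(("127.0.0.1", 0), Dump)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Self-test: one JSON POST, roughly the shape of a webhook notification.
req = Request(f"http://127.0.0.1:{server.server_port}/webhook",
              data=b'{"alerts": []}',
              headers={"Content-Type": "application/json"}, method="POST")
urlopen(req).read()
server.shutdown()
```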
vykulakov commented 4 years ago

@mrcrch I ran your curl example against my Alerta server and it works fine with small modifications:

  1. It is necessary to pass the Content-Type header: Content-Type: application/json
  2. I removed the customer fields from your example because customers are not configured in my Alerta instance.
  3. I changed values for the environment and severity fields to suit my configuration.

Can you try to execute your example with the correct headers and attach the result here?
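Why the header matters can be sketched with a small, self-contained stub (hypothetical code, not Alerta's actual implementation): like many web frameworks, the endpoint below only parses the body as JSON when the Content-Type header says application/json, so the very same payload gets the 400 "no alerts in Prometheus notification payload" response without the header and is accepted with it. Note that when no header is given, urllib (like curl's `-d`) defaults to application/x-www-form-urlencoded.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Hypothetical stand-in for the /api/webhooks/prometheus endpoint: it only
# treats the body as JSON when the Content-Type header says so, mirroring
# the "no alerts in Prometheus notification payload" error from this issue.
class WebhookStub(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        payload = None
        if self.headers.get_content_type() == "application/json":
            payload = json.loads(body)
        if payload and payload.get("alerts"):
            self.send_response(201)
            self.end_headers()
            self.wfile.write(b'{"status":"ok"}')
        else:
            self.send_response(400)
            self.end_headers()
            self.wfile.write(
                b'{"status":"error",'
                b'"message":"no alerts in Prometheus notification payload"}')

    def log_message(self, *args):  # keep output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), WebhookStub)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/api/webhooks/prometheus"
data = json.dumps({"alerts": [{"labels": {"alertname": "sampleAlarm"}}]}).encode()

def post(headers):
    try:
        return urlopen(Request(url, data=data, headers=headers, method="POST")).status
    except HTTPError as err:
        return err.code

status_without = post({})  # urllib defaults to x-www-form-urlencoded
status_with = post({"Content-Type": "application/json"})
print(status_without, status_with)  # 400 201
server.shutdown()
```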

vykulakov commented 4 years ago

Finally, I've updated Prometheus to 2.14.0, and after several hours I still don't see any problems with sending alerts to Alerta. So more details about the requests and responses are needed to solve this problem.

satterly commented 4 years ago

That's great, @vykulakov. Thanks for all your work on this.

mrcrch commented 4 years ago

Thanks @vykulakov and @satterly. I reconfigured my whole environment, did some basic tests using curl, and it worked.

I had written a more detailed scenario but forgot to send it :man_facepalming:

Long story short: adding the Content-Type header was enough to change the error message. After that, I solved my configuration problems (clients and severities) and everything was fine.

My next step is to review everything on the Alertmanager side. The evidence suggests that it may, at least, be omitting the Content-Type header.

I haven't been able to do any other tests yet, sorry for that.

Thanks again

satterly commented 4 years ago

This issue is blocked waiting on more information. The description or subsequent comments do not provide enough information to triage, investigate or resolve the issue. Please review the description against the issue template and ensure all relevant information is included. If you do not know what is expected you can ask on Gitter chat.

mrcrch commented 4 years ago

Let me make my final contribution to this issue :smile:

First scenario - Only AlertManager

Docker network

docker network create central

HTTP-Echo container

This container is used to capture the full content of every request.

Running

docker run --rm -d -p 80:80 \
    --name http-echo \
    --network central \
    mendhak/http-https-echo

Testing

curl -H "Content-Type:application/json" -X POST --data '{ "data": "dummy" }' http://localhost/http-echo-test

Log

{ path: '/http-echo-test',
  headers:
   { host: 'localhost',
     'user-agent': 'curl/7.47.0',
     accept: '*/*',
     'content-type': 'application/json',
     'content-length': '19' },
  method: 'POST',
  body: '{ "data": "dummy" }',
  cookies: undefined,
  fresh: false,
  hostname: 'localhost',
  ip: '::ffff:172.19.0.1',
  ips: [],
  protocol: 'http',
  query: {},
  subdomains: [],
  xhr: false,
  os: { hostname: '120fecd03d2f' } }
::ffff:172.19.0.1 - - [24/Feb/2020:14:19:39 +0000] "POST /http-echo-test HTTP/1.1" 200 460 "-" "curl/7.47.0"

AlertManager

:warning: Updated to latest version

Configuration

route:
    receiver: http-echo
    group_wait: 30s
    group_interval: 30s
    repeat_interval: 1m

receivers:
    - name: http-echo
      webhook_configs:
          - url: http://http-echo
            send_resolved: true

Running

docker run --rm -d -p 9093:9093 \
    --name alertmanager \
    --network central \
    --volume $(pwd)/config/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
    prom/alertmanager:v0.20.0 \
        --config.file=/etc/alertmanager/alertmanager.yml \
        --log.level=debug

Testing

alerts='[
  {
    "labels": {
       "alertname": "dummy_problem"
     },
     "annotations": {
        "info": "There is a dummy problem",
        "summary": "Dummy problem"
      }
  }
]'

curl -XPOST -d"$alerts" http://localhost:9093/api/v1/alerts

Alertmanager logs

level=debug ts=2020-02-24T14:21:49.052Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=dummy_problem[f1834da][active]
level=debug ts=2020-02-24T14:22:19.052Z caller=dispatch.go:465 component=dispatcher aggrGroup={}:{} msg=flushing alerts=[dummy_problem[f1834da][active]]

HTTP-Echo logs

{ path: '/',
  headers:
   { host: 'http-echo',
     'user-agent': 'Alertmanager/0.20.0',
     'content-length': '534',
     'content-type': 'application/json' },
  method: 'POST',
  body: '{"receiver":"http-echo","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"dummy_problem"},"annotations":{"info":"There is a dummy problem","summary":"Dummy problem"},"startsAt":"2020-02-24T14:21:49.052345517Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"","fingerprint":"f1834daff6301092"}],"groupLabels":{},"commonLabels":{"alertname":"dummy_problem"},"commonAnnotations":{"info":"There is a dummy problem","summary":"Dummy problem"},"externalURL":"http://02702a6c5041:9093","version":"4","groupKey":"{}:{}"}\n',
  cookies: undefined,
  fresh: false,
  hostname: 'http-echo',
  ip: '::ffff:172.19.0.3',
  ips: [],
  protocol: 'http',
  query: {},
  subdomains: [],
  xhr: false,
  os: { hostname: '120fecd03d2f' } }
::ffff:172.19.0.3 - - [24/Feb/2020:14:22:19 +0000] "POST / HTTP/1.1" 200 1022 "-" "Alertmanager/0.20.0"

Conclusion

So far, so good :grinning:

Second scenario - AlertManager and Alerta

Docker network

Same as above

HTTP-Echo container

Turned off

Alerta

:warning: Updated to latest version

docker-compose.yml

version: "2.1"
services:
    web:
        container_name: alerta
        image: alerta/alerta-web:7.4.4
        ports:
            - "8080:8080"
        depends_on:
            - db
        environment:
            - DATABASE_URL=postgres://postgres:postgres@db:5432/monitoring
            - AUTH_REQUIRED=False
            - ADMIN_USERS=admin@alerta.io,devops@alerta.io
            - PLUGINS=prometheus
        restart: always
    db:
        image: postgres
        environment:
            POSTGRES_DB: monitoring
            POSTGRES_USER: postgres
            POSTGRES_PASSWORD: postgres
        restart: always

networks:
    default:
        external:
            name: central

AlertManager

Configuration

route:
    receiver: alerta
    group_wait: 30s
    group_interval: 30s
    repeat_interval: 1m

receivers:
    - name: alerta
      webhook_configs:
          - url: http://alerta:8080/api/webhooks/prometheus
            send_resolved: true

Running

docker run --rm -d -p 9093:9093 \
    --name alertmanager \
    --network central \
    --volume $(pwd)/config/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
    prom/alertmanager:v0.20.0 \
        --config.file=/etc/alertmanager/alertmanager.yml \
        --log.level=debug

Testing

alerts='[
  {
    "labels": {
       "alertname": "dummy_problem_2"
     },
     "annotations": {
        "info": "There is a dummy problem number 2",
        "summary": "Dummy problem number 2"
      }
  }
]'

curl -XPOST -d"$alerts" http://localhost:9093/api/v1/alerts

Alertmanager logs

level=debug ts=2020-02-24T14:41:45.126Z caller=dispatch.go:135 component=dispatcher msg="Received alert" alert=dummy_problem_2[4e7ec3d][active]
level=debug ts=2020-02-24T14:42:15.127Z caller=dispatch.go:465 component=dispatcher aggrGroup={}:{} msg=flushing alerts=[dummy_problem_2[4e7ec3d][active]]

(Screenshot: the alert appears in the Alerta web UI.)

Conclusion

Everything looks fine :+1:

General conclusion

After all this, I can't say for sure what the problem was. I believe it was a simple configuration error that I couldn't see earlier.

I just want to say sorry for taking your time, and thank you all, especially @vykulakov and @satterly