cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.
https://cortexmetrics.io/
Apache License 2.0
5.47k stars 795 forks source link

Ruler is unable to send notifications to the alertmanager cluster setup #4860

Open roobalimsab opened 2 years ago

roobalimsab commented 2 years ago

Hi Team,

I configured the alertmanager cluster as per the docs (with a replica of 2) and configured ruler to point to the alertmanager headless service. But I am getting EOF on ruler and the alertmanager debug logs report invalid msgType.

Ruler Config

  alertmanager_url: http://_cluster._tcp.cortex-aggregator-v2-alertmanager-headless/api/prom/alertmanager
  enable_alertmanager_discovery: true
  alertmanager_refresh_interval: 1m0s
  enable_alertmanager_v2: false
  notification_queue_capacity: 10000
  notification_timeout: 10s

AM config

external_url: /api/prom/alertmanager
cluster:
    listen_address: 0.0.0.0:9094
    advertise_address: ""
    peers: cortex-aggregator-v2-alertmanager-headless.cortex-2.svc.cluster.local:9094
    peer_timeout: 15s
    gossip_interval: 200ms
    push_pull_interval: 1m0s

Ruler Logs

level=error ts=2022-09-13T16:37:20.663333971Z caller=notifier.go:527 user=something-else-for-testing alertmanager=http://cortex-aggregator-v2-alertmanager-0.cortex-aggregator-v2-alertmanager-headless.cortex-2.svc.cluster.local:9094/api/prom/alertmanager/api/v1/alerts count=1 msg="Error sending alert" err="Post \"http://cortex-aggregator-v2-alertmanager-0.cortex-aggregator-v2-alertmanager-headless.cortex-2.svc.cluster.local:9094/api/prom/alertmanager/api/v1/alerts\": EOF"

AM Logs

level=debug ts=2022-09-13T16:33:23.849663134Z caller=cluster.go:329 component=cluster memberlist="2022/09/13 16:33:23 [ERR] memberlist: Received invalid msgType (80) from=10.115.16.92:58804\n"

Expected behavior Ruler should send notifications to all alertmanager pods

Environment: Cortex is deployed using the cortex helm chart v0.7.0. The version of cortex is 1.10.0

Additional Context I tried making a curl request from one of the ruler pods

curl -iv -XPOST http://cortex-aggregator-v2-alertmanager-0.cortex-aggregator-v2-alertmanager-headless.cortex-2.svc.cluster.local:9094/api/prom/alertmanager/api/v1/alerts

and got an empty response

curl: (52) Empty reply from server

AM had similar logs of invalid msgType (80)

Can I get some pointers on how to proceed here?

alvinlin123 commented 2 years ago

👋 Hi @roobalimsab are you in the Cortex slack channel?. Many Cortex users may be able to help you out more effectively in the Slack channel :)

curl: (52) Empty reply from server is a generic error when server closes underlying connection without sending a response. One thing I would check is if a GET request returns anything; if you run curl -iv -XGET http://cortex-aggregator-v2-alertmanager-0.cortex-aggregator-v2-alertmanager-headless.cortex-2.svc.cluster.local:9094/api/prom/alertmanager/api/v1/alerts

This issue seems more like a networking issue than Cortex issue :)

Also would it be possible for you to try to upgrade to latest Cortex version 1.13.1 and see if the problem persists? There are some known member list issue before 1.13.0.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.