SigNoz / charts

Helm Charts for SigNoz
MIT License

Support multiple alert manager replicas #514

Open BryanFauble opened 1 week ago

BryanFauble commented 1 week ago

In this Helm chart, it looks like only one URL is ever used or considered: https://github.com/SigNoz/charts/blob/bb53857ff7deff14b779e18475bec32b790327ea/charts/signoz/templates/_helpers.tpl#L298

Ask: When running multiple replicas of Alertmanager (https://github.com/SigNoz/alertmanager), allow alerts to be sent to one or more configured replicas.

The README of the SigNoz fork of Alertmanager supports this ask. It shows how Prometheus is expected to point at multiple Alertmanager instances:

To point your Prometheus 1.4, or later, instance to multiple Alertmanagers, configure them in your prometheus.yml configuration file, for example:

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager1:9093
      - alertmanager2:9093
      - alertmanager3:9093

Important: Do not load balance traffic between Prometheus and its Alertmanagers, but instead point Prometheus to a list of all Alertmanagers. The Alertmanager implementation expects all alerts to be sent to all Alertmanagers to ensure high availability.
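
As a rough illustration of "a list of all Alertmanagers" in a Kubernetes setting, the sketch below builds one URL per replica rather than a single load-balanced Service URL. It assumes the replicas run as a StatefulSet behind a headless Service, so each pod has a stable DNS name. The function name, the StatefulSet and Service names, and the namespace are hypothetical and not taken from the chart.

```python
def alertmanager_replica_urls(statefulset, headless_service, namespace,
                              replicas, port=9093, cluster_domain="cluster.local"):
    """Return one URL per Alertmanager pod (hypothetical helper).

    Assumes StatefulSet pods behind a headless Service, which get DNS names
    of the form <statefulset>-<ordinal>.<service>.<namespace>.svc.<domain>.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        f"http://{statefulset}-{i}.{headless_service}.{namespace}.svc.{cluster_domain}:{port}"
        for i in range(replicas)
    ]


# Hypothetical names, used only for illustration.
urls = alertmanager_replica_urls(
    "signoz-alertmanager", "signoz-alertmanager-headless", "platform", 3
)
for url in urls:
    print(url)
```

Whether the query service can accept more than one such URL is exactly what this issue asks for, so the list above is only the input such a change would need.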

Here are more links to code where changes are likely needed:

  1. https://github.com/SigNoz/charts/blob/bb53857ff7deff14b779e18475bec32b790327ea/charts/signoz/templates/query-service/statefulset.yaml#L134
  2. https://github.com/SigNoz/charts/blob/bb53857ff7deff14b779e18475bec32b790327ea/charts/signoz/templates/query-service/configmap.yaml#L18

grandwizard28 commented 4 days ago

Hi @BryanFauble, Thank you for opening this issue!

Would you be willing to contribute a PR for this?

BryanFauble commented 4 days ago

Thanks @grandwizard28,

This gives only a trivial benefit, so I don't want to spend time on it for now.

Something with a larger benefit that I started looking at is how k8s secrets might be used to configure things like the ClickHouse password. However, more work is needed to figure out how things like the traces URL could also be fed in, since it includes the username and password as well:

  1. https://github.com/SigNoz/charts/compare/main...BryanFauble:charts-signoz:main#
  2. https://github.com/BryanFauble/charts-signoz/blob/7a29318aabc8bccab36a4db3aaf3bf7e71fb27df/charts/signoz/templates/_clickhouse.tpl#L234

The secretFrom stanza mirrors something the underlying ClickHouse implementation already does: https://altinity.com/blog/clickhouse-confidential-using-kubernetes-secrets-with-the-altinity-operator
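
To make the traces URL problem concrete, here is a minimal sketch of assembling a connection URL at runtime from environment variables, which a pod could populate from a Kubernetes secret. The variable names, defaults, and the tcp://user:password@host:port/database shape are assumptions for illustration only, not the chart's actual format.

```python
import os
from urllib.parse import quote


def clickhouse_url_from_env(env=os.environ):
    """Build a connection URL from env vars (hypothetical names and URL shape)."""
    # Percent-encode credentials so characters like '@' or '/' cannot break the URL.
    user = quote(env["CLICKHOUSE_USER"], safe="")
    password = quote(env["CLICKHOUSE_PASSWORD"], safe="")
    host = env.get("CLICKHOUSE_HOST", "clickhouse")
    port = env.get("CLICKHOUSE_PORT", "9000")
    database = env.get("CLICKHOUSE_DATABASE", "signoz_traces")
    return f"tcp://{user}:{password}@{host}:{port}/{database}"


# Example with made-up credentials; in a pod these would come from a secret.
example_env = {"CLICKHOUSE_USER": "admin", "CLICKHOUSE_PASSWORD": "p@ss/word"}
url = clickhouse_url_from_env(example_env)
print(url)
```

The point of the sketch is that the password never has to appear in rendered chart values. Only the pieces are injected, and the URL is composed where it is consumed.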

grandwizard28 commented 4 days ago

Let's discuss this on a separate issue: https://github.com/SigNoz/charts/issues/525.

I'd like to keep this open for supporting multiple alert managers :)