GoogleCloudPlatform / prometheus-engine

Google Cloud Managed Service for Prometheus libraries and manifests.
https://g.co/cloud/managedprometheus
Apache License 2.0

way to disable Managed Alertmanager? #355

Open parkedwards opened 1 year ago

parkedwards commented 1 year ago

hello there - if we opt to self-deploy Alertmanager with GMP, is there a way to disable the automatically created alertmanager deployment / service?

https://cloud.google.com/stackdriver/docs/managed-prometheus/rules-managed#self-deployed_alertmanager

damemi commented 1 year ago

Hi @parkedwards, by default the managed Alertmanager doesn't do anything and doesn't interfere with self-deployed Alertmanagers. So simply by not configuring it, it is essentially "disabled". If you want to go a step further, you could try scaling the managed Alertmanager (MAM) StatefulSet to 0 replicas, but this may only work if you are running GMP unmanaged.
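
A rough sketch of that scaling approach, assuming the StatefulSet is named alertmanager in the gmp-system namespace (as it is in the upstream manifests); with managed GMP the addon may reconcile it back:

# Sketch only: manifest fragment / strategic-merge patch pinning the managed
# Alertmanager StatefulSet to zero replicas.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
  namespace: gmp-system
spec:
  replicas: 0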

For our reference, could you share why you need to disable the managed Alertmanager? Thanks

parkedwards commented 1 year ago

@damemi makes sense - we'll leave the MAM setup unconfigured.

> For our reference, could you share why you need to disable the managed Alertmanager? Thanks

Sure thing - we're opting to self-deploy our Alertmanager instances, but otherwise still use the managed GMP components (collectors, rule-evaluator, etc.). Ideally, we wouldn't be running any other Alertmanager Deployment or StatefulSet (e.g. the managed one), just to conserve resource usage and reduce confusion for anyone else on the team.

robmonct commented 1 year ago

It would be nice to be able to disable managed AlertManager to avoid confusion.

djfinnoy commented 1 year ago

We just upgraded to v0.5.0 and spent half an hour figuring out a way to disable the managed Alertmanager.

Why?

We have to deploy GMP via the manifests; the addon installation isn't flexible enough for our needs (we need Istio sidecars, and we don't allow manual configuration through kubectl: everything should be code). Since there isn't a Helm chart for GMP, we have to use kustomize.

How?

It's fairly straightforward to configure GMP to suit your needs with kustomize patches:

# ./kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.5.0/manifests/setup.yaml
  - https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.5.0/manifests/operator.yaml

patchesStrategicMerge:
  - delete-google-managed-alertmanager.yaml

patches:
  - target:
      name: config
      kind: OperatorConfig
    patch: |-
      # Connect GMP with our self-managed alertmanager 
      - op: add
        path: /rules
        value:
          alerting:
            alertmanagers:
            - name: alertmanager
              namespace: monitoring
              port: 9093

  - target:
      name: gmp-system
      kind: Namespace
    patch: |-
      # Add Istio sidecars to the GMP so Kiali graphs make sense
      - op: add
        path: /metadata/labels
        value:
          istio-injection: enabled

---

# ./delete-google-managed-alertmanager.yaml

$patch: delete
apiVersion: v1
kind: Service
metadata:
  namespace: gmp-system
  name: alertmanager
---
$patch: delete
apiVersion: v1
kind: Secret
metadata:
  namespace: gmp-system
  name: alertmanager
---
$patch: delete
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: gmp-system
  name: alertmanager

bwplotka commented 1 year ago

Nice, and your workaround makes sense! Also, you are welcome to use a newer version of the operator (e.g. the gke.gcr.io/prometheus-engine/operator:v0.6.3-gke.0 image and the v0.6.3-rc.0 tag).
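
For the kustomize setup above, a sketch of pinning that operator image with kustomize's built-in images transformer (the image name is taken from the tag above):

# ./kustomization.yaml (excerpt) - sketch of overriding the operator image tag
images:
  - name: gke.gcr.io/prometheus-engine/operator
    newTag: v0.6.3-gke.0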

We will discuss with the team whether there is an easier way to disable the managed Alertmanager for your use cases.

arthurburle commented 11 months ago

Hi @bwplotka, do we have any updates on this topic?

bwplotka commented 11 months ago

No discussion yet, sorry for the lag.

From my understanding, this feature request only applies to managed GMP.

One way of solving it is a new field, e.g. OperatorConfig.ManagedAlertmanagerSpec.Disabled = true, that would change the Alertmanager replica count from 1 to 0.
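
As a rough sketch only (the managedAlertmanager.disabled field below is hypothetical and does not exist today):

# Hypothetical OperatorConfig sketch; "disabled" is a proposed field, not an existing API.
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
managedAlertmanager:
  disabled: true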

The additional work on our side is to fix alerting for this case (for managed GMP we have solid SLOs).

Before we prioritise this, we would love to understand the confusion argument mentioned here, since it sounds like it's the main reason for this feature. Does the confusion come from a pod named "alertmanager" running in a system namespace when listing pods (it's filtered out by default, though)? Or is there some other source of confusion?

bwplotka commented 9 months ago

Note: @bernot-dev is working on this feature (automatically disabling the Alertmanager and rule-evaluator if no configuration is used for them) 🤗

bernot-dev commented 7 months ago

Solution implemented in #691. The rule-evaluator and Alertmanager will scale to zero when there are no Rules set up in the cluster.

robmonct commented 7 months ago

Hi team, thanks for your work, but I have to say that the implemented solution doesn't make a lot of sense to me. The purpose of the issue, if I'm not wrong, is to use a self-deployed Alertmanager, so there will be rules. With this solution, we will have the same problem. In my opinion, the solution should simply be a way to explicitly enable or disable it, independently of the number of rules.

bernot-dev commented 7 months ago

To be more specific, it scales the GMP Rule-evaluator Deployment and Alertmanager StatefulSet to zero if none of these custom resources exist:

monitoring.googleapis.com/ClusterRules
monitoring.googleapis.com/GlobalRules
monitoring.googleapis.com/Rules

The primary goal of #691 was saving resources when the user does not need those pods running. If a user wants their own self-deployed Alertmanager, the GMP Alertmanager should not interfere unless it is also using our specific custom resources.
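
For illustration, the presence of even a minimal Rules resource like the following (name and namespace are arbitrary) is enough to keep both components scaled up:

# Illustrative sketch: any Rules/ClusterRules/GlobalRules resource in the cluster
# keeps the rule-evaluator and managed Alertmanager running.
apiVersion: monitoring.googleapis.com/v1
kind: Rules
metadata:
  namespace: monitoring
  name: example-rules
spec:
  groups:
  - name: example
    rules:
    - alert: AlwaysFiring
      expr: vector(1)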

parkedwards commented 6 months ago

Well, to @robmonct's point, which mirrors our use case -- we're using all of the Managed Prometheus components (rule-evaluator, collector, etc.), which includes usage of the Rules CRDs. We just want to use our own Alertmanager instance.

m3adow commented 2 months ago

+1 for a proper solution. The managed Alertmanager isn't a fit for us anymore, as we require an AlertmanagerConfig equivalent for our different application teams (but also due to #685). Nevertheless, we want to use the remaining GMP parts like the collectors, rule-evaluator, etc.

The managed Alertmanager is just eating cluster resources. While it's not a lot, I'd prefer not to have useless pods in our clusters.

pintohutch commented 2 months ago

Hey @m3adow - I'll re-open this issue so we can discuss as a team how we want to address and prioritize this.