StatCan / aaw

Documentation for the Advanced Analytics Workspace Platform
https://statcan.github.io/aaw/

Investigate Alerting: Implement Alerting for ElasticSearch in Dev #1318

Closed Jose-Matsuda closed 2 years ago

Jose-Matsuda commented 2 years ago

Current Status 20/09/2022

Taken from Pat's comment

Steps

We need to change the chart values, most likely the ones for alertmanager.

Another issue possibly worth reading through: https://github.com/prometheus-community/helm-charts/issues/393

Important

Jose-Matsuda commented 2 years ago

Port-forwarding alertmanager and navigating to http://localhost:9093/#/status shows the current config. I still need to find out where this config is set and how I can change it, say via argocd.
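
The same configuration can also be read without the UI. A minimal sketch, assuming the port-forward is still open and that the running Alertmanager serves the v2 API, where `GET /api/v2/status` returns JSON with the loaded config text under `config.original`. The function names are illustrative only.

```python
import json
from urllib.request import urlopen


def extract_config(status: dict) -> str:
    """Return the raw config text from an /api/v2/status response body."""
    return status["config"]["original"]


def fetch_config(base_url: str = "http://localhost:9093") -> str:
    """Fetch the running config; assumes a port-forward to alertmanager is open."""
    with urlopen(f"{base_url}/api/v2/status", timeout=5) as resp:
        return extract_config(json.load(resp))
```

With the port-forward active, `print(fetch_config())` should print the same YAML that the status page shows.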

Though note that it does say: [image]

Can maybe use a configmap?

Could possibly do it through the chart values:

https://github.com/prometheus-community/helm-charts/blob/dd70d54ea0cef913140a78b918afac88d7c8ef2e/charts/kube-prometheus-stack/values.yaml#L485

Note that it is mounted here (in the alertmanager pod): [image]

It is controlled by: [image]

It is populated with these values, which are the defaults from the chart: [image]

Jose-Matsuda commented 2 years ago

Prometheus Rules

Note that some already exist in the volume: [image]

In the PR below I was able to get our own test Prometheus Rules recognized and loaded in there.
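
For reference, a test rule of this kind might look like the following. This is a sketch, not the rule from the PR: the rule name, namespace, and `release` label value are hypothetical, and the label has to match whatever rule selector the Prometheus instance is actually configured with. `vector(1)` always returns a sample, so the alert fires continuously, which is useful for checking the pipeline end to end. The script prints JSON, which `kubectl apply -f -` accepts.

```python
import json


def build_test_rule(name: str, namespace: str, release_label: str) -> dict:
    """Build a PrometheusRule manifest containing one always-firing alert."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Assumption: rules are selected by a `release` label.
            "labels": {"release": release_label},
        },
        "spec": {
            "groups": [
                {
                    "name": f"{name}.rules",
                    "rules": [
                        {
                            "alert": "AlwaysFiringTest",
                            "expr": "vector(1)",
                            "for": "1m",
                            "labels": {"severity": "none"},
                            "annotations": {"summary": "Test alert"},
                        }
                    ],
                }
            ],
        },
    }


# All three arguments are placeholder values.
manifest = build_test_rule("test-alerting", "monitoring", "kube-prometheus-stack")
print(json.dumps(manifest, indent=2))
```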

Jose-Matsuda commented 2 years ago

Pat has graciously directed me to the following repos, which show how we end up with Prometheus etc. in the cluster.

We have the specific terraform-kubernetes-kube-prometheus-stack to install the stack, which is referenced by the generic terraform-statcan-kubernetes-core-platform, which in turn is referenced by our specific repo terraform-statcan-aaw-platform for our own clusters.

Jose-Matsuda commented 2 years ago

Passing along our AlertManager configuration.

Pat also gave more insight, saying that, similar to how we 'custom' set the disk space for Prometheus, we will probably need to make a variable to pass down the chain. First, in the statcan-aaw repo, we just need to declare the variable and then create the matching TF_VAR in the git secrets.

This then has to get passed down to the core-platform.

Having said that, if I take a look at the chart I can see this. Maybe we can get away with configuring this, and then on the argocd side we can make changes to the configmap as we see fit? Though I am unsure about what happens if we change the configmap while it's running.

^ This almost makes sense based on what I knew earlier, but a little further down you see configSecret, which matches the pictures I have above in terms of location. The "regular" configuration seems to be taken from here, which populates the secret. The problem with this is that it really only uses .Values.alertmanager.config and nothing else.

configSecret

I think we can use this. If so, we do not need to do much variable passing (we just need to enable the option), since we can control the config ourselves via argocd with a secret. We would just need to restart alertmanager when it is updated(?). TODO:

Jose-Matsuda commented 2 years ago

Relevant Information Regarding Versions

(as of 12/09/2022)

DEV references terraform-statcan-aaw-platform v3.7.0, which references terraform-statcan-kubernetes-core-platform v1.7.0, which in turn references terraform-kubernetes-kube-prometheus-stack v2.0.0 (not much there, just focus on the k8s-core-platform as that contains the actual values).

Checking this was important to make sure that we were not missing any key upgrades.

Jose-Matsuda commented 2 years ago

If we go the secret route

We will likely want to create the secret in our various TF files, as in: [image]

where data would be something like: [image]

We would need to base64 encode whatever configuration we want and then put that as a secret in the repo. Remember that the configuration needs to match what they specify in the docs (à la the alertmanager.yaml in /etc/alertmanager/config).
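
The encoding step could be sketched as follows. Assumptions: the secret name and namespace are placeholders, the config body is a minimal stand-in rather than our real configuration, and the secret key is `alertmanager.yaml` to match the file name mentioned above.

```python
import base64
import json

# Placeholder config; the real one must follow the Alertmanager docs.
ALERTMANAGER_YAML = """\
route:
  receiver: "null"
receivers:
  - name: "null"
"""


def build_secret(name: str, namespace: str, config_text: str) -> dict:
    """Return a Secret manifest with the config base64-encoded under alertmanager.yaml."""
    encoded = base64.b64encode(config_text.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"alertmanager.yaml": encoded},
    }


# Name and namespace are hypothetical.
secret = build_secret("alertmanager-config-example", "monitoring", ALERTMANAGER_YAML)
print(json.dumps(secret, indent=2))
```

Decoding the `data` value again is a quick way to confirm that what lands in the secret is byte-for-byte the config we intended.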

Jose-Matsuda commented 2 years ago

Actual Configuration to Use

The configuration is in this gist. Its format may change depending on the route we take: if we go with modifying the secret, we will need to encode it, etc.

Jose-Matsuda commented 2 years ago

Closing; I will create a new issue to track CNS.