VictoriaMetrics / operator

Kubernetes operator for Victoria Metrics
Apache License 2.0

Config merge of VMAlertManagerConfig and config secret not working correctly #339

Closed Katnopic closed 3 years ago

Katnopic commented 3 years ago

Hello,

I've been trying to configure VMAlertManager with VMAlertManagerConfig. I want to map a custom template to my configuration, and because I've seen that this is currently not supported in the VMAlertManagerConfig object, I need to create a config secret and add the template there. As stated in the documentation, the two configurations should be merged. However, I can't get this to work: the merge adds some unwanted content that causes errors for me.

I am using victoria-metrics-operator installed through Helm, chart version 0.3.0, app version 0.19.0, with default values.

Here are the relevant files:

VMAlertManagerConfig:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlertmanagerConfig
metadata:
  name: slack
  namespace: victoria-metrics
  labels:
    alertconfig: slack
spec:
  receivers:
  - name: slack
    slack_configs:
    - actions:
      - type: button
        text: 'Silence :no_bell:'
        url: '{{ template "__alert_silence_link" . }}'
      - type: button
        text: 'Query :mag:'
        url: '{{ (index .Alerts 0).GeneratorURL }}'
      api_url:
        name: slack
        key: apiURL
      channel: REDACTED
      color: '{{ template "slack.REDACTED.color" . }}'
      send_resolved: true
      text: '{{ template "slack.REDACTED.text" . }}'
      title: '{{ template "slack.REDACTED.title" . }}'
  route:
    receiver: slack
    #repeatInterval: 5m

The secret named 'slack' is populated, under the key apiURL, with a valid URL to my Slack webhook.

Config secret file:

apiVersion: v1
data:
  alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0Kcm91dGU6CiAgZ3JvdXBfYnk6IFsnYWxlcnRuYW1lJ10KICBncm91cF93YWl0OiAzMHMKICBncm91cF9pbnRlcnZhbDogNW0KICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJlY2VpdmVyOiAnbnVsbCcKICByb3V0ZXM6CiAgLSBtYXRjaDoKICAgICAgYWxlcnRuYW1lOiBXYXRjaGRvZwogICAgcmVjZWl2ZXI6ICdudWxsJwogICAgY29udGludWU6IGZhbHNlCnJlY2VpdmVyczoKLSBuYW1lOiAnbnVsbCcKdGVtcGxhdGVzOgotICcvZXRjL2FsZXJ0bWFuYWdlci9jb25maWcvKi50bXBsJwotICcvZXRjL2FsZXJ0bWFuYWdlci9jb25maWdtYXBzLyoqLyoudG1wbCcKLSAnL2V0Yy92bS9jb25maWdzL3RlbXBsYXRlcy8qLnRtcGwn
kind: Secret
metadata:
  name: alertmanager-config
  namespace: victoria-metrics
type: Opaque
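
For readers who want to see what the operator is expected to merge, the `alertmanager.yaml` value can be decoded locally. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the encoded string is copied from the Secret above in 64-character chunks.

```python
import base64

# data["alertmanager.yaml"] from the Secret above, split into chunks for readability.
encoded = (
    "Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0Kcm91dGU6CiAgZ3JvdXBfYnk6"
    "IFsnYWxlcnRuYW1lJ10KICBncm91cF93YWl0OiAzMHMKICBncm91cF9pbnRlcnZh"
    "bDogNW0KICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJlY2VpdmVyOiAnbnVsbCcK"
    "ICByb3V0ZXM6CiAgLSBtYXRjaDoKICAgICAgYWxlcnRuYW1lOiBXYXRjaGRvZwog"
    "ICAgcmVjZWl2ZXI6ICdudWxsJwogICAgY29udGludWU6IGZhbHNlCnJlY2VpdmVy"
    "czoKLSBuYW1lOiAnbnVsbCcKdGVtcGxhdGVzOgotICcvZXRjL2FsZXJ0bWFuYWdl"
    "ci9jb25maWcvKi50bXBsJwotICcvZXRjL2FsZXJ0bWFuYWdlci9jb25maWdtYXBz"
    "LyoqLyoudG1wbCcKLSAnL2V0Yy92bS9jb25maWdzL3RlbXBsYXRlcy8qLnRtcGwn"
)

# Pad defensively in case the value was copied without trailing '=' characters.
padded = encoded + "=" * (-len(encoded) % 4)
decoded = base64.b64decode(padded).decode("utf-8")

print(decoded)
```

The decoded secret sets `'null'` as the default receiver and lists three template globs, including `/etc/vm/configs/templates/*.tmpl`.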

VMAlertManager:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlertmanager
metadata:
  name: alertmanager
  namespace: victoria-metrics
spec:
  replicaCount: 1
  configMaps:
  - templates
  configSecret: alertmanager-config
  configSelector:
    matchExpressions:
      - key: alertconfig
        operator: Exists
  externalURL: REDACTED

After applying all of the above, the alertmanager-config secret object changes to this:

apiVersion: v1
data:
  alertmanager.yaml: REDACTED-VIEW-CONTENT-BELOW
kind: Secret
metadata:
  creationTimestamp: "2021-09-30T14:51:27Z"
  finalizers:
  - apps.victoriametrics.com/finalizer
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: alertmanager
    app.kubernetes.io/name: vmalertmanager
    managed-by: vm-operator
  name: alertmanager-config
  namespace: victoria-metrics
  ownerReferences:
  - apiVersion: operator.victoriametrics.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: VMAlertmanager
    name: alertmanager
    uid: e531970f-8551-46a8-89af-5ea239c05eb7
  resourceVersion: "15798926"
  uid: 505ee893-5895-4022-bdaf-8b9a40252bb1
type: Opaque

The content of the secret is as follows:

global:
  resolve_timeout: 5m
route:
  receiver: webhook
  routes:
  - matchers:
    - namespace = "victoria-metrics"
    receiver: victoria-metrics-slack-slack
    continue: true
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: webhook
  webhook_configs:
  - url: http://localhost:30500/
- name: victoria-metrics-slack-slack
  slack_configs:
  - api_url: REDACTED
    send_resolved: true
templates: []

As you can see, the slack config is being added, but the templates aren't, and for some reason I get a webhook_configs section that causes errors in Alertmanager:

level=warn ts=2021-09-30T15:07:15.642Z caller=notify.go:723 component=dispatcher receiver=webhook integration=webhook[0] msg="Notify attempt failed, will retry later" attempts=1 err="Post \"http://localhost:30500/\": dial tcp 127.0.0.1:30500: connect: connection refused"
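
The two symptoms in the generated config can also be checked mechanically. The sketch below is illustrative only: the config is transcribed by hand into a Python dict (the Slack `api_url` is redacted in the report, so a placeholder is used), and `find_symptoms` is a hypothetical helper, not part of the operator or of Alertmanager.

```python
from urllib.parse import urlparse

# Generated config from this report, transcribed by hand as a Python dict.
rendered = {
    "global": {"resolve_timeout": "5m"},
    "route": {
        "receiver": "webhook",
        "routes": [
            {
                "matchers": ['namespace = "victoria-metrics"'],
                "receiver": "victoria-metrics-slack-slack",
                "continue": True,
            }
        ],
        "group_wait": "30s",
        "group_interval": "5m",
        "repeat_interval": "12h",
    },
    "receivers": [
        {"name": "webhook", "webhook_configs": [{"url": "http://localhost:30500/"}]},
        {
            "name": "victoria-metrics-slack-slack",
            # Placeholder: the real URL is redacted in the report.
            "slack_configs": [{"api_url": "REDACTED", "send_resolved": True}],
        },
    ],
    "templates": [],
}


def find_symptoms(config):
    """Hypothetical helper: report an empty templates list and localhost webhooks."""
    problems = []
    if not config.get("templates"):
        problems.append("templates list is empty")
    for receiver in config.get("receivers", []):
        for hook in receiver.get("webhook_configs", []):
            url = hook.get("url", "")
            if urlparse(url).hostname in ("localhost", "127.0.0.1"):
                problems.append(f"receiver {receiver['name']!r} posts to {url}")
    return problems


symptoms = find_symptoms(rendered)
for line in symptoms:
    print(line)
```

Both checks fire for the config shown above: the template globs from the config secret are missing, and the default route points at a webhook on localhost.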

Would really like help with this one to get it to work. Also, 2 quick questions regarding VMAlertManager object:

Thanks a lot!

f41gh7 commented 3 years ago

Hello, it's a bug: slack_configs wasn't rendered correctly. It should be fixed in the related PR.

A Docker image based on the v0.19.1 release, victoriametrics/operator:fixes-alertmanager-config, can be used for testing.

Katnopic commented 3 years ago

@f41gh7 Hi, thanks for the update. I've tested the image: it did add the slack config to the file, but the webhook section is still being added and is causing errors, and the configurations (the VMAlertmanagerConfig and the secret) still did not get merged.

This is the new configuration inside the pod:

global:
  resolve_timeout: 5m
route:
  receiver: webhook
  routes:
  - matchers:
    - namespace = "victoria-metrics"
    receiver: victoria-metrics-slack-slack
    continue: true
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: webhook
  webhook_configs:
  - url: http://localhost:30500/
- name: victoria-metrics-slack-slack
  slack_configs:
  - api_url: <redacted>
    send_resolved: true
    channel: <redacted>
    color: '{{ template "slack.<redacted>.color" . }}'
    text: '{{ template "slack.<redacted>.text" . }}'
    title: '{{ template "slack.<redacted>.title" . }}'
    actions:
    - text: 'Silence :no_bell:'
      url: '{{ template "__alert_silence_link" . }}'
      type: button
    - text: 'Query :mag:'
      url: '{{ (index .Alerts 0).GeneratorURL }}'
      type: button
templates: []

f41gh7 commented 3 years ago

Thanks for testing. Indeed, there was another issue with it. It should be fixed in commit a4d884a1bec13ff188d45e902a3b6736224e37ae. Related Docker image: victoriametrics/operator:gh-339

These changes will be included in the next release.

Katnopic commented 3 years ago

@f41gh7 Thanks a lot! I will test this soon and update.

f41gh7 commented 3 years ago

The fix is included in release v0.20.0.

Sorry for the delay with the release.