VictoriaMetrics / helm-charts

Helm charts for VictoriaMetrics, VictoriaLogs and ecosystem
https://victoriametrics.github.io/helm-charts/
Apache License 2.0

bug: victoria-metrics-agent v0.14.7 incorrect args passed (incorrect order/value) #1754

Closed (vhajdukd closed this issue 1 day ago)

vhajdukd commented 5 days ago

Chart name and version
chart: victoria-metrics-agent
version: v0.14.7

Describe the bug
When specific per-URL overrides/properties are defined under remoteWrite, they are appended to the global section of the args instead of retaining the order of the properties. This is a breaking issue when working with multiple remoteWrite entries in a single configuration.

Example at the bottom.

Custom values (excluding default ones):

image:
  tag: v1.105.0
fullnameOverride: vmagent-general

deployment:
  enabled: false
statefulset:
  enabled: true

persistence:
  enabled: true
  storageClassName: gp3
  accessModes:
    - ReadWriteOnce
  size: 4Gi

serviceAccount:
  create: true

service:
  enabled: true
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8429'
  labels: {}
  type: NodePort

resources:
  limits:
    cpu: 1750m
    memory: 1.5Gi
  requests:
    cpu: 1200m
    memory: 1Gi
    ephemeral-storage: 4Gi

rbac:
  create: true
  pspEnabled: true

podDisruptionBudget:
  enabled: true
  minAvailable: 40%
  selector:
    matchLabels:
      app.kubernetes.io/instance: vmagent-general

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/instance: vmagent-general
          topologyKey: kubernetes.io/hostname
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/instance: vmagent-general

remoteWrite:
  - url: http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
    urlRelabelConfig: /relabel_configs/general_relabel.yaml
  - url: http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
    urlRelabelConfig: /relabel_configs/jailhouse_relabel.yaml
    showURL: true
    disableOnDiskQueue: true
    dropSamplesOnOverload: true

extraArgs:
  promscrape.streamParse: true
  promscrape.suppressScrapeErrors: true
  promscrape.suppressDuplicateScrapeTargetErrors: true
  loggerErrorsPerSecondLimit: 20
  promscrape.discovery.concurrency: 200
  promscrape.maxDroppedTargets: 100000
  influxSkipSingleField: true
  usePromCompatibleNaming: true
  remoteWrite.queues: 64
  maxConcurrentInserts: 4

extraVolumes:
  - name: relabel-config
    configMap:
      name: vmagent-relabeling
extraVolumeMounts:
  - name: relabel-config
    mountPath: /relabel_configs

config:
  global:
    scrape_interval: 30s
    scrape_timeout: 5s
  scrape_configs: []

Broken config result from the chart:

          args: 
            - --envflag.enable=true
            - --envflag.prefix=VM_
            - --influxSkipSingleField
            - --loggerErrorsPerSecondLimit=20
            - --loggerFormat=json
            - --maxConcurrentInserts=4
            - --promscrape.config=/config/scrape/scrape.yml
            - --promscrape.discovery.concurrency=200
            - --promscrape.maxDroppedTargets=100000
            - --promscrape.streamParse
            - --promscrape.suppressDuplicateScrapeTargetErrors
            - --promscrape.suppressScrapeErrors
            - --remoteWrite.disableOnDiskQueue
            - --remoteWrite.dropSamplesOnOverload
            - --remoteWrite.queues=64
            - --remoteWrite.showURL
            - --remoteWrite.tmpDataPath=/tmpData
            - --remoteWrite.url=http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
            - --remoteWrite.url=http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
            - --remoteWrite.urlRelabelConfig=/relabel_configs/general_relabel.yaml
            - --remoteWrite.urlRelabelConfig=/relabel_configs/jailhouse_relabel.yaml
            - --usePromCompatibleNaming

Expected config from the chart:

          args: 
            - --envflag.enable=true
            - --envflag.prefix=VM_
            - --influxSkipSingleField
            - --loggerErrorsPerSecondLimit=20
            - --loggerFormat=json
            - --maxConcurrentInserts=4
            - --promscrape.config=/config/scrape/scrape.yml
            - --promscrape.discovery.concurrency=200
            - --promscrape.maxDroppedTargets=100000
            - --promscrape.streamParse
            - --promscrape.suppressDuplicateScrapeTargetErrors
            - --promscrape.suppressScrapeErrors
            - --remoteWrite.queues=64
            - --remoteWrite.tmpDataPath=/tmpData
            - --remoteWrite.url=http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
            - --remoteWrite.urlRelabelConfig=/relabel_configs/general_relabel.yaml
            - --remoteWrite.url=http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
            - --remoteWrite.disableOnDiskQueue
            - --remoteWrite.dropSamplesOnOverload
            - --remoteWrite.showURL
            - --remoteWrite.urlRelabelConfig=/relabel_configs/jailhouse_relabel.yaml
            - --usePromCompatibleNaming

Order matters in this case.

AndrewChubatiuk commented 5 days ago

Try updating your config to:

remoteWrite:
  - url: http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
    urlRelabelConfig: /relabel_configs/general_relabel.yaml
    showURL: false
    disableOnDiskQueue: false
    dropSamplesOnOverload: false
  - url: http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
    urlRelabelConfig: /relabel_configs/jailhouse_relabel.yaml
    showURL: true
    disableOnDiskQueue: true
    dropSamplesOnOverload: true

In what you called the "broken result", the order is correct:

- --remoteWrite.url=http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
- --remoteWrite.url=http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
- --remoteWrite.urlRelabelConfig=/relabel_configs/general_relabel.yaml
- --remoteWrite.urlRelabelConfig=/relabel_configs/jailhouse_relabel.yaml

The 1st remoteWrite.url (http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write) has the relabel config from the 1st remoteWrite.urlRelabelConfig (/relabel_configs/general_relabel.yaml).

The 2nd remoteWrite.url (http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write) has the relabel config from the 2nd remoteWrite.urlRelabelConfig (/relabel_configs/jailhouse_relabel.yaml).

The problem there is that showURL: true, disableOnDiskQueue: true and dropSamplesOnOverload: true are applied to both remote writes, because each of these flags is passed only once.
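The positional pairing described in this comment can be modelled in a few lines of Python. This is only a sketch of the behaviour as explained here, not vmagent's actual implementation; the function name optional_arg and the short URL labels are hypothetical.

```python
from typing import List


def optional_arg(values: List[bool], idx: int, default: bool = False) -> bool:
    """Return the value of a repeated boolean flag for the idx-th remoteWrite.url.

    Assumed model: the i-th occurrence of a flag belongs to the i-th URL,
    and a flag that is passed exactly once applies to every URL.
    """
    if idx < len(values):
        return values[idx]
    if len(values) == 1:
        return values[0]
    return default


urls = ["general", "jailhouse"]  # hypothetical short labels for the two URLs

# Flag passed once: it applies to both remote writes.
once = [optional_arg([True], i) for i in range(len(urls))]

# Flag passed once per URL, in URL order: false for the 1st, true for the 2nd.
per_url = [optional_arg([False, True], i) for i in range(len(urls))]

print(dict(zip(urls, once)))
print(dict(zip(urls, per_url)))
```

Under this model, setting the three options explicitly to false on the first entry is what makes each flag appear twice, so that the second occurrence lines up with the second URL.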

vhajdukd commented 5 days ago

Hi,

This is the result for:

remoteWrite:
  - url: http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
    urlRelabelConfig: /relabel_configs/general_relabel.yaml
    showURL: false
    disableOnDiskQueue: false
    dropSamplesOnOverload: false
  - url: http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
    urlRelabelConfig: /relabel_configs/jailhouse_relabel.yaml
    showURL: true
    disableOnDiskQueue: true
    dropSamplesOnOverload: true

result:

          args: 
            - --envflag.enable=true
            - --envflag.prefix=VM_
            - --influxSkipSingleField
            - --loggerErrorsPerSecondLimit=20
            - --loggerFormat=json
            - --maxConcurrentInserts=4
            - --promscrape.config=/config/scrape/scrape.yml
            - --promscrape.discovery.concurrency=200
            - --promscrape.maxDroppedTargets=100000
            - --promscrape.streamParse
            - --promscrape.suppressDuplicateScrapeTargetErrors
            - --promscrape.suppressScrapeErrors
            - remoteWrite.disableOnDiskQueue
            - --remoteWrite.disableOnDiskQueue
            - remoteWrite.dropSamplesOnOverload
            - --remoteWrite.dropSamplesOnOverload
            - --remoteWrite.queues=64
            - remoteWrite.showURL
            - --remoteWrite.showURL
            - --remoteWrite.tmpDataPath=/tmpData
            - --remoteWrite.url=http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
            - --remoteWrite.url=http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
            - --remoteWrite.urlRelabelConfig=/relabel_configs/general_relabel.yaml
            - --remoteWrite.urlRelabelConfig=/relabel_configs/jailhouse_relabel.yaml
            - --usePromCompatibleNaming

This seems completely wrong: some parameters are rendered without the leading -- and so are not even passed as flags.
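A quick way to spot this kind of rendering problem is to scan the rendered args for entries without the flag prefix. The sketch below assumes the args have already been extracted into a Python list (here a shortened copy of the output above); malformed_args is a hypothetical helper name.

```python
from typing import List


def malformed_args(args: List[str]) -> List[str]:
    """Return the container args that do not start with the "--" prefix."""
    return [arg for arg in args if not arg.startswith("--")]


# Shortened copy of the rendered args from this comment.
rendered = [
    "--promscrape.suppressScrapeErrors",
    "remoteWrite.disableOnDiskQueue",
    "--remoteWrite.disableOnDiskQueue",
    "remoteWrite.dropSamplesOnOverload",
    "--remoteWrite.dropSamplesOnOverload",
    "--remoteWrite.queues=64",
    "remoteWrite.showURL",
    "--remoteWrite.showURL",
]

print(malformed_args(rendered))
```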

AndrewChubatiuk commented 5 days ago

This issue was fixed in the common module, and the fix is available in release 0.14.7. I tested an installation of the vmagent chart v0.14.7 and it works as expected:

          args:
            - --envflag.enable=true
            - --envflag.prefix=VM_
            - --loggerFormat=json 
            - --promscrape.config=/config/scrape/scrape.yml
            - --remoteWrite.disableOnDiskQueue=false
            - --remoteWrite.disableOnDiskQueue
            - --remoteWrite.dropSamplesOnOverload=false
            - --remoteWrite.dropSamplesOnOverload
            - --remoteWrite.showURL=false
            - --remoteWrite.showURL
            - --remoteWrite.tmpDataPath=/tmpData
            - --remoteWrite.url=http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
            - --remoteWrite.url=http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
            - --remoteWrite.urlRelabelConfig=/relabel_configs/general_relabel.yaml
            - --remoteWrite.urlRelabelConfig=/relabel_configs/jailhouse_relabel.yaml
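As a rough illustration of how an args list like this can be derived from the remoteWrite values, here is a small Python model. It is an assumption made for illustration and not the chart's actual template code: flags are sorted by name, and within one name the values keep the order of the URLs. The function name and the shortened URLs are hypothetical.

```python
from typing import Any, Dict, List


def render_remote_write_args(remote_write: List[Dict[str, Any]]) -> List[str]:
    """Render per-URL settings as repeated flags, grouped by flag name."""
    keys = sorted({key for entry in remote_write for key in entry})
    args: List[str] = []
    for key in keys:
        for entry in remote_write:
            if key not in entry:
                continue  # nothing is rendered, so later values shift position
            value = entry[key]
            flag = f"--remoteWrite.{key}"
            if value is True:
                args.append(flag)  # a bare boolean flag means true
            elif value is False:
                args.append(f"{flag}=false")
            else:
                args.append(f"{flag}={value}")
    return args


# Every entry sets the option explicitly: one occurrence per URL.
explicit = [
    {"url": "http://general", "showURL": False},
    {"url": "http://jailhouse", "showURL": True},
]
# Only the second entry sets it: the flag is rendered once.
implicit = [
    {"url": "http://general"},
    {"url": "http://jailhouse", "showURL": True},
]

print(render_remote_write_args(explicit))
print(render_remote_write_args(implicit))
```

In this model the second case produces a single --remoteWrite.showURL, which matches the first rendering reported in this issue, where the option ended up applying to both URLs.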

It's also possible to pass plain YAML in remoteWrite.urlRelabelConfig, which will be converted to a ConfigMap, with volumeMount sections generated for it:

remoteWrite:
  - url: http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
    urlRelabelConfig:
      - action: keep
        source_labels: [env]
        regex: "dev"
    showURL: false
    disableOnDiskQueue: false
    dropSamplesOnOverload: false
  - url: http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
    urlRelabelConfig:
      - action: keep
        source_labels: [env]
        regex: "dev"
    showURL: true
    disableOnDiskQueue: true
    dropSamplesOnOverload: true

AndrewChubatiuk commented 3 days ago

Hey @vhajdukd, have you checked the version of the chart's common dependency?

vhajdukd commented 1 day ago

I've run this again with 0.14.4 vs 0.14.7, and both generate this:

[image]

P.S. There also seems to be an issue with the image.tag override:

[image]

Chart.AppVersion will always take precedence over any image overrides :D..

vhajdukd commented 1 day ago

Example values.yaml:

image:
  tag: v1.105.0
fullnameOverride: vmagent-general

# 0.14.4 vs 0.14.7
deployment:
  enabled: false
statefulset:
  enabled: true
statefulSet:
  enabled: true

persistence:
  enabled: true
  storageClassName: gp3
  accessModes:
    - ReadWriteOnce
  size: 4Gi

serviceAccount:
  create: true

service:
  enabled: true
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8429'
  labels: {}
  type: NodePort

resources:
  limits:
    cpu: 1750m
    memory: 1.5Gi
  requests:
    cpu: 1200m
    memory: 1Gi
    ephemeral-storage: 4Gi

rbac:
  create: true
  pspEnabled: true

podDisruptionBudget:
  enabled: true
  minAvailable: 40%
  selector:
    matchLabels:
      app.kubernetes.io/instance: vmagent-general

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/instance: vmagent-general
          topologyKey: kubernetes.io/hostname
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/instance: vmagent-general

remoteWrite:
  - url: http://vminsert-general.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
    urlRelabelConfig: /relabel_configs/general_relabel.yaml
    disableOnDiskQueue: false
  - url: http://vminsert-jailhouse.monitoring.svc.cluster.local/insert/0/prometheus/api/v1/write
    urlRelabelConfig: /relabel_configs/jailhouse_relabel.yaml
    disableOnDiskQueue: true

extraArgs:
  remoteWrite.showURL: true
  remoteWrite.dropSamplesOnOverload: true
  promscrape.streamParse: true
  promscrape.suppressScrapeErrors: true
  promscrape.suppressDuplicateScrapeTargetErrors: true
  loggerErrorsPerSecondLimit: 20
  promscrape.discovery.concurrency: 200
  promscrape.maxDroppedTargets: 100000
  influxSkipSingleField: true
  usePromCompatibleNaming: true
  remoteWrite.queues: 64
  maxConcurrentInserts: 4

extraVolumes:
  - name: relabel-config
    configMap:
      name: vmagent-relabeling
extraVolumeMounts:
  - name: relabel-config
    mountPath: /relabel_configs

config:
  global:
    scrape_interval: 30s
    scrape_timeout: 5s
  scrape_configs: []

AndrewChubatiuk commented 1 day ago

Did you run helm dep build after checkout?

vhajdukd commented 1 day ago

In fact, I did not :D. What a blunder, let me re-render.

vhajdukd commented 1 day ago

Yeah, that did the trick:

[image]

Though the issue with app.kubernetes.io/version is still there. I'll close this one and open a separate one for the version label.