canonical / prometheus-scrape-config-k8s-operator

This charmed operator allows operators to fine-tune scrape job configurations before sending them to the Prometheus charmed operator.
https://charmhub.io/prometheus-scrape-config-k8s
Apache License 2.0

No error while setting `scrape_interval` to an empty string #31

Open Abuelodelanada opened 11 months ago

Abuelodelanada commented 11 months ago

Bug Description

Setting the `scrape_interval` config to an empty string only produces an ERROR in debug-log, but not in the juju status nor on the command line. Moreover, it removes part of the Prometheus config without alerting the user.

To Reproduce

  1. Deploy this bundle:

    bundle: kubernetes
    applications:
      prometheus:
        charm: prometheus-k8s
        channel: edge
        revision: 150
        series: focal
        resources:
          prometheus-image: 128
        scale: 1
        constraints: arch=amd64
        storage:
          database: kubernetes,1,1024M
        trust: true
      scrape-config:
        charm: prometheus-scrape-config-k8s
        channel: edge
        revision: 42
        series: focal
        scale: 1
        constraints: arch=amd64
      zinc:
        charm: zinc-k8s
        channel: edge
        revision: 124
        resources:
          zinc-image: 120
        scale: 1
        constraints: arch=amd64
        storage:
          data: kubernetes,1,1024M
    relations:
    - - prometheus:metrics-endpoint
      - scrape-config:metrics-endpoint
    - - zinc:metrics-endpoint
      - scrape-config:configurable-scrape-jobs
  2. Check that the zinc scrape job landed in the Prometheus config:

$ juju ssh --container prometheus prometheus/0 cat /etc/prometheus/prometheus.yml
global:
  evaluation_interval: 1m
  scrape_interval: 1m
  scrape_timeout: 10s
rule_files:
- /etc/prometheus/rules/juju_*.rules
scrape_configs:
- honor_timestamps: true
  job_name: prometheus
  metrics_path: /metrics
  relabel_configs:
  - regex: (.*)
    separator: _
    source_labels:
    - juju_model
    - juju_model_uuid
    - juju_application
    - juju_unit
    target_label: instance
  scheme: http
  scrape_interval: 5s
  scrape_timeout: 5s
  static_configs:
  - labels:
      host: localhost
      juju_application: prometheus
      juju_model: cos
      juju_model_uuid: 803f1f9b-8e3c-4414-8e9a-07966d834767
      juju_unit: prometheus-k8s
    targets:
    - prometheus-0.prometheus-endpoints.cos.svc.cluster.local:9090
- honor_labels: true
  job_name: juju_cos_803f1f9b_zinc_prometheus_scrape-0
  metrics_path: /metrics
  relabel_configs:
  - regex: (.*)
    separator: _
    source_labels:
    - juju_model
    - juju_model_uuid
    - juju_application
    - juju_unit
    target_label: instance
  static_configs:
  - labels:
      juju_application: zinc
      juju_charm: zinc-k8s
      juju_model: cos
      juju_model_uuid: 803f1f9b-8e3c-4414-8e9a-07966d834767
      juju_unit: zinc/0
    targets:
    - 10.1.38.114:4080
  3. Change scrape_interval: juju config scrape-config scrape_interval="12s"
  4. Check this config landed in the Prometheus config:

    $ juju ssh --container prometheus prometheus/0 cat /etc/prometheus/prometheus.yml | grep "scrape_interval: 12s"
      scrape_interval: 12s

  5. Set that config to an empty string: $ juju config scrape-config scrape_interval=""

  6. Check that the command returns no error:

    $ echo $?
    0
  7. Check that the zinc scrape job is gone:

$ juju ssh --container prometheus prometheus/0 cat /etc/prometheus/prometheus.yml                                                       
global:
  evaluation_interval: 1m
  scrape_interval: 1m
  scrape_timeout: 10s
rule_files:
- /etc/prometheus/rules/juju_*.rules
scrape_configs:
- honor_timestamps: true
  job_name: prometheus
  metrics_path: /metrics
  relabel_configs:
  - regex: (.*)
    separator: _
    source_labels:
    - juju_model
    - juju_model_uuid
    - juju_application
    - juju_unit
    target_label: instance
  scheme: http
  scrape_interval: 5s
  scrape_timeout: 5s
  static_configs:
  - labels:
      host: localhost
      juju_application: prometheus
      juju_model: cos
      juju_model_uuid: 803f1f9b-8e3c-4414-8e9a-07966d834767
      juju_unit: prometheus-k8s
    targets:
    - prometheus-0.prometheus-endpoints.cos.svc.cluster.local:9090

Environment

Relevant log output

unit-prometheus-0: 17:33:47.794 INFO unit.prometheus/0.juju-log metrics-endpoint:2: HTTP Request: GET https://10.152.183.1/api/v1/namespaces/cos/pods/prometheus-0 "HTTP/1.1 200 OK"
unit-prometheus-0: 17:33:47.835 ERROR unit.prometheus/0.juju-log metrics-endpoint:2: Validating scrape jobs failed: b'time="2023-10-02T20:33:47Z" level=fatal msg="parsing YAML file /tmp/tmpneye6lpy: empty duration string"\n'
unit-prometheus-0: 17:33:47.868 INFO unit.prometheus/0.juju-log metrics-endpoint:2: Pushed new configuration
unit-prometheus-0: 17:33:47.928 INFO unit.prometheus/0.juju-log metrics-endpoint:2: HTTP Request: GET https://10.152.183.1/api/v1/namespaces/cos/pods/prometheus-0 "HTTP/1.1 200 OK"

Additional context

No response

dstathis commented 9 months ago

There are two things to consider here. In prometheus-scrape-config we should validate input and block if the data is wrong.

Also, if Prometheus receives a bad scrape config, it should skip that job and continue working, which it does. An open question is whether it should set a blocked status to notify the user that something is wrong.
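A minimal sketch of the validation proposed above, assuming a regex check against Prometheus's duration string format (the helper name `is_valid_duration` and the `BlockedStatus` wiring are illustrative, not the charm's actual code):

```python
import re

# Prometheus duration strings are one or more <number><unit> pairs,
# e.g. "12s", "1m30s". An empty string is not a valid duration, which
# is exactly what triggers the "empty duration string" fatal in the log.
_DURATION_RE = re.compile(r"^(\d+(ms|s|m|h|d|w|y))+$")


def is_valid_duration(value: str) -> bool:
    """Return True if `value` looks like a Prometheus duration string."""
    return bool(_DURATION_RE.match(value))


# In the charm's config-changed hook, the check could look something
# like this (hypothetical wiring, not the charm's actual code):
#
#     interval = self.config["scrape_interval"]
#     if not is_valid_duration(interval):
#         self.unit.status = BlockedStatus(
#             f"invalid scrape_interval: {interval!r}"
#         )
#         return
```

With this check, the empty string from this issue would surface as a blocked status in `juju status` instead of being forwarded to Prometheus and silently dropping the scrape job.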