canonical / grafana-agent-operator

https://charmhub.io/grafana-agent
Apache License 2.0

Traces downsampling policy #191

Closed mmkay closed 1 month ago

mmkay commented 1 month ago

Issue

The grafana-agent charm forwards every trace to the tracing backend because it has no downsampling configuration.

Solution

Add a sampling policy controlled by three config variables, which set the sampling strategies for charm traces, workload traces, and errors.

Context

Grafana Agent uses the tail sampling processor from opentelemetry-collector-contrib. Its reference is here: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.96.0/processor/tailsamplingprocessor
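
For orientation, here is a rough sketch of what tail sampling policies of this kind can look like in the upstream processor's own configuration format. It is not the configuration this PR generates. The policy names, the service.name key and the example-charm value are hypothetical placeholders, and the percentages are arbitrary. Only the policy types (and, string_attribute, probabilistic, status_code) come from the processor reference linked above.

```yaml
processors:
  tail_sampling:
    policies:
      # Keep a percentage of traces that match an attribute identifying charm traces.
      # The attribute key and value are assumptions made for illustration only.
      - name: charm-traces-sketch
        type: and
        and:
          and_sub_policy:
            - name: is-charm-trace
              type: string_attribute
              string_attribute:
                key: service.name
                values: [example-charm]
            - name: charm-percentage
              type: probabilistic
              probabilistic:
                sampling_percentage: 100
      # Keep traces that contain at least one span with an ERROR status.
      - name: error-traces-sketch
        type: status_code
        status_code:
          status_codes: [ERROR]
```

A workload-traces policy would presumably follow the same shape as the charm one, with its own matching condition and percentage. In the upstream processor, a trace is kept when any policy decides to sample it, so setting one percentage to 0 only drops the traces that no other policy picks up.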

Tandem PR: https://github.com/canonical/grafana-agent-k8s-operator/pull/319

Testing Instructions

Run the following bundle in a machine model, using mysql-operator from this branch:

default-base: ubuntu@22.04/stable
saas:
  prometheus-receive-remote-write:
    url: microk8s:admin/welcome-k8s.prometheus-receive-remote-write
  tracing:
    url: microk8s:admin/welcome-k8s.tracing
applications:
  grafana-agent:
    charm: local:grafana-agent-0
    options:
      charm_traces_sampling_percentage: 100
  mysql:
    charm: local:mysql-0
    num_units: 1
    to:
    - "0"
    constraints: arch=amd64
    storage:
      database: rootfs,1,1024M
machines:
  "0":
    constraints: arch=amd64
relations:
- - grafana-agent:send-remote-write
  - prometheus-receive-remote-write:receive-remote-write
- - grafana-agent:tracing
  - tracing:tracing
- - mysql:cos-agent
  - grafana-agent:cos-agent

In another k8s model, deploy cos-lite + tempo:

bundle: kubernetes
saas:
  remote-f33b3981087c439185dc9e6c2cbb47d8: {}
applications:
  alertmanager:
    charm: alertmanager-k8s
    channel: latest/edge
    revision: 135
    base: ubuntu@20.04/stable
    resources:
      alertmanager-image: 98
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
    trust: true
  catalogue:
    charm: catalogue-k8s
    channel: latest/edge
    revision: 63
    base: ubuntu@20.04/stable
    resources:
      catalogue-image: 34
    scale: 1
    options:
      description: "Canonical Observability Stack Lite, or COS Lite, is a light-weight,
        highly-integrated, \nJuju-based observability suite running on Kubernetes.\n"
      tagline: Model-driven Observability Stack deployed with a single command.
      title: Canonical Observability Stack
    constraints: arch=amd64
    trust: true
  grafana:
    charm: grafana-k8s
    channel: latest/edge
    revision: 119
    base: ubuntu@20.04/stable
    resources:
      grafana-image: 70
      litestream-image: 45
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  loki:
    charm: loki-k8s
    channel: latest/edge
    revision: 171
    base: ubuntu@20.04/stable
    resources:
      loki-image: 100
      node-exporter-image: 3
    scale: 1
    constraints: arch=amd64
    storage:
      active-index-directory: kubernetes,1,1024M
      loki-chunks: kubernetes,1,1024M
    trust: true
  prometheus:
    charm: prometheus-k8s
    channel: latest/edge
    revision: 212
    base: ubuntu@20.04/stable
    resources:
      prometheus-image: 150
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  tempo-k8s:
    charm: tempo-k8s
    channel: latest/edge
    revision: 83
    resources:
      tempo-image: 17
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
    trust: true
  traefik:
    charm: traefik-k8s
    channel: latest/edge
    revision: 211
    base: ubuntu@20.04/stable
    resources:
      traefik-image: 161
    scale: 1
    constraints: arch=amd64
    storage:
      configurations: kubernetes,1,1024M
    trust: true
relations:
- - traefik:ingress-per-unit
  - prometheus:ingress
- - traefik:ingress-per-unit
  - loki:ingress
- - traefik:traefik-route
  - grafana:ingress
- - traefik:ingress
  - alertmanager:ingress
- - prometheus:alertmanager
  - alertmanager:alerting
- - grafana:grafana-source
  - prometheus:grafana-source
- - grafana:grafana-source
  - loki:grafana-source
- - grafana:grafana-source
  - alertmanager:grafana-source
- - loki:alertmanager
  - alertmanager:alerting
- - prometheus:metrics-endpoint
  - traefik:metrics-endpoint
- - prometheus:metrics-endpoint
  - alertmanager:self-metrics-endpoint
- - prometheus:metrics-endpoint
  - loki:metrics-endpoint
- - prometheus:metrics-endpoint
  - grafana:metrics-endpoint
- - grafana:grafana-dashboard
  - loki:grafana-dashboard
- - grafana:grafana-dashboard
  - prometheus:grafana-dashboard
- - grafana:grafana-dashboard
  - alertmanager:grafana-dashboard
- - catalogue:ingress
  - traefik:ingress
- - catalogue:catalogue
  - grafana:catalogue
- - catalogue:catalogue
  - prometheus:catalogue
- - catalogue:catalogue
  - alertmanager:catalogue
- - catalogue:catalogue
  - loki:catalogue
- - loki:logging
  - tempo-k8s:logging
- - loki:logging
  - traefik:logging
- - tempo-k8s:tracing
  - alertmanager:tracing
- - tempo-k8s:tracing
  - catalogue:tracing
- - tempo-k8s:grafana-dashboard
  - grafana:grafana-dashboard
- - tempo-k8s:grafana-source
  - grafana:grafana-source
- - tempo-k8s:tracing
  - grafana:tracing
- - tempo-k8s:tracing
  - loki:tracing
- - tempo-k8s:metrics-endpoint
  - prometheus:metrics-endpoint
- - tempo-k8s:tracing
  - prometheus:tracing
- - tempo-k8s:tracing
  - traefik:tracing
- - traefik:grafana-dashboard
  - grafana:grafana-dashboard
- - traefik:traefik-route
  - tempo-k8s:ingress
- - prometheus:receive-remote-write
  - remote-f33b3981087c439185dc9e6c2cbb47d8:send-remote-write
- - tempo-k8s:tracing
  - remote-f33b3981087c439185dc9e6c2cbb47d8:tracing
--- # overlay.yaml
applications:
  prometheus:
    offers:
      prometheus-receive-remote-write:
        endpoints:
        - receive-remote-write
        acl:
          admin: admin
  tempo-k8s:
    offers:
      tracing:
        endpoints:
        - tracing
        acl:
          admin: admin

Use juju config to change the charm trace sampling configuration:

juju config grafana-agent charm_traces_sampling_percentage=0

Observe that no new charm traces appear, then:

juju config grafana-agent charm_traces_sampling_percentage=100

Observe that charm traces start appearing again.

Upgrade Notes