Closed mmkay closed 1 month ago
The grafana-agent charm forwards every trace to the tracing backend because it has no downsampling configuration.
Add a sampling policy controlled by three config options, which set the sampling strategies for charm traces, workload traces, and error traces.
Grafana Agent uses the tail sampling processor from opentelemetry-collector-contrib. Its reference is here: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.96.0/processor/tailsamplingprocessor
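To make the idea concrete, here is a minimal Python sketch of how three percentages could be turned into tail sampling policies. It is not the charm's actual code. The policy types and keys (`and`, `string_attribute`, `status_code`, `probabilistic`) come from the tail sampling processor reference; everything else is an assumption for illustration: the function name, the policy names, the `workload_pct` and `error_pct` parameters, and the convention that charm traces carry a `service.name` ending in `-charm`.

```python
import json
from typing import Any, Dict, List

# Assumption: charm traces are recognisable by their service.name.
# The real charm may tell them apart differently.
CHARM_SERVICE_REGEX = ".+-charm"


def _probabilistic(name: str, percentage: float) -> Dict[str, Any]:
    """Sub-policy that keeps roughly `percentage` percent of traces."""
    return {
        "name": name,
        "type": "probabilistic",
        "probabilistic": {"sampling_percentage": percentage},
    }


def _service_match(name: str, invert: bool) -> Dict[str, Any]:
    """Sub-policy matching (or, if inverted, not matching) charm service names."""
    return {
        "name": name,
        "type": "string_attribute",
        "string_attribute": {
            "key": "service.name",
            "values": [CHARM_SERVICE_REGEX],
            "enabled_regex_matching": True,
            "invert_match": invert,
        },
    }


def build_tail_sampling_policies(
    charm_pct: float, workload_pct: float, error_pct: float
) -> List[Dict[str, Any]]:
    """Build one `and` policy per trace category (hypothetical helper)."""
    for pct in (charm_pct, workload_pct, error_pct):
        if not 0 <= pct <= 100:
            raise ValueError(f"sampling percentage out of range: {pct}")

    error_match = {
        "name": "error-status",
        "type": "status_code",
        "status_code": {"status_codes": ["ERROR"]},
    }

    def both(name: str, matcher: Dict[str, Any], pct: float) -> Dict[str, Any]:
        # An `and` policy samples a trace only if every sub-policy agrees.
        return {
            "name": name,
            "type": "and",
            "and": {"and_sub_policy": [matcher, _probabilistic(f"{name}-rate", pct)]},
        }

    return [
        both("charm-traces", _service_match("charm-service", False), charm_pct),
        both("workload-traces", _service_match("workload-service", True), workload_pct),
        both("error-traces", error_match, error_pct),
    ]


if __name__ == "__main__":
    print(json.dumps(build_tail_sampling_policies(100, 1, 100), indent=2))
```

With this layout, setting the charm percentage to 0 makes the `charm-traces` policy never sample, which is the behaviour the testing steps check with `charm_traces_sampling_percentage`.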
Tandem PR: https://github.com/canonical/grafana-agent-k8s-operator/pull/319
Run the following bundle in a machine model, using mysql-operator from this branch:
default-base: ubuntu@22.04/stable
saas:
  prometheus-receive-remote-write:
    url: microk8s:admin/welcome-k8s.prometheus-receive-remote-write
  tracing:
    url: microk8s:admin/welcome-k8s.tracing
applications:
  grafana-agent:
    charm: local:grafana-agent-0
    options:
      charm_traces_sampling_percentage: 100
  mysql:
    charm: local:mysql-0
    num_units: 1
    to:
    - "0"
    constraints: arch=amd64
    storage:
      database: rootfs,1,1024M
machines:
  "0":
    constraints: arch=amd64
relations:
- - grafana-agent:send-remote-write
  - prometheus-receive-remote-write:receive-remote-write
- - grafana-agent:tracing
  - tracing:tracing
- - mysql:cos-agent
  - grafana-agent:cos-agent
In another k8s model, deploy cos-lite + tempo:
bundle: kubernetes
saas:
  remote-f33b3981087c439185dc9e6c2cbb47d8: {}
applications:
  alertmanager:
    charm: alertmanager-k8s
    channel: latest/edge
    revision: 135
    base: ubuntu@20.04/stable
    resources:
      alertmanager-image: 98
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
    trust: true
  catalogue:
    charm: catalogue-k8s
    channel: latest/edge
    revision: 63
    base: ubuntu@20.04/stable
    resources:
      catalogue-image: 34
    scale: 1
    options:
      description: "Canonical Observability Stack Lite, or COS Lite, is a light-weight, highly-integrated, \nJuju-based observability suite running on Kubernetes.\n"
      tagline: Model-driven Observability Stack deployed with a single command.
      title: Canonical Observability Stack
    constraints: arch=amd64
    trust: true
  grafana:
    charm: grafana-k8s
    channel: latest/edge
    revision: 119
    base: ubuntu@20.04/stable
    resources:
      grafana-image: 70
      litestream-image: 45
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  loki:
    charm: loki-k8s
    channel: latest/edge
    revision: 171
    base: ubuntu@20.04/stable
    resources:
      loki-image: 100
      node-exporter-image: 3
    scale: 1
    constraints: arch=amd64
    storage:
      active-index-directory: kubernetes,1,1024M
      loki-chunks: kubernetes,1,1024M
    trust: true
  prometheus:
    charm: prometheus-k8s
    channel: latest/edge
    revision: 212
    base: ubuntu@20.04/stable
    resources:
      prometheus-image: 150
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  tempo-k8s:
    charm: tempo-k8s
    channel: latest/edge
    revision: 83
    resources:
      tempo-image: 17
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
    trust: true
  traefik:
    charm: traefik-k8s
    channel: latest/edge
    revision: 211
    base: ubuntu@20.04/stable
    resources:
      traefik-image: 161
    scale: 1
    constraints: arch=amd64
    storage:
      configurations: kubernetes,1,1024M
    trust: true
relations:
- - traefik:ingress-per-unit
  - prometheus:ingress
- - traefik:ingress-per-unit
  - loki:ingress
- - traefik:traefik-route
  - grafana:ingress
- - traefik:ingress
  - alertmanager:ingress
- - prometheus:alertmanager
  - alertmanager:alerting
- - grafana:grafana-source
  - prometheus:grafana-source
- - grafana:grafana-source
  - loki:grafana-source
- - grafana:grafana-source
  - alertmanager:grafana-source
- - loki:alertmanager
  - alertmanager:alerting
- - prometheus:metrics-endpoint
  - traefik:metrics-endpoint
- - prometheus:metrics-endpoint
  - alertmanager:self-metrics-endpoint
- - prometheus:metrics-endpoint
  - loki:metrics-endpoint
- - prometheus:metrics-endpoint
  - grafana:metrics-endpoint
- - grafana:grafana-dashboard
  - loki:grafana-dashboard
- - grafana:grafana-dashboard
  - prometheus:grafana-dashboard
- - grafana:grafana-dashboard
  - alertmanager:grafana-dashboard
- - catalogue:ingress
  - traefik:ingress
- - catalogue:catalogue
  - grafana:catalogue
- - catalogue:catalogue
  - prometheus:catalogue
- - catalogue:catalogue
  - alertmanager:catalogue
- - catalogue:catalogue
  - loki:catalogue
- - loki:logging
  - tempo-k8s:logging
- - loki:logging
  - traefik:logging
- - tempo-k8s:tracing
  - alertmanager:tracing
- - tempo-k8s:tracing
  - catalogue:tracing
- - tempo-k8s:grafana-dashboard
  - grafana:grafana-dashboard
- - tempo-k8s:grafana-source
  - grafana:grafana-source
- - tempo-k8s:tracing
  - grafana:tracing
- - tempo-k8s:tracing
  - loki:tracing
- - tempo-k8s:metrics-endpoint
  - prometheus:metrics-endpoint
- - tempo-k8s:tracing
  - prometheus:tracing
- - tempo-k8s:tracing
  - traefik:tracing
- - traefik:grafana-dashboard
  - grafana:grafana-dashboard
- - traefik:traefik-route
  - tempo-k8s:ingress
- - prometheus:receive-remote-write
  - remote-f33b3981087c439185dc9e6c2cbb47d8:send-remote-write
- - tempo-k8s:tracing
  - remote-f33b3981087c439185dc9e6c2cbb47d8:tracing
--- # overlay.yaml
applications:
  prometheus:
    offers:
      prometheus-receive-remote-write:
        endpoints:
        - receive-remote-write
        acl:
          admin: admin
  tempo-k8s:
    offers:
      tracing:
        endpoints:
        - tracing
        acl:
          admin: admin
Use juju config to change the charm tracing configuration:
juju config grafana-agent charm_traces_sampling_percentage=0
Observe that no new charm traces appear, then:
juju config grafana-agent charm_traces_sampling_percentage=100
Observe that charm traces start appearing again.