canonical / pebble

Take control of your internal daemons!
https://canonical-pebble.readthedocs-hosted.com/
GNU General Public License v3.0
143 stars 54 forks source link

feat(logfwd): add log labels #312

Closed barrettj12 closed 10 months ago

barrettj12 commented 11 months ago

This PR builds on the previous log forwarding PRs (#209, #252, #256, #267) by allowing the user to specify labels for logs forwarded from Pebble.

Labels will be defined in a new field labels inside the log target:

log-targets:
  tgt1:
    type: loki
    location: https://my.loki.server.com:3100/loki/api/v1/push
    labels:
      key1: val1
      key2: val2

Labels will support environment variable substitutions ($ENV_VAR), and these will be interpreted "dynamically" in the context of the relevant service.

Notes on the design

Labels can contain $ENVIRONMENT_VARIABLES, which will be substituted in the environment of the relevant service. However, the "service environment" could mean:

  1. the environment map as defined in the plan
  2. the actual environment in which the process is run

We decided to use (1) instead, as it is more explicit, can be "statically" evaluated just from the plan, and doesn't require extra data to be passed through from the service manager and cached.

Furthermore, anytime we change the labels, we first flush any logs currently in the buffer, so that these logs are sent with the correct (old) labels.

QA steps

Set up a simple logging service: ```bash #!/bin/bash while true; do echo "hello" sleep 1 done ``` Set up a plan using labels and environment variables: ```yaml services: svc1: &svc1 command: /path/to/logsvc startup: enabled override: merge environment: OWNER: alice IP: 100.32.2.1 PORT: 9090 svc2: <<: *svc1 environment: OWNER: bob IP: 100.32.2.5 PORT: 3000 log-targets: tgt1: type: loki override: merge location: http://10.42.0.222:3100/loki/api/v1/push services: [all] labels: owner: user-$OWNER address: http://${IP}:${PORT} ``` Run Pebble: ``` PEBBLE_DEBUG=1 go run ./cmd/pebble run ``` and verify that the labels are being correctly interpreted.

Links

Jira card: JUJU-4643

barrettj12 commented 10 months ago

Commit ca0afa152e34537a9af1cf4be3067997033f56bb seems to have introduced a regression. Now, managerSuite.TestTimelyShutdown is failing, because the LogManager.Stop() call is timing out:

FAIL: manager_test.go:129: managerSuite.TestTimelyShutdown

manager_test.go:180:
    c.Fatal("LogManager.Stop() took too long")
... Error: LogManager.Stop() took too long

EDIT: it seems to work fine with one gatherer, but as soon as you add a second, it fails. Something must be getting deadlocked somewhere.