canonical / grafana-agent-operator

This charmed operator automates the operational procedures of running Grafana Agent, an open-source telemetry collector.
https://charmhub.io/grafana-agent
Apache License 2.0

`juju-unit` labels not being applied to charm logs when related to grafana-agent #76

Closed shayancanonical closed 3 months ago

shayancanonical commented 3 months ago

Bug Description

When relating the mysql-router VM operator to COS through grafana-agent's cos-agent interface, the juju_unit label is not applied to the mysql-router logs. See the image below for the lack of such a label: [image]

See the image below for the presence of such a label for the grafana-agent/0 unit: [image]

The MySQLRouter VM charm uses COSAgentProvider. The charmed-mysql-snap has 2 directories defined in the logs slot.

To Reproduce

  1. Deploy cos-lite in a separate k8s model (named cos in this case, with controller named uk8s)
  2. Create an lxd model (juju add-model database)
  3. juju deploy -n 1 mysql --channel=8.0/stable
  4. juju deploy -n 1 mysql-test-app --channel=latest/stable
  5. juju deploy -n 1 mysql-router --channel=dpe/edge
  6. juju relate mysql-router mysql-test-app
  7. juju relate mysql-router mysql
  8. juju deploy -n 1 grafana-agent --channel=latest/edge
  9. juju relate grafana-agent mysql-test-app
  10. juju find-offers uk8s:
  11. juju consume uk8s:admin/cos.grafana
  12. juju consume uk8s:admin/cos.loki
  13. juju consume uk8s:admin/cos.prometheus
  14. juju relate grafana-agent mysql-router:cos-agent
  15. juju relate grafana-agent grafana
  16. juju relate grafana-agent loki
  17. juju relate grafana-agent prometheus
  18. Visit Grafana at http:///cos-grafana/explore and look for logs with the label juju_unit=mysql-router/0; none are available, although logs with the label juju_unit=grafana-agent/0 are. To see the logs for mysql-router/0, apply the label filter path=/mysqlrouter instead.
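
The label filters in step 18 correspond to LogQL stream selectors in Grafana Explore. As a small illustration, the sketch below builds the two selectors involved; the `stream_selector` helper is hypothetical and only formats strings, it does not talk to Loki.

```python
def stream_selector(**labels):
    """Build a LogQL stream selector such as {path="/mysqlrouter"}."""
    parts = []
    for name, value in sorted(labels.items()):
        # Escape backslashes and double quotes so the value stays a valid string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return "{" + ", ".join(parts) + "}"


# The selector one would expect to work, but which matches nothing in this bug:
expected = stream_selector(juju_unit="mysql-router/0")
# The workaround selector from step 18:
workaround = stream_selector(path="/mysqlrouter")

print(expected)
print(workaround)
```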

Environment

COS-lite deployed on juju with uk8s
Database deployed on juju with lxd

$ juju version
3.4.0-genericlinux-amd64
$ microk8s version 
MicroK8s v1.27.11 revision 6530
$ lxd version
5.20
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
...

Relevant log output

juju debug-log:

machine-1: 10:01:18 INFO juju.worker.deployer deploying unit "grafana-agent/0"                                                                                                                                     
machine-1: 10:01:18 INFO juju.worker.deployer creating new agent config for "grafana-agent/0"                                                                                                                      
machine-1: 10:01:18 INFO juju.worker.deployer starting workers for "grafana-agent/0"                                                                                                                               
machine-1: 10:01:18 INFO juju.worker.deployer start "grafana-agent/0"                                                                                                                                              
unit-grafana-agent-0: 10:01:18 INFO juju Starting unit workers for "grafana-agent/0"                                                                                                                               
machine-1: 10:01:18 INFO juju.api connection established to "wss://10.205.193.27:17070/model/c0d4c5d4-b2ed-4347-8292-f7d6405ca5f0/api"                                                                             
unit-grafana-agent-0: 10:01:18 INFO juju.worker.apicaller [c0d4c5] "unit-grafana-agent-0" successfully connected to "10.205.193.27:17070"                                                                          
unit-grafana-agent-0: 10:01:18 INFO juju.worker.apicaller [c0d4c5] password changed for "unit-grafana-agent-0"                                                                                                     
machine-1: 10:01:18 INFO juju.api connection established to "wss://10.205.193.27:17070/model/c0d4c5d4-b2ed-4347-8292-f7d6405ca5f0/api"                                                                             
unit-grafana-agent-0: 10:01:18 INFO juju.worker.apicaller [c0d4c5] "unit-grafana-agent-0" successfully connected to "10.205.193.27:17070"                                                                          
unit-grafana-agent-0: 10:01:18 INFO juju.worker.upgrader no waiter, upgrader is done
unit-grafana-agent-0: 10:01:18 INFO juju.worker.migrationminion migration migration phase is now: NONE
unit-grafana-agent-0: 10:01:18 INFO juju.worker.logger logger worker started
machine-1: 10:01:18 INFO juju.worker.leadership grafana-agent/0 promoted to leadership of grafana-agent
unit-grafana-agent-0: 10:01:18 ERROR juju.worker.meterstatus error running "meter-status-changed": charm missing from disk
machine-1: 10:01:18 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-grafana-agent-0
machine-1: 10:01:18 INFO juju.agent.tools was a symlink, now looking at /var/lib/juju/tools/3.4.0-ubuntu-amd64
unit-grafana-agent-0: 10:01:18 INFO juju.worker.uniter unit "grafana-agent/0" started
unit-grafana-agent-0: 10:01:18 INFO juju.worker.uniter resuming charm install
unit-grafana-agent-0: 10:01:18 INFO juju.worker.uniter.charm downloading ch:amd64/jammy/grafana-agent-70 from API server
machine-1: 10:01:18 INFO juju.downloader downloading from ch:amd64/jammy/grafana-agent-70
machine-1: 10:01:18 INFO juju.downloader download complete ("ch:amd64/jammy/grafana-agent-70")
machine-1: 10:01:18 INFO juju.downloader download verified ("ch:amd64/jammy/grafana-agent-70")
unit-grafana-agent-0: 10:01:21 INFO juju.worker.uniter hooks are retried true
unit-grafana-agent-0: 10:01:21 INFO juju.worker.uniter.storage initial storage attachments ready
unit-grafana-agent-0: 10:01:21 INFO juju.worker.uniter found queued "install" hook
unit-grafana-agent-0: 10:01:22 INFO unit.grafana-agent/0.juju-log Running legacy hooks/install.
controller-0: 10:01:37 INFO juju.worker.remoterelations cmr start "grafana"
controller-0: 10:01:39 ERROR juju.worker.remoterelations cmr error in remote application worker for grafana: cannot connect to external controller: opening facade to remote model: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:01:39 INFO juju.worker.remoterelations cmr stopped "grafana", err: cannot connect to external controller: opening facade to remote model: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:01:39 INFO juju.worker.remoterelations cmr non-fatal error "grafana": cannot connect to external controller: opening facade to remote model: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:01:39 ERROR juju.worker.remoterelations cmr exited "grafana": cannot connect to external controller: opening facade to remote model: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:01:39 INFO juju.worker.remoterelations cmr restarting "grafana" in 15s
controller-0: 10:01:40 INFO juju.worker.remoterelations cmr start "loki"
controller-0: 10:01:44 INFO juju.worker.remoterelations cmr start "prometheus"
controller-0: 10:01:46 ERROR juju.worker.remoterelations cmr error in remote application worker for prometheus: cannot connect to external controller: opening facade to remote model: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:01:46 INFO juju.worker.remoterelations cmr stopped "prometheus", err: cannot connect to external controller: opening facade to remote model: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:01:46 INFO juju.worker.remoterelations cmr non-fatal error "prometheus": cannot connect to external controller: opening facade to remote model: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:01:46 ERROR juju.worker.remoterelations cmr exited "prometheus": cannot connect to external controller: opening facade to remote model: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:01:46 INFO juju.worker.remoterelations cmr restarting "prometheus" in 15s
unit-grafana-agent-0: 10:01:50 INFO juju.worker.uniter.operation ran "install" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:01:51 INFO juju.worker.uniter.operation ran "juju-info-relation-created" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:01:52 INFO unit.grafana-agent/0.juju-log peers:8: certhandler waiting on certificates relation
unit-grafana-agent-0: 10:01:52 INFO juju.worker.uniter.operation ran "peers-relation-created" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:01:52 INFO juju.worker.uniter found queued "leader-elected" hook
unit-grafana-agent-0: 10:01:53 INFO juju.worker.uniter.operation ran "leader-elected" hook (via hook dispatching script: dispatch)
controller-0: 10:01:54 INFO juju.worker.remoterelations cmr start "grafana"
unit-grafana-agent-0: 10:01:55 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:01:55 INFO juju.worker.uniter found queued "start" hook
unit-grafana-agent-0: 10:01:55 INFO unit.grafana-agent/0.juju-log Running legacy hooks/start.
unit-grafana-agent-0: 10:01:56 INFO juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:01:57 INFO juju.worker.uniter.operation ran "juju-info-relation-joined" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:01:57 INFO juju.worker.uniter.operation ran "juju-info-relation-changed" hook (via hook dispatching script: dispatch)
controller-0: 10:02:01 INFO juju.worker.remoterelations cmr start "prometheus"
controller-0: 10:02:03 ERROR juju.worker.remoterelations cmr error in remote application worker for prometheus: cannot connect to external controller: opening facade to remote model: try was stopped
controller-0: 10:02:03 INFO juju.worker.remoterelations cmr stopped "prometheus", err: cannot connect to external controller: opening facade to remote model: try was stopped
controller-0: 10:02:03 INFO juju.worker.remoterelations cmr non-fatal error "prometheus": cannot connect to external controller: opening facade to remote model: try was stopped
controller-0: 10:02:03 ERROR juju.worker.remoterelations cmr exited "prometheus": cannot connect to external controller: opening facade to remote model: try was stopped
controller-0: 10:02:03 INFO juju.worker.remoterelations cmr restarting "prometheus" in 15s
controller-0: 10:02:18 INFO juju.worker.remoterelations cmr start "prometheus"
unit-mysql-test-app-0: 10:02:45 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-mysql-router-0: 10:03:24 INFO juju.worker.uniter.operation ran "cos-agent-relation-created" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:24 INFO juju.worker.uniter.operation ran "cos-agent-relation-created" hook (via hook dispatching script: dispatch)
unit-mysql-router-0: 10:03:25 INFO juju.worker.uniter.operation ran "cos-agent-relation-joined" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:28 INFO juju.worker.uniter.operation ran "cos-agent-relation-joined" hook (via hook dispatching script: dispatch)
unit-mysql-router-0: 10:03:28 INFO juju.worker.uniter.operation ran "cos-agent-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:30 INFO juju.worker.uniter.operation ran "cos-agent-relation-changed" hook (via hook dispatching script: dispatch)
controller-0: 10:03:39 INFO juju.worker.firewaller start "grafana:grafana-dashboard grafana-agent:grafana-dashboards-provider"
unit-grafana-agent-0: 10:03:40 INFO juju.worker.uniter.operation ran "grafana-dashboards-provider-relation-created" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:40 INFO juju.worker.uniter.operation ran "grafana-dashboards-provider-relation-joined" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:41 INFO juju.worker.uniter.operation ran "grafana-dashboards-provider-relation-changed" hook (via hook dispatching script: dispatch)
controller-0: 10:03:42 INFO juju.worker.firewaller start "grafana-agent:logging-consumer loki:logging"
controller-0: 10:03:44 INFO juju.worker.firewaller stopped "grafana:grafana-dashboard grafana-agent:grafana-dashboards-provider", err: cannot open facade to remote model to watch ingress addresses: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:03:44 INFO juju.worker.firewaller non-fatal error "grafana:grafana-dashboard grafana-agent:grafana-dashboards-provider": cannot open facade to remote model to watch ingress addresses: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:03:44 ERROR juju.worker.firewaller exited "grafana:grafana-dashboard grafana-agent:grafana-dashboards-provider": cannot open facade to remote model to watch ingress addresses: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:03:44 INFO juju.worker.firewaller restarting "grafana:grafana-dashboard grafana-agent:grafana-dashboards-provider" in 1m0s
unit-mysql-router-0: 10:03:44 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
controller-0: 10:03:44 INFO juju.worker.firewaller start "grafana-agent:send-remote-write prometheus:receive-remote-write"
unit-grafana-agent-0: 10:03:45 INFO juju.worker.uniter.operation ran "logging-consumer-relation-created" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:45 INFO juju.worker.uniter.operation ran "send-remote-write-relation-created" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:46 INFO juju.worker.uniter.operation ran "logging-consumer-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:47 INFO juju.worker.uniter.operation ran "logging-consumer-relation-joined" hook (via hook dispatching script: dispatch)
controller-0: 10:03:49 INFO juju.worker.firewaller stopped "grafana-agent:send-remote-write prometheus:receive-remote-write", err: cannot open facade to remote model to publish network change: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:03:49 INFO juju.worker.firewaller non-fatal error "grafana-agent:send-remote-write prometheus:receive-remote-write": cannot open facade to remote model to publish network change: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:03:49 ERROR juju.worker.firewaller exited "grafana-agent:send-remote-write prometheus:receive-remote-write": cannot open facade to remote model to publish network change: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:03:49 INFO juju.worker.firewaller restarting "grafana-agent:send-remote-write prometheus:receive-remote-write" in 1m0s
unit-grafana-agent-0: 10:03:50 INFO juju.worker.uniter.operation ran "send-remote-write-relation-joined" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:51 INFO juju.worker.uniter.operation ran "logging-consumer-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:53 INFO juju.worker.uniter.operation ran "send-remote-write-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:03:55 INFO juju.worker.uniter.operation ran "send-remote-write-relation-changed" hook (via hook dispatching script: dispatch)
unit-mysql-0: 10:04:02 INFO unit.mysql/0.juju-log Unit workload member-state is online with member-role primary
unit-mysql-0: 10:04:03 WARNING unit.mysql/0.juju-log No relation: certificates
unit-mysql-0: 10:04:04 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
controller-0: 10:04:44 INFO juju.worker.firewaller start "grafana:grafana-dashboard grafana-agent:grafana-dashboards-provider"
controller-0: 10:04:49 INFO juju.worker.firewaller start "grafana-agent:send-remote-write prometheus:receive-remote-write"
controller-0: 10:04:51 INFO juju.worker.firewaller stopped "grafana-agent:send-remote-write prometheus:receive-remote-write", err: cannot open facade to remote model to publish network change: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:04:51 INFO juju.worker.firewaller non-fatal error "grafana-agent:send-remote-write prometheus:receive-remote-write": cannot open facade to remote model to publish network change: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:04:51 ERROR juju.worker.firewaller exited "grafana-agent:send-remote-write prometheus:receive-remote-write": cannot open facade to remote model to publish network change: cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: i/o timeout
controller-0: 10:04:51 INFO juju.worker.firewaller restarting "grafana-agent:send-remote-write prometheus:receive-remote-write" in 1m0s
controller-0: 10:05:51 INFO juju.worker.firewaller start "grafana-agent:send-remote-write prometheus:receive-remote-write"
unit-grafana-agent-0: 10:07:16 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-mysql-router-0: 10:07:52 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-mysql-test-app-0: 10:07:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-mysql-0: 10:08:18 INFO unit.mysql/0.juju-log Unit workload member-state is online with member-role primary
unit-mysql-0: 10:08:19 WARNING unit.mysql/0.juju-log No relation: certificates
unit-mysql-0: 10:08:19 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-grafana-agent-0: 10:11:26 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-mysql-0: 10:12:41 INFO unit.mysql/0.juju-log Unit workload member-state is online with member-role primary
unit-mysql-0: 10:12:42 WARNING unit.mysql/0.juju-log No relation: certificates
unit-mysql-0: 10:12:42 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)

Additional context

The following entries are present under logs.configs in /etc/grafana-agent.yaml:

    - job_name: charmed-mysql-common-var-log-mysql
      pipeline_stages:
      - drop:
          expression: .*file is a directory.*
      relabel_configs:
      - replacement: /mysql
        source_labels:
        - __path__
        target_label: path
      static_configs:
      - labels:
          __path__: /snap/grafana-agent/16/shared-logs/mysql/**
          job: charmed-mysql-common-var-log-mysql
          juju_model: database
          juju_model_uuid: c0d4c5d4-b2ed-4347-8292-f7d6405ca5f0
        targets:
        - localhost
    - job_name: charmed-mysql-common-var-log-mysqlrouter
      pipeline_stages:
      - drop:
          expression: .*file is a directory.*
      relabel_configs:
      - replacement: /mysqlrouter
        source_labels:
        - __path__
        target_label: path
      static_configs:
      - labels:
          __path__: /snap/grafana-agent/16/shared-logs/mysqlrouter/**
          job: charmed-mysql-common-var-log-mysqlrouter
          juju_model: database
          juju_model_uuid: c0d4c5d4-b2ed-4347-8292-f7d6405ca5f0
        targets:
        - localhost
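
Both jobs above carry juju_model and juju_model_uuid but neither juju_application nor juju_unit. As a rough way to spot this across a larger config, here is a minimal Python sketch. It assumes the job entries have already been parsed into plain dicts (for example with a YAML parser); the function name, the `REQUIRED_LABELS` tuple and the trimmed sample data are illustrative and not part of grafana-agent or the charm.

```python
# Labels this sketch treats as the full Juju topology (an assumption for illustration).
REQUIRED_LABELS = ("juju_model", "juju_model_uuid", "juju_application", "juju_unit")


def missing_topology_labels(jobs):
    """Map each job_name to the required labels absent from all of its static_configs."""
    report = {}
    for job in jobs:
        present = set()
        for static_config in job.get("static_configs", []):
            present.update(static_config.get("labels", {}))
        missing = [label for label in REQUIRED_LABELS if label not in present]
        if missing:
            report[job["job_name"]] = missing
    return report


# Trimmed copy of the mysqlrouter job from /etc/grafana-agent.yaml above.
jobs = [
    {
        "job_name": "charmed-mysql-common-var-log-mysqlrouter",
        "static_configs": [
            {
                "labels": {
                    "__path__": "/snap/grafana-agent/16/shared-logs/mysqlrouter/**",
                    "job": "charmed-mysql-common-var-log-mysqlrouter",
                    "juju_model": "database",
                    "juju_model_uuid": "c0d4c5d4-b2ed-4347-8292-f7d6405ca5f0",
                },
                "targets": ["localhost"],
            }
        ],
    }
]

print(missing_topology_labels(jobs))
```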

The following entries are available in /var/lib/snapd/mount/snap.grafana-agent.fstab:

/var/snap/charmed-mysql/common/var/log/mysql /snap/grafana-agent/16/shared-logs/mysql none bind,ro 0 0
/var/snap/charmed-mysql/common/var/log/mysqlrouter /snap/grafana-agent/16/shared-logs/mysqlrouter none bind,ro 0 0
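
To cross-check these mounts against the `__path__` globs in the agent config, something like the following sketch could be used. It only parses fstab-style text passed in as a string; the helper names are hypothetical, and it assumes the globs end in `/**` as they do in the config above.

```python
from pathlib import PurePosixPath


def parse_bind_mounts(fstab_text):
    """Return {target: source} for every bind-mount line in fstab-style text."""
    mounts = {}
    for line in fstab_text.splitlines():
        fields = line.split()
        # fstab fields: source, target, fstype, options, dump, pass
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        source, target, _fstype, options = fields[:4]
        if "bind" in options.split(","):
            mounts[target] = source
    return mounts


def source_for_glob(path_glob, mounts):
    """Find the host directory behind a __path__ glob such as '<target>/**'."""
    if path_glob.endswith("/**"):
        path_glob = path_glob[:-3]
    directory = PurePosixPath(path_glob)
    for target, source in mounts.items():
        target_path = PurePosixPath(target)
        if directory == target_path or target_path in directory.parents:
            return source
    return None


fstab = """\
/var/snap/charmed-mysql/common/var/log/mysql /snap/grafana-agent/16/shared-logs/mysql none bind,ro 0 0
/var/snap/charmed-mysql/common/var/log/mysqlrouter /snap/grafana-agent/16/shared-logs/mysqlrouter none bind,ro 0 0
"""

mounts = parse_bind_mounts(fstab)
print(source_for_glob("/snap/grafana-agent/16/shared-logs/mysqlrouter/**", mounts))
```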

Interestingly, the shared-logs directory referenced in fstab does not show up when listing the snap's directory:

$ sudo ls -la /snap/grafana-agent/16/
total 493623
-rwxr-xr-x 1 root root 261670928 Aug 15  2023 agent
-rwxr-xr-x 1 root root       465 Aug 15  2023 agent-wrapper
-rwxr-xr-x 1 root root 243798538 Aug 15  2023 agentctl
drwxr-xr-x 2 root root         0 Aug 15  2023 etc
drwxr-xr-x 4 root root         0 Aug 15  2023 meta
drwxr-xr-x 3 root root         0 Aug 15  2023 snap
drwxr-xr-x 6 root root         0 Aug 15  2023 usr
drwxr-xr-x 3 root root         0 Aug 15  2023 var

Confirmed that the snap connection exists:

$ sudo snap connections
Interface         Plug                                  Slot                Notes
content[logs]     grafana-agent:logs                    charmed-mysql:logs  manual

sed-i commented 3 months ago

Note for future reference:

grafana-agent-snap:

plugs:
  logs:
    interface: content
    target: $SNAP/shared-logs

charmed-mysql-snap:

slots:
  logs:
    interface: content
    source:
      read:
        - $SNAP_COMMON/var/log/mysql
        - $SNAP_COMMON/var/log/mysqlrouter

charmed-zookeeper-snap:

slots:
  logs:
    interface: content
    source:
      read: 
        - $SNAP_COMMON/var/log/zookeeper

lucabello commented 3 months ago

Just noting that it looks like we explicitly don't take juju_unit and juju_application from the topology.

Edit: that's because that topology information comes from Grafana agent; we need to check if and how we can get the principal's topology here.
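
To make the idea concrete, here is a purely hypothetical sketch of what using the principal's topology could look like for the generated log labels. None of these names come from the charm or the cos_agent library; the only fact relied on is that Juju unit names have the form `<application>/<number>`. How the principal unit name would actually reach grafana-agent is exactly the open question above.

```python
def with_principal_topology(base_labels, principal_unit):
    """Return a copy of base_labels extended with the principal's app and unit.

    Hypothetical helper: principal_unit is assumed to be a Juju unit name
    such as "mysql-router/0".
    """
    application, _, number = principal_unit.rpartition("/")
    if not application or not number.isdigit():
        raise ValueError(f"not a Juju unit name: {principal_unit!r}")
    labels = dict(base_labels)
    labels["juju_application"] = application
    labels["juju_unit"] = principal_unit
    return labels


# Labels currently present on the mysqlrouter log job (trimmed).
base = {
    "job": "charmed-mysql-common-var-log-mysqlrouter",
    "juju_model": "database",
    "juju_model_uuid": "c0d4c5d4-b2ed-4347-8292-f7d6405ca5f0",
}

print(with_principal_topology(base, "mysql-router/0"))
```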