canonical / prometheus-scrape-config-k8s-operator

This charmed operator allows operators to fine-tune scrape job configurations before sending them to the Prometheus charmed operator.
https://charmhub.io/prometheus-scrape-config-k8s
Apache License 2.0

Status is stuck on "installing agent" after `stop` -> `start` #27

Closed: sed-i closed this 1 year ago

sed-i commented 1 year ago

Bug Description

During a load test, all charms received a stop, but only scrape-config got stuck on "installing agent". The charm was running fine and debug-log clearly showed that update-status took place. It's just the status that wasn't updating.

App            Version  Status   Scale  Charm                         Channel  Rev  Address         Exposed  Message
scrape-config  n/a      waiting      1  prometheus-scrape-config-k8s  edge      39  10.152.183.17   no       installing agent

Unit              Workload     Agent  Address      Ports  Message
scrape-config/0*  maintenance  idle   10.1.79.214         

To Reproduce

  1. Deploy the cos-lite load test.
  2. Wait.

Environment

Relevant log output

# juju show-status-log scrape-config/0
24 Feb 2023 17:28:44-05:00  juju-unit  idle         
24 Feb 2023 17:28:44-05:00  workload   active       
25 Feb 2023 18:01:03-05:00  workload   maintenance  stopping charm software
25 Feb 2023 18:01:03-05:00  juju-unit  executing    running stop hook
25 Feb 2023 18:01:06-05:00  juju-unit  idle         
25 Feb 2023 18:01:10-05:00  juju-unit  executing    running start hook
25 Feb 2023 18:02:00-05:00  juju-unit  idle         
25 Feb 2023 18:02:00-05:00  workload   maintenance  

# two days later, after manually running juju config, the status became correct again
27 Feb 2023 13:42:40-05:00  juju-unit  executing    running config-changed hook
27 Feb 2023 13:42:40-05:00  workload   maintenance  Updating scrape jobs and alert rules for all metrics consumer
27 Feb 2023 13:42:40-05:00  workload   active       
27 Feb 2023 13:42:41-05:00  juju-unit  idle
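
To spot this kind of stuck status without reading the log by eye, the status log above can be parsed with a short script. This is only a minimal sketch: the names `StatusEntry`, `parse_status_log` and `last_workload_status` are hypothetical helpers, not part of Juju or the charm, and the line format is assumed from the excerpt above.

```python
import re
from datetime import datetime
from typing import List, NamedTuple, Optional


class StatusEntry(NamedTuple):
    """One row of `juju show-status-log` output (hypothetical helper type)."""
    time: datetime
    kind: str      # "juju-unit" or "workload"
    status: str    # e.g. "idle", "active", "maintenance"
    message: str   # may be empty


# Assumed layout, taken from the excerpt: "<timestamp>  <kind>  <status>  <message>"
_LINE = re.compile(
    r"^(\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})\s+(\S+)\s+(\S+)\s*(.*)$"
)


def parse_status_log(text: str) -> List[StatusEntry]:
    """Parse status-log rows, skipping comments and anything that does not match."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if not match:
            continue
        stamp, kind, status, message = match.groups()
        when = datetime.strptime(stamp, "%d %b %Y %H:%M:%S%z")
        entries.append(StatusEntry(when, kind, status, message.strip()))
    return entries


def last_workload_status(entries: List[StatusEntry]) -> Optional[StatusEntry]:
    """Return the most recent workload row, or None if there is none."""
    for entry in reversed(entries):
        if entry.kind == "workload":
            return entry
    return None


# The first part of the status log from this report.
SAMPLE = """\
24 Feb 2023 17:28:44-05:00  juju-unit  idle
24 Feb 2023 17:28:44-05:00  workload   active
25 Feb 2023 18:01:03-05:00  workload   maintenance  stopping charm software
25 Feb 2023 18:01:03-05:00  juju-unit  executing    running stop hook
25 Feb 2023 18:01:06-05:00  juju-unit  idle
25 Feb 2023 18:01:10-05:00  juju-unit  executing    running start hook
25 Feb 2023 18:02:00-05:00  juju-unit  idle
25 Feb 2023 18:02:00-05:00  workload   maintenance
"""

entries = parse_status_log(SAMPLE)
stuck = last_workload_status(entries)
if stuck is not None and stuck.status != "active":
    print(f"workload still in {stuck.status!r} since {stuck.time.isoformat()}")
```

On the excerpt, the last workload row is the message-less `maintenance` entry written after the start hook, which is the state the unit stayed in until `juju config` was run.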

Additional context

For some reason, the stop wasn't followed by upgrade-charm, only by start. This happened twice.

unit-scrape-config-0: 2023-02-25 18:01:03 DEBUG unit.scrape-config/0.juju-log Emitting Juju event update_status.
unit-scrape-config-0: 2023-02-25 18:01:04 DEBUG unit.scrape-config/0.juju-log Operator Framework 2.0.0 up and running.
unit-scrape-config-0: 2023-02-25 18:01:04 DEBUG unit.scrape-config/0.juju-log Emitting Juju event stop.
unit-scrape-config-0: 2023-02-25 18:01:08 INFO juju.cmd running containerAgent [2.9.34 90e2f047763059f0b8a57941ae0907346464aee8 gc go1.19]
unit-scrape-config-0: 2023-02-25 18:01:08 INFO juju.cmd.containeragent.unit start "unit"
unit-scrape-config-0: 2023-02-25 18:01:08 INFO juju.worker.upgradesteps upgrade steps for 2.9.34 have already been run.
unit-scrape-config-0: 2023-02-25 18:01:08 INFO juju.worker.probehttpserver starting http server on [::]:3856
unit-scrape-config-0: 2023-02-25 18:01:08 INFO juju.api cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: operation was canceled
unit-scrape-config-0: 2023-02-25 18:01:08 INFO juju.api connection established to "wss://10.152.183.72:17070/model/02f10d90-7242-4543-89d1-d2bf094a30bf/api"
unit-scrape-config-0: 2023-02-25 18:01:08 INFO juju.worker.apicaller [02f10d] "unit-scrape-config-0" successfully connected to "10.152.183.72:17070"
unit-scrape-config-0: 2023-02-25 18:01:09 INFO juju.api cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: operation was canceled
unit-scrape-config-0: 2023-02-25 18:01:09 INFO juju.api connection established to "wss://10.152.183.72:17070/model/02f10d90-7242-4543-89d1-d2bf094a30bf/api"
unit-scrape-config-0: 2023-02-25 18:01:09 INFO juju.worker.apicaller [02f10d] "unit-scrape-config-0" successfully connected to "10.152.183.72:17070"
unit-scrape-config-0: 2023-02-25 18:01:09 INFO juju.worker.migrationminion migration phase is now: NONE
unit-scrape-config-0: 2023-02-25 18:01:09 INFO juju.worker.logger logger worker started
unit-scrape-config-0: 2023-02-25 18:01:09 INFO juju.worker.leadership scrape-config/0 promoted to leadership of scrape-config
unit-scrape-config-0: 2023-02-25 18:01:09 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
unit-scrape-config-0: 2023-02-25 18:01:10 DEBUG unit.scrape-config/0.juju-log Operator Framework 2.0.0 up and running.
unit-scrape-config-0: 2023-02-25 18:01:10 INFO unit.scrape-config/0.juju-log Running legacy hooks/start.
unit-scrape-config-0: 2023-02-25 18:01:11 DEBUG unit.scrape-config/0.juju-log Operator Framework 2.0.0 up and running.
unit-scrape-config-0: 2023-02-25 18:01:11 DEBUG unit.scrape-config/0.juju-log Charm called itself via hooks/start.
unit-scrape-config-0: 2023-02-25 18:01:12 DEBUG unit.scrape-config/0.juju-log Legacy hooks/start exited with status 0.
unit-scrape-config-0: 2023-02-25 18:01:12 DEBUG unit.scrape-config/0.juju-log Emitting Juju event start.
unit-scrape-config-0: 2023-02-25 18:06:41 DEBUG unit.scrape-config/0.juju-log Operator Framework 2.0.0 up and running.
unit-scrape-config-0: 2023-02-25 18:06:41 DEBUG unit.scrape-config/0.juju-log Emitting Juju event update_status.
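
The hook sequence described under "Additional context" can be checked mechanically against a debug-log excerpt. The sketch below assumes only the `Emitting Juju event <name>.` wording visible in the log above; `emitted_events` and `events_after` are hypothetical helper names, not Juju or Operator Framework APIs.

```python
import re
from typing import List

# Matches the Operator Framework log line seen above, e.g.
# "... juju-log Emitting Juju event stop."
_EVENT = re.compile(r"Emitting Juju event (\w+)\.")


def emitted_events(log_text: str) -> List[str]:
    """Return the Juju event names in the order they were emitted."""
    return [match.group(1) for match in _EVENT.finditer(log_text)]


def events_after(events: List[str], marker: str) -> List[str]:
    """Return the events that follow the first occurrence of `marker`."""
    if marker not in events:
        return []
    return events[events.index(marker) + 1:]


# Four lines copied from the debug-log in this report.
SAMPLE = """\
unit-scrape-config-0: 2023-02-25 18:01:03 DEBUG unit.scrape-config/0.juju-log Emitting Juju event update_status.
unit-scrape-config-0: 2023-02-25 18:01:04 DEBUG unit.scrape-config/0.juju-log Emitting Juju event stop.
unit-scrape-config-0: 2023-02-25 18:01:12 DEBUG unit.scrape-config/0.juju-log Emitting Juju event start.
unit-scrape-config-0: 2023-02-25 18:06:41 DEBUG unit.scrape-config/0.juju-log Emitting Juju event update_status.
"""

events = emitted_events(SAMPLE)
after_stop = events_after(events, "stop")
if "upgrade_charm" not in after_stop:
    print("stop was not followed by upgrade_charm; saw:", after_stop)
```

If the charm only sets its status in handlers that did not run here, a `stop` followed directly by `start` would leave the "maintenance" status in place, which matches the status log. That reading is an inference from the logs, not something confirmed in this thread.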

simskij commented 1 year ago

Validate this to make sure it was squashed in the latest Juju 3.1 / 2.9.

sed-i commented 1 year ago

Seems like it happens only to this charm. Anyway, there is a juju ticket for it already. Closing.