canonical / mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0

[DPE-5417] Add check to ensure peer databag populated before reconciling mysqld exporter pebble layers #505

Closed. shayancanonical closed this 2 months ago.

shayancanonical commented 2 months ago

Issue

If a relation with the metrics endpoint is created before the charm is allocated, the metrics-endpoint-relation-created event can run before the leader-elected hook. The metrics-endpoint-relation-created handler uses the _mysql property, which reads values (such as cluster-name) that the leader-elected handler writes to the app peer databag. When those values are missing, the handler raises a KeyError and the charm goes into an error state from which it never recovers: Juju keeps re-running the failed metrics-endpoint-relation-created hook until it exits successfully, which it cannot do, so the queued leader-elected hook never runs.
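To make the failure mode concrete, here is a minimal pure-Python simulation, not the charm's actual code. A plain dict stands in for the app peer databag, FakeCharm and run_queue are hypothetical names, and retries are capped at max_attempts only so the sketch terminates (per the description above, Juju keeps retrying). It shows how an unguarded lookup of cluster-name keeps the failing hook at the head of the queue, so leader-elected never gets to run.

```python
from collections import deque
from typing import Callable, Dict, List


class FakeCharm:
    """Hypothetical stand-in for the charm; a dict plays the app peer databag."""

    def __init__(self) -> None:
        self.app_peer_data: Dict[str, str] = {}

    def on_leader_elected(self) -> None:
        # leader-elected is where cluster-name gets written (value is made up)
        self.app_peer_data["cluster-name"] = "cluster-example"

    def on_metrics_endpoint_relation_created(self) -> None:
        # Mirrors the unguarded lookup in the _mysql property from the traceback
        _ = self.app_peer_data["cluster-name"]


def run_queue(charm: FakeCharm, hooks: List[str], max_attempts: int = 3) -> List[str]:
    """Run hooks in order; a failing hook is retried and blocks everything behind it."""
    handlers: Dict[str, Callable[[], None]] = {
        "leader-elected": charm.on_leader_elected,
        "metrics-endpoint-relation-created": charm.on_metrics_endpoint_relation_created,
    }
    queue = deque(hooks)
    log: List[str] = []
    attempts = 0
    while queue and attempts < max_attempts:
        hook = queue[0]
        handler = handlers[hook]
        try:
            handler()
        except KeyError:
            # The hook errors out and stays at the head of the queue
            attempts += 1
            log.append(f"{hook} failed")
            continue
        queue.popleft()
        attempts = 0
        log.append(f"ran {hook}")
    return log
```

With the order ["metrics-endpoint-relation-created", "leader-elected"], run_queue records three failures and never reaches leader-elected; with the reverse order, both hooks run.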

Fixes: https://github.com/canonical/mysql-k8s-operator/issues/504

Solution

Before reconciling the mysqld exporter Pebble layers, add a check that the required values in the app peer databag are set, so the handler only proceeds once leader-elected has run.
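A minimal sketch of such a guard, assuming it is an early return at the top of _reconcile_mysqld_exporter (the handler named in the traceback). FakeCharm, _is_peer_databag_populated and REQUIRED_PEER_KEYS are hypothetical names, a dict stands in for the ops peer databag, and the Pebble layer logic is left out.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical: keys that leader-elected is assumed to write. Only cluster-name
# is confirmed by the traceback in this PR.
REQUIRED_PEER_KEYS = ("cluster-name",)


class FakeCharm:
    """Hypothetical stand-in for the charm; a dict plays the app peer databag."""

    def __init__(self, app_peer_data: Dict[str, str]) -> None:
        self.app_peer_data = app_peer_data
        self.reconciled_for: Optional[str] = None

    def _is_peer_databag_populated(self) -> bool:
        """True once every required key has a non-empty value."""
        return all(self.app_peer_data.get(key) for key in REQUIRED_PEER_KEYS)

    def _reconcile_mysqld_exporter(self) -> None:
        if not self._is_peer_databag_populated():
            # Return instead of raising KeyError, so the hook exits successfully
            # and the queued leader-elected hook can run.
            logger.debug("App peer databag not populated yet; skipping exporter reconcile")
            return
        cluster_name = self.app_peer_data["cluster-name"]  # safe after the check
        # Building and applying the exporter Pebble layer is omitted here.
        self.reconciled_for = cluster_name
```

The sketch assumes a later hook reconciles the exporter again once leader-elected has populated the databag.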

Testing

Deployed COS and Kubeflow locally and confirmed that the issue was reproducible. Then applied the change from this PR as a hotfix to the unit's charm code and confirmed that the unit comes up successfully:

unit-katib-db-0: 13:04:07 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-katib-db-0: 13:04:36 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-katib-db-0: 13:04:39 ERROR unit.katib-db/0.juju-log metrics-endpoint:66: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 888, in <module>
    main(MySQLOperatorCharm)
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 551, in main
    manager.run()
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 530, in run
    self._emit()
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 519, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 147, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 472, in _reconcile_mysqld_exporter
    if not self._mysql.is_data_dir_initialised():
  File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 200, in _mysql
    self.app_peer_data["cluster-name"],
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/model.py", line 1812, in __getitem__
    return super().__getitem__(key)
  File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/model.py", line 841, in __getitem__
    return self._data[key]
KeyError: 'cluster-name'
unit-katib-db-0: 13:04:39 ERROR juju.worker.uniter.operation hook "metrics-endpoint-relation-created" (via hook dispatching script: dispatch) failed: exit status 1
unit-katib-db-0: 13:04:39 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-katib-db-0: 13:05:23 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook

--- hotfix applied to unit charm code ---

unit-katib-db-0: 13:06:25 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-katib-db-0: 13:07:16 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-katib-db-0: 13:07:16 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-katib-db-0: 13:08:29 INFO juju.worker.uniter awaiting error resolution for "relation-created" hook
unit-katib-db-0: 13:08:30 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
unit-katib-db-0: 13:08:32 INFO juju.worker.uniter.operation ran "grafana-dashboard-relation-created" hook (via hook dispatching script: dispatch)
unit-katib-db-0: 13:08:33 INFO juju.worker.uniter.operation ran "restart-relation-created" hook (via hook dispatching script: dispatch)
unit-katib-db-0: 13:08:34 INFO juju.worker.uniter.operation ran "upgrade-relation-created" hook (via hook dispatching script: dispatch)
unit-katib-db-0: 13:08:35 INFO juju.worker.uniter.operation ran "database-peers-relation-created" hook (via hook dispatching script: dispatch)
unit-katib-db-0: 13:08:35 INFO juju.worker.uniter found queued "leader-elected" hook
unit-katib-db-0: 13:08:37 WARNING unit.katib-db/0.juju-log Failed to check if cluster metadata exists from_instance='katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local'
unit-katib-db-0: 13:08:37 INFO juju.worker.uniter.operation ran "leader-elected" hook (via hook dispatching script: dispatch)
unit-katib-db-0: 13:08:38 INFO unit.katib-db/0.juju-log Setting up the logrotate configurations
unit-katib-db-0: 13:08:44 INFO unit.katib-db/0.juju-log Configuring instance
unit-katib-db-0: 13:08:44 INFO unit.katib-db/0.juju-log Installing plugin='audit_log'
unit-katib-db-0: 13:08:44 INFO unit.katib-db/0.juju-log Installing plugin='audit_log_filter'
unit-katib-db-0: 13:08:53 WARNING unit.katib-db/0.juju-log Failed to check if cluster metadata exists from_instance='katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local'
unit-katib-db-0: 13:08:53 INFO unit.katib-db/0.juju-log Creating cluster cluster-ae1e00e9160aba3af75a49c0f03deffb
unit-katib-db-0: 13:08:58 INFO juju.worker.uniter.operation ran "mysql-pebble-ready" hook (via hook dispatching script: dispatch)
unit-katib-db-0: 13:08:59 INFO unit.katib-db/0.juju-log Starting the log rotate manager
unit-katib-db-0: 13:08:59 INFO unit.katib-db/0.juju-log Started log rotate manager process with PID 1949
unit-katib-db-0: 13:08:59 INFO juju.worker.uniter.operation ran "database-storage-attached" hook (via hook dispatching script: dispatch)
unit-katib-db-0: 13:09:01 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-katib-db-0: 13:09:04 INFO juju.worker.uniter found queued "start" hook
unit-katib-db-0: 13:09:05 INFO unit.katib-db/0.juju-log Running legacy hooks/start.
unit-katib-db-0: 13:09:07 INFO juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)
unit-katib-db-0: 13:09:10 INFO unit.katib-db/0.juju-log database:28: Kubernetes service katib-db-primary created
unit-katib-db-0: 13:09:10 INFO unit.katib-db/0.juju-log database:28: Kubernetes service katib-db-replicas created
shayancanonical commented 2 months ago

Added integration test in 58f42cd38c6d6f19d2de0eb177ddc11622ebad82

taurus-forever commented 2 months ago

Thank you for the test, LGTM. Please create a backlog ticket to add a full COS test (install the cos-lite bundle in a parallel model and make sure grafana-agent is active and metrics/logs are sent there).

shayancanonical commented 2 months ago

Created https://warthogs.atlassian.net/browse/DPE-5482