canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

microk8s in error state due to hook failed: "peer-relation-joined" #4372

Closed: kaskavel closed this issue 8 months ago

kaskavel commented 8 months ago

Summary

SQA had a failed run in which microk8s v1.28.3 was stuck in an error state with the message:

hook failed: "peer-relation-joined"

What Should Happen Instead?

We would expect microk8s to deploy successfully.

Reproduction Steps

  1. Deploy MAAS 3.3.5.
  2. Bootstrap a Juju 3.1.7 controller.
  3. Use that Juju controller to deploy microk8s with the specified bundle.

Introspection Report

From the logs we gathered, we see:

2024-01-15 03:56:16 DEBUG unit.microk8s/0.juju-log server.go:325 peer:0: Execute: microk8s add-node --token aa7ee2a5736180173355314f804dd774 --token-ttl 7200 (args=(['microk8s', 'add-node', '--token', 'aa7ee2a5736180173355314f804dd774', '--token-ttl', '7200'],), kwargs={'check': True})
2024-01-15 03:56:16 DEBUG unit.microk8s/0.peer-relation-joined logger.go:60 From the node you wish to join to this cluster, run the following:
2024-01-15 03:56:16 DEBUG unit.microk8s/0.peer-relation-joined logger.go:60 microk8s join 10.246.164.204:25000/aa7ee2a5736180173355314f804dd774/284b0a0e9910
2024-01-15 03:56:16 DEBUG unit.microk8s/0.peer-relation-joined logger.go:60
2024-01-15 03:56:16 DEBUG unit.microk8s/0.peer-relation-joined logger.go:60 Use the '--worker' flag to join a node as a worker not running the control plane, eg:
2024-01-15 03:56:16 DEBUG unit.microk8s/0.peer-relation-joined logger.go:60 microk8s join 10.246.164.204:25000/aa7ee2a5736180173355314f804dd774/284b0a0e9910 --worker
2024-01-15 03:56:16 DEBUG unit.microk8s/0.peer-relation-joined logger.go:60
2024-01-15 03:56:16 DEBUG unit.microk8s/0.peer-relation-joined logger.go:60 If the node you are adding is not reachable through the default interface you can use one of the following:
2024-01-15 03:56:16 DEBUG unit.microk8s/0.peer-relation-joined logger.go:60 microk8s join 10.246.164.204:25000/aa7ee2a5736180173355314f804dd774/284b0a0e9910
2024-01-15 03:56:16 DEBUG unit.microk8s/0.peer-relation-joined logger.go:60 microk8s join 10.246.168.58:25000/aa7ee2a5736180173355314f804dd774/284b0a0e9910
2024-01-15 03:56:17 ERROR unit.microk8s/0.juju-log server.go:325 peer:0: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/model.py", line 2693, in _run
    result = subprocess.run(args, **kwargs)  # type: ignore
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-microk8s-0/network-get', 'peer', '-r', '0', '--format=json')' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-microk8s-0/charm/./src/charm.py", line 428, in <module>
    main(MicroK8sCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/main.py", line 454, in __call__
    return main(charm_class, use_juju_for_storage=use_juju_for_storage)
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/framework.py", line 833, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/framework.py", line 922, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-microk8s-0/charm/./src/charm.py", line 385, in add_node
    self.model.get_binding(event.relation).network.ingress_address, token
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/model.py", line 817, in network
    self._network = self._network_get(self.name, self._relation_id)
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/model.py", line 810, in _network_get
    return Network(self._backend.network_get(name, relation_id))
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/model.py", line 2978, in network_get
    network = self._run(cmd, return_output=True, use_json=True)
  File "/var/lib/juju/agents/unit-microk8s-0/charm/venv/ops/model.py", line 2695, in _run
    raise ModelError(e.stderr)
ops.model.ModelError: ERROR no network config found for binding "peer"

2024-01-15 03:56:17 ERROR juju.worker.uniter.operation runhook.go:180 hook "peer-relation-joined" (via hook dispatching script: dispatch) failed: exit status 1
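
The second traceback shows how the failure propagates: the charm's add_node handler reads self.model.get_binding(event.relation).network.ingress_address, ops shells out to network-get peer -r 0 --format=json, and the tool's non-zero exit is re-raised as ModelError, which nobody catches, so the peer-relation-joined hook fails. The Python sketch below mimics that chain outside Juju. It is not the ops implementation: ModelError is a local stand-in, the run callable is injected so the hook tool can be faked, and the "ingress-addresses" JSON key is assumed from the network-get output format.

```python
import json
import subprocess
from typing import Callable, List

# Injected command runner, so the network-get hook tool can be faked.
Runner = Callable[[List[str]], subprocess.CompletedProcess]


class ModelError(Exception):
    """Local stand-in for ops.model.ModelError (not imported from ops)."""


def network_get(binding: str, relation_id: int, run: Runner) -> dict:
    """Query a binding the way the traceback shows ops doing it."""
    cmd = ["network-get", binding, "-r", str(relation_id), "--format=json"]
    result = run(cmd)
    try:
        # Non-zero exit -> CalledProcessError, like subprocess.run(..., check=True).
        result.check_returncode()
    except subprocess.CalledProcessError as e:
        # Raising inside the except block chains implicitly, which is what prints
        # "During handling of the above exception, another exception occurred".
        raise ModelError(e.stderr)
    return json.loads(result.stdout)


def ingress_address(binding: str, relation_id: int, run: Runner) -> str:
    """Return the first ingress address, roughly the value add_node needs next to the token."""
    return network_get(binding, relation_id, run)["ingress-addresses"][0]


if __name__ == "__main__":
    def failing_run(cmd: List[str]) -> subprocess.CompletedProcess:
        # Fake hook tool reproducing the error seen in the SQA run.
        return subprocess.CompletedProcess(
            cmd, 1, stdout="", stderr='ERROR no network config found for binding "peer"\n'
        )

    try:
        ingress_address("peer", 0, failing_run)
    except ModelError as err:
        print("hook would fail with:", err)
```

Injecting run keeps the sketch testable without a Juju unit. Because the charm does not catch the exception, Juju records the hook as failed and the unit stays in the error state until the hook is retried or resolved.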

Full logs are available at the link provided above.

Can you suggest a fix?

Are you interested in contributing with a fix?

neoaggelos commented 8 months ago

Hi @kaskavel, it looks like the ops library is failing to retrieve the unit address. This does not look like a charm bug; could it be on the Juju side? How often does this issue occur in your runs?

If the environment is still around, can you try running the command manually on the existing unit?

juju run microk8s/0 -- network-get peer -r 0 --format=json
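
The manual check can also be scripted from the client machine. The sketch below builds the client-side command, runs it and reports either the ingress addresses or the hook tool's error. It assumes a Juju 3.x client (matching the 3.1.7 controller), on which arbitrary commands are run with juju exec --unit while juju run invokes charm actions, and it assumes that for a single unit the client passes the tool's stdout and exit status straight through. The helper names network_get_command, ingress_addresses and check_binding are hypothetical.

```python
import json
import subprocess
from typing import Callable, List


def network_get_command(unit: str, binding: str, relation_id: int) -> List[str]:
    """Build the client-side command (hypothetical helper)."""
    # Assumption: on a Juju 3.x client, `juju exec` runs arbitrary commands
    # on a unit, while `juju run` invokes charm actions.
    return ["juju", "exec", "--unit", unit, "--",
            "network-get", binding, "-r", str(relation_id), "--format=json"]


def ingress_addresses(stdout: str) -> List[str]:
    """Pull the ingress addresses out of network-get's JSON output."""
    return json.loads(stdout).get("ingress-addresses", [])


def check_binding(unit: str, binding: str, relation_id: int,
                  run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Run network-get on the unit and summarize the result in one line."""
    result = run(network_get_command(unit, binding, relation_id),
                 capture_output=True, text=True)
    if result.returncode != 0:
        # Assumes the exit status is passed through; a failure here would
        # reproduce the charm error outside the hook.
        return f"network-get failed ({result.returncode}): {result.stderr.strip()}"
    return "ingress addresses: " + ", ".join(ingress_addresses(result.stdout))
```

Usage, from a machine where the juju client is logged in to the controller: check_binding("microk8s/0", "peer", 0). Passing a fake run callable, as with the default subprocess.run replaced, lets the parsing be checked without a live model.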

kaskavel commented 8 months ago

Hi @neoaggelos, thank you for your immediate response.

Unfortunately, the environment is no longer available. A closer look at the logs, though, suggests that the problem might be on the Juju side. This is the first time we've noticed it. We can follow up with the Juju team on this, and if we see it again we will run the suggested command to gather more information. For the time being, we can close the issue for microk8s.