Open beliaev-maksim opened 5 months ago
Sounds like the workers need to remember the first cluster they ever joined (perhaps keyed by the cluster name?) so that when that cluster dies (because someone nukes the control plane), they go into a permanently blocked state: `Blocked: Awaiting juju destruction of unit`.
Even if a new control-plane unit shows up, too bad: we're not going to try to engage this machine in that new cluster.
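The behaviour proposed above could be sketched roughly as follows. This is only an illustration of the idea, not the charm's actual code: the class and method names (`ClusterMembership`, `can_join`, `status`) are invented here, and in a real charm the remembered cluster name would need to be persisted (e.g. in stored state) rather than kept in memory.

```python
# Sketch: a worker remembers the first cluster it ever joined and refuses
# to engage with any other cluster. All names here are illustrative.

class ClusterMembership:
    """Tracks the first cluster a worker joined (hypothetical helper)."""

    def __init__(self):
        # In a real charm this would be persisted across hook invocations.
        self.first_cluster = None

    def can_join(self, cluster_name: str) -> bool:
        """Record the first cluster on first join; reject any other cluster."""
        if self.first_cluster is None:
            self.first_cluster = cluster_name
            return True
        return cluster_name == self.first_cluster

    def status(self, cluster_alive: bool) -> str:
        """Once the original cluster is gone, block permanently."""
        if self.first_cluster is not None and not cluster_alive:
            return "Blocked: Awaiting juju destruction of unit"
        return "active"
```

With this, a worker that joined `cluster-a` would refuse a later `cluster-b`, and would report the blocked status once `cluster-a` is destroyed, instead of trying to rejoin.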
I still see the same issue with the latest revision:
```
unit-k8s-1: 10:34:55 ERROR unit.k8s/1.juju-log cos-tokens:0: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-k8s-1/charm/./src/charm.py", line 744, in <module>
    ops.main.main(K8sCharm)
  File "/var/lib/juju/agents/unit-k8s-1/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-k8s-1/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-k8s-1/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-k8s-1/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-k8s-1/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-k8s-1/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-k8s-1/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-k8s-1/charm/venv/charms/reconciler.py", line 35, in reconcile
    self.reconcile_function(event)
  File "/var/lib/juju/agents/unit-k8s-1/charm/./src/charm.py", line 570, in _reconcile
    self._revoke_cluster_tokens(event)
  File "/var/lib/juju/agents/unit-k8s-1/charm/./src/charm.py", line 369, in _revoke_cluster_tokens
    self.distributor.revoke_tokens(
  File "/var/lib/juju/agents/unit-k8s-1/charm/src/token_distributor.py", line 361, in revoke_tokens
    token_strat(node, ignore_errors)
  File "/var/lib/juju/agents/unit-k8s-1/charm/src/token_distributor.py", line 178, in _revoke_cluster_token
    self.api_manager.remove_node(name)
  File "/var/lib/juju/agents/unit-k8s-1/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 706, in remove_node
    self._send_request(endpoint, "POST", EmptyResponse, body)
  File "/var/lib/juju/agents/unit-k8s-1/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 651, in _send_request
    raise InvalidResponseError(
charms.k8s.v0.k8sd_api_manager.InvalidResponseError: Error status 500
    method=POST
    endpoint=/1.0/k8sd/cluster/remove
    reason=Internal Server Error
    body={"type":"error","status":"","status_code":0,"operation":"","error_code":500,"error":"node \"k8s-worker-1\" is not part of the cluster","metadata":null}
```
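The hook dies because `remove_node` surfaces a 500 for a node that is already gone, which is arguably an idempotent no-op. One possible mitigation, sketched below, is to tolerate exactly that "not part of the cluster" error during token revocation. This is only an assumption about where a fix could live (the real code path is `_revoke_cluster_token` in `token_distributor.py`); the `InvalidResponseError` class and `remove_node_if_present` helper here are stand-ins, not the charm's actual API.

```python
# Sketch: treat "node ... is not part of the cluster" as already-removed,
# so revocation doesn't fail the hook. InvalidResponseError below is a
# stand-in for charms.k8s.v0.k8sd_api_manager.InvalidResponseError.

class InvalidResponseError(Exception):
    """Stand-in for the k8sd API manager's error type."""


def remove_node_if_present(api_remove, name: str) -> None:
    """Call api_remove(name), swallowing only the already-removed error."""
    try:
        api_remove(name)
    except InvalidResponseError as e:
        if "is not part of the cluster" in str(e):
            return  # node already gone: nothing to revoke
        raise  # any other failure is still fatal
```

Matching on the error message is fragile; if k8sd's API exposes a structured error code for this case, checking that instead would be more robust.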
```
maksim@darmbeliaev:~$ juju status
Model          Controller              Cloud/Region         Version  SLA          Timestamp
canonical-k8s  k8s-machines-contoller  localhost/localhost  3.4.2    unsupported  10:36:36+02:00

App         Version  Status  Scale  Charm       Channel      Rev  Exposed  Message
k8s                  error       1  k8s         latest/edge   47  no       hook failed: "cos-tokens-relation-created"
k8s-worker  1.30.0   active      2  k8s-worker  latest/edge   47  no       Ready

Unit           Workload  Agent  Machine  Public address  Ports  Message
k8s-worker/0*  active    idle   4        10.112.13.239          Ready
k8s-worker/1   active    idle   5        10.112.13.65           Ready
k8s/1*         error     idle   6        10.102.2.2             hook failed: "cos-tokens-relation-created"

Machine  State    Address        Inst id               Base          AZ  Message
4        started  10.112.13.239  manual:10.112.13.239  ubuntu@22.04      Manually provisioned machine
5        started  10.112.13.65   manual:10.112.13.65   ubuntu@22.04      Manually provisioned machine
6        started  10.102.2.2     juju-d1d519-6         ubuntu@22.04      Running
```
### Bug Description

Unit goes into an unrecoverable error state.

### To Reproduce

Try to reproduce on the local machine.

### Environment

edge

### Relevant log output

See the traceback and `juju status` output in the comment above.

### Additional context

No response