Hey @amc94
That happens because the juju leader node is not necessarily the database leader/voter node. In a two-node setup, juju may therefore remove the only dqlite voter node rather than the follower node, which leaves the cluster in a broken state. The obvious solution is to fail over to the follower node when the last voter is removed. We already have a TODO for this at https://github.com/canonical/k8s-snap/blob/main/src/k8s/pkg/client/dqlite/remove.go#L37 and will work on it in the upcoming sprint.
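For illustration, here is a minimal sketch of that failover idea against the public go-dqlite client API. The package and function names below are hypothetical and this is not the k8s-snap implementation:

```go
// Sketch only: not the actual k8s-snap code. The package and function names
// are hypothetical; the go-dqlite client calls (Cluster, Assign, Remove)
// are the library's public API.
package dqliteutil

import (
	"context"
	"fmt"

	"github.com/canonical/go-dqlite/client"
)

// RemoveNodeSafely removes the node at the given address from the dqlite
// cluster. If that node is the last remaining voter, another member is
// promoted to voter first so the cluster keeps a working leader.
func RemoveNodeSafely(ctx context.Context, c *client.Client, address string) error {
	nodes, err := c.Cluster(ctx)
	if err != nil {
		return fmt.Errorf("failed to list cluster members: %w", err)
	}

	var target *client.NodeInfo
	voters := 0
	for i, n := range nodes {
		if n.Role == client.Voter {
			voters++
		}
		if n.Address == address {
			target = &nodes[i]
		}
	}
	if target == nil {
		return fmt.Errorf("node %q is not part of the cluster", address)
	}

	// Failover: if we are about to delete the only voter, hand the voter
	// role to another member before removing it.
	if target.Role == client.Voter && voters == 1 {
		for _, n := range nodes {
			if n.Address == address {
				continue
			}
			if err := c.Assign(ctx, n.ID, client.Voter); err != nil {
				return fmt.Errorf("failed to promote %q to voter: %w", n.Address, err)
			}
			break
		}
	}

	return c.Remove(ctx, target.ID)
}
```

The essential point is that the voter role is handed over before the last voter leaves, so the cluster is never left without a quorum-capable member.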
Hi @bschimke95, I'm having difficulties bootstrapping the k8s charm on top of a MAAS cloud. When I try to bootstrap I see:
unit-k8s-2: 16:10:38 INFO juju Starting unit workers for "k8s/2"
unit-k8s-2: 16:10:38 INFO juju.worker.apicaller [01e3d9] "unit-k8s-2" successfully connected to "10.246.165.170:17070"
unit-k8s-2: 16:10:38 INFO juju.worker.apicaller [01e3d9] password changed for "unit-k8s-2"
unit-k8s-2: 16:10:38 INFO juju.worker.apicaller [01e3d9] "unit-k8s-2" successfully connected to "10.246.165.170:17070"
unit-k8s-2: 16:10:38 INFO juju.worker.upgrader no waiter, upgrader is done
unit-k8s-2: 16:10:38 INFO juju.worker.migrationminion migration migration phase is now: NONE
unit-k8s-2: 16:10:38 INFO juju.worker.logger logger worker started
unit-k8s-2: 16:10:38 DEBUG juju.worker.dependency "metric-sender" manifold worker started at 2024-04-09 16:10:38.55295949 +0000 UTC
unit-k8s-2: 16:10:38 DEBUG juju.worker.dependency "uniter" manifold worker started at 2024-04-09 16:10:38.553329721 +0000 UTC
unit-k8s-2: 16:10:38 DEBUG juju.worker.secretsdrainworker got new secret backend
unit-k8s-2: 16:10:38 DEBUG juju.worker.secretsdrainworker no secrets to drain
unit-k8s-2: 16:10:38 DEBUG juju.worker.meterstatus got meter status change signal from watcher
unit-k8s-2: 16:10:38 DEBUG juju.machinelock acquire machine lock "machine-lock" for meterstatus (meter-status-changed)
unit-k8s-2: 16:10:38 DEBUG juju.machinelock machine lock "machine-lock" acquired for meterstatus (meter-status-changed)
unit-k8s-2: 16:10:38 DEBUG juju.worker.uniter.runner starting jujuc server {unix @/var/lib/juju/agents/unit-k8s-2/agent.socket <nil>}
unit-k8s-2: 16:10:38 DEBUG juju.worker.apiaddressupdater updating API hostPorts to [[10.246.164.114:17070 127.0.0.1:17070 [::1]:17070] [10.246.165.174:17070 127.0.0.1:17070 [::1]:17070] [10.246.165.170:17070 127.0.0.1:17070 [::1]:17070]]
unit-k8s-2: 16:10:38 ERROR juju.worker.meterstatus error running "meter-status-changed": charm missing from disk
unit-k8s-2: 16:10:38 DEBUG juju.machinelock created rotating log file "/var/log/juju/machine-lock.log" with max size 10 MB and max backups 5
unit-k8s-2: 16:10:38 DEBUG juju.machinelock machine lock "machine-lock" released for meterstatus (meter-status-changed)
unit-k8s-2: 16:10:38 DEBUG juju.worker.uniter starting local juju-exec listener on {unix /var/lib/juju/agents/unit-k8s-2/run.socket <nil>}
unit-k8s-2: 16:10:38 INFO juju.worker.uniter unit "k8s/2" started
unit-k8s-2: 16:10:38 DEBUG juju.worker.uniter juju-exec listener running
unit-k8s-2: 16:10:38 INFO juju.worker.uniter resuming charm install
unit-k8s-2: 16:10:38 DEBUG juju.worker.uniter.operation running operation install ch:amd64/jammy/k8s-27 for k8s/2
unit-k8s-2: 16:10:38 DEBUG juju.worker.uniter.operation no machine lock needed for k8s/2
unit-k8s-2: 16:10:38 DEBUG juju.worker.uniter.operation preparing operation "install ch:amd64/jammy/k8s-27" for k8s/2
unit-k8s-2: 16:10:38 INFO juju.worker.uniter.charm downloading ch:amd64/jammy/k8s-27 from API server
unit-k8s-2: 16:10:38 DEBUG juju.worker.uniter.operation executing operation "install ch:amd64/jammy/k8s-27" for k8s/2
unit-k8s-2: 16:10:38 DEBUG juju.worker.uniter.charm preparing to deploy charm "ch:amd64/jammy/k8s-27"
unit-k8s-2: 16:10:38 DEBUG juju.worker.uniter.charm deploying charm "ch:amd64/jammy/k8s-27"
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.charm finishing deploy of charm "ch:amd64/jammy/k8s-27"
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.operation committing operation "install ch:amd64/jammy/k8s-27" for k8s/2
unit-k8s-2: 16:10:39 INFO juju.worker.uniter hooks are retried true
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate starting secrets rotation watcher
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate starting secret revisions expiry watcher
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate starting obsolete secret revisions watcher
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.secretsrotate got rotate secret changes: []watcher.SecretTriggerChange{}
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got unit change for k8s/2
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.secretrevisionsexpire got revision expiry secret changes: []watcher.SecretTriggerChange{}
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got update status interval change for k8s/2: ok=true
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got storage change for k8s/2: [] ok=true
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got obsolete secret revisions change for k8s: []
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate obsolete secret revisions: map[]
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate deleted secrets: []
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got application change for k8s/2
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got secrets change for k8s/2: []
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got latest secret info: map[string]secrets.SecretRevisionInfo{}
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate deleted secrets: []
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate obsolete secrets: map[]
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got address change for k8s/2: ok=true, hashes=[02a44cd9aa7c3d65311f6116ab7e00367f885f6d90449058a589404ed6beee9d]
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got action change for k8s/2: [] ok=true
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got relations change for k8s/2: ok=true
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got config change for k8s/2: ok=true, hashes=[ecc8c064fc5d3a506b16fc17be2641816dbc54a7120152758d18e33ea225f67a]
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got leader settings change for k8s/2: ok=true
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got upgrade series change
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate no upgrade series in progress, reinitializing local upgrade series state
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.remotestate got trust config change for k8s/2: ok=true, hashes=[6e01b14c257801c5ecce384a27f90588133952c5465677d535175b4287ccea48]
unit-k8s-2: 16:10:39 INFO juju.worker.uniter.storage initial storage attachments ready
unit-k8s-2: 16:10:39 INFO juju.worker.uniter found queued "install" hook
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.operation running operation run install hook for k8s/2
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.operation acquiring machine lock for k8s/2
unit-k8s-2: 16:10:39 DEBUG juju.machinelock acquire machine lock "machine-lock" for k8s/2 uniter (run install hook)
unit-k8s-2: 16:10:39 DEBUG juju.machinelock machine lock "machine-lock" acquired for k8s/2 uniter (run install hook)
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.operation preparing operation "run install hook" for k8s/2
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.operation executing operation "run install hook" for k8s/2
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter [AGENT-STATUS] executing: running install hook
unit-k8s-2: 16:10:39 DEBUG juju.worker.uniter.runner starting jujuc server {unix @/var/lib/juju/agents/unit-k8s-2/agent.socket <nil>}
unit-k8s-2: 16:10:39 DEBUG unit.k8s/2.juju-log ops 2.12.0 up and running.
unit-k8s-2: 16:10:39 INFO unit.k8s/2.juju-log Running legacy hooks/install.
unit-k8s-2: 16:10:39 DEBUG unit.k8s/2.juju-log ops 2.12.0 up and running.
unit-k8s-2: 16:10:39 DEBUG unit.k8s/2.juju-log Charm called itself via hooks/install.
unit-k8s-2: 16:10:39 DEBUG unit.k8s/2.juju-log Legacy hooks/install exited with status 0.
unit-k8s-2: 16:10:39 DEBUG unit.k8s/2.juju-log Using local storage: not a Kubernetes podspec charm
unit-k8s-2: 16:10:39 DEBUG unit.k8s/2.juju-log Initializing SQLite local storage: /var/lib/juju/agents/unit-k8s-2/charm/.unit-state.db.
unit-k8s-2: 16:10:39 DEBUG unit.k8s/2.juju-log Emitting Juju event install.
unit-k8s-2: 16:10:39 INFO unit.k8s/2.juju-log Applying Proxied Environment Settings
unit-k8s-2: 16:10:39 INFO unit.k8s/2.juju-log Ensuring k8s snap version
unit-k8s-2: 16:10:48 INFO unit.k8s/2.juju-log Applying K8s requirements
unit-k8s-2: 16:10:48 INFO unit.k8s/2.juju-log Check if k8ds is ready
unit-k8s-2: 16:10:48 DEBUG unit.k8s/2.juju-log Found expected exception: Error connecting to socket: /var/snap/k8s/common/var/lib/k8sd/state/control.socket
unit-k8s-2: 16:10:48 ERROR unit.k8s/2.juju-log Caught ReconcilerError
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-k8s-2/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 438, in connect
self.sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-k8s-2/charm/venv/charms/contextual_status.py", line 101, in on_error
yield
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/var/lib/juju/agents/unit-k8s-2/charm/./src/charm.py", line 208, in _check_k8sd_ready
self.api_manager.check_k8sd_ready()
File "/var/lib/juju/agents/unit-k8s-2/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 659, in check_k8sd_ready
self._send_request(endpoint, "GET", EmptyResponse)
File "/var/lib/juju/agents/unit-k8s-2/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 550, in _send_request
connection.request(
File "/usr/lib/python3.10/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
self.send(msg)
File "/usr/lib/python3.10/http/client.py", line 976, in send
self.connect()
File "/var/lib/juju/agents/unit-k8s-2/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 440, in connect
raise K8sdConnectionError(f"Error connecting to socket: {self.unix_socket}") from e
charms.k8s.v0.k8sd_api_manager.K8sdConnectionError: Error connecting to socket: /var/snap/k8s/common/var/lib/k8sd/state/control.socket
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-k8s-2/charm/venv/charms/reconciler.py", line 35, in reconcile
self.reconcile_function(event)
File "/var/lib/juju/agents/unit-k8s-2/charm/./src/charm.py", line 472, in _reconcile
self._check_k8sd_ready()
File "/usr/lib/python3.10/contextlib.py", line 78, in inner
with self._recreate_cm():
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/var/lib/juju/agents/unit-k8s-2/charm/venv/charms/contextual_status.py", line 106, in on_error
raise ReconcilerError(msg) from e
charms.contextual_status.ReconcilerError: Found expected exception: Error connecting to socket: /var/snap/k8s/common/var/lib/k8sd/state/control.socket
unit-k8s-2: 16:10:48 INFO unit.k8s/2.juju-log Status context closed with: [WaitingStatus('Waiting for k8sd')]
unit-k8s-2: 16:10:48 INFO juju.worker.uniter.operation ran "install" hook (via hook dispatching script: dispatch)
unit-k8s-2: 16:10:48 DEBUG juju.worker.uniter.operation committing operation "run install hook" for k8s/2
unit-k8s-2: 16:10:48 DEBUG juju.machinelock created rotating log file "/var/log/juju/machine-lock.log" with max size 10 MB and max backups 5
unit-k8s-2: 16:10:48 DEBUG juju.machinelock machine lock "machine-lock" released for k8s/2 uniter (run install hook)
unit-k8s-2: 16:10:48 DEBUG juju.worker.uniter.operation lock released for k8s/2
unit-k8s-2: 16:10:48 DEBUG juju.worker.uniter.relation unit "k8s/2" (leader=true) entered scope for relation "k8s:cos-tokens"
unit-k8s-2: 16:10:48 DEBUG juju.worker.uniter.relation unit "k8s/2" (leader=true) entered scope for relation "k8s:cluster"
unit-k8s-2: 16:10:48 DEBUG juju.worker.uniter.operation running operation run relation-created (5; app: k8s) hook for k8s/2
unit-k8s-2: 16:10:48 DEBUG juju.worker.uniter.operation acquiring machine lock for k8s/2
unit-k8s-2: 16:10:48 DEBUG juju.machinelock acquire machine lock "machine-lock" for k8s/2 uniter (run relation-created (5; app: k8s) hook)
unit-k8s-2: 16:10:48 DEBUG juju.machinelock machine lock "machine-lock" acquired for k8s/2 uniter (run relation-created (5; app: k8s) hook)
unit-k8s-2: 16:10:48 DEBUG juju.worker.uniter.operation preparing operation "run relation-created (5; app: k8s) hook" for k8s/2
unit-k8s-2: 16:10:48 DEBUG juju.worker.uniter.operation executing operation "run relation-created (5; app: k8s) hook" for k8s/2
unit-k8s-2: 16:10:48 DEBUG juju.worker.uniter [AGENT-STATUS] executing: running cos-tokens-relation-created hook
unit-k8s-2: 16:10:48 DEBUG juju.worker.uniter.runner starting jujuc server {unix @/var/lib/juju/agents/unit-k8s-2/agent.socket <nil>}
unit-k8s-2: 16:10:49 DEBUG unit.k8s/2.juju-log cos-tokens:5: ops 2.12.0 up and running.
unit-k8s-2: 16:10:49 DEBUG unit.k8s/2.juju-log cos-tokens:5: Emitting Juju event cos_tokens_relation_created.
unit-k8s-2: 16:10:49 INFO unit.k8s/2.juju-log cos-tokens:5: Applying Proxied Environment Settings
unit-k8s-2: 16:10:49 INFO unit.k8s/2.juju-log cos-tokens:5: Ensuring k8s snap version
unit-k8s-2: 16:10:49 WARNING unit.k8s/2.cos-tokens-relation-created snap "k8s" has no updates available
unit-k8s-2: 16:10:49 INFO unit.k8s/2.juju-log cos-tokens:5: Applying K8s requirements
unit-k8s-2: 16:10:49 INFO unit.k8s/2.juju-log cos-tokens:5: Check if k8ds is ready
unit-k8s-2: 16:10:49 ERROR unit.k8s/2.juju-log cos-tokens:5: Invalid response while checking if cluster is bootstrapped: HTTP or Socket error method=GET
endpoint=/1.0/k8sd/cluster
unit-k8s-2: 16:10:49 INFO unit.k8s/2.juju-log cos-tokens:5: Using dqlite as datastore
unit-k8s-2: 16:10:49 DEBUG unit.k8s/2.juju-log cos-tokens:5: Found expected exception: Error status 500
method=POST
endpoint=/cluster/control
reason=Internal Server Error
body={"type":"error","status":"","status_code":0,"operation":"","error_code":500,"error":"Failed to run post-bootstrap actions: failed to unmarshal bootstrap config: failed to unmarshal bootstrap config: invalid character 'c' looking for beginning of value","metadata":null}
unit-k8s-2: 16:10:49 ERROR unit.k8s/2.juju-log cos-tokens:5: Caught ReconcilerError
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-k8s-2/charm/venv/charms/contextual_status.py", line 101, in on_error
yield
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/var/lib/juju/agents/unit-k8s-2/charm/./src/charm.py", line 238, in _bootstrap_k8s_snap
self.api_manager.bootstrap_k8s_snap(payload)
File "/var/lib/juju/agents/unit-k8s-2/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 671, in bootstrap_k8s_snap
self._send_request(endpoint, "POST", EmptyResponse, body)
File "/var/lib/juju/agents/unit-k8s-2/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 559, in _send_request
raise InvalidResponseError(
charms.k8s.v0.k8sd_api_manager.InvalidResponseError: Error status 500
method=POST
endpoint=/cluster/control
reason=Internal Server Error
body={"type":"error","status":"","status_code":0,"operation":"","error_code":500,"error":"Failed to run post-bootstrap actions: failed to unmarshal bootstrap config: failed to unmarshal bootstrap config: invalid character 'c' looking for beginning of value","metadata":null}
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-k8s-2/charm/venv/charms/reconciler.py", line 35, in reconcile
self.reconcile_function(event)
File "/var/lib/juju/agents/unit-k8s-2/charm/./src/charm.py", line 474, in _reconcile
self._bootstrap_k8s_snap()
File "/usr/lib/python3.10/contextlib.py", line 78, in inner
with self._recreate_cm():
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/var/lib/juju/agents/unit-k8s-2/charm/venv/charms/contextual_status.py", line 106, in on_error
raise ReconcilerError(msg) from e
charms.contextual_status.ReconcilerError: Found expected exception: Error status 500
method=POST
endpoint=/cluster/control
reason=Internal Server Error
body={"type":"error","status":"","status_code":0,"operation":"","error_code":500,"error":"Failed to run post-bootstrap actions: failed to unmarshal bootstrap config: failed to unmarshal bootstrap config: invalid character 'c' looking for beginning of value","metadata":null}
unit-k8s-2: 16:10:49 INFO unit.k8s/2.juju-log cos-tokens:5: Status context closed with: [WaitingStatus('Waiting to bootstrap k8s snap')]
unit-k8s-2: 16:10:49 INFO juju.worker.uniter.operation ran "cos-tokens-relation-created" hook (via hook dispatching script: dispatch)
unit-k8s-2: 16:10:50 DEBUG juju.worker.uniter.operation committing operation "run relation-created (5; app: k8s) hook" for k8s/2
unit-k8s-2: 16:10:50 DEBUG juju.machinelock created rotating log file "/var/log/juju/machine-lock.log" with max size 10 MB and max backups 5
unit-k8s-2: 16:10:50 DEBUG juju.machinelock machine lock "machine-lock" released for k8s/2 uniter (run relation-created (5; app: k8s) hook)
unit-k8s-2: 16:10:50 DEBUG juju.worker.uniter.operation lock released for k8s/2
unit-k8s-2: 16:10:50 DEBUG juju.worker.uniter.operation running operation run relation-created (4; app: k8s) hook for k8s/2
unit-k8s-2: 16:10:50 DEBUG juju.worker.uniter.operation acquiring machine lock for k8s/2
unit-k8s-2: 16:10:50 DEBUG juju.machinelock acquire machine lock "machine-lock" for k8s/2 uniter (run relation-created (4; app: k8s) hook)
unit-k8s-2: 16:10:50 DEBUG juju.machinelock machine lock "machine-lock" acquired for k8s/2 uniter (run relation-created (4; app: k8s) hook)
unit-k8s-2: 16:10:50 DEBUG juju.worker.uniter.operation preparing operation "run relation-created (4; app: k8s) hook" for k8s/2
unit-k8s-2: 16:10:50 DEBUG juju.worker.uniter.operation executing operation "run relation-created (4; app: k8s) hook" for k8s/2
unit-k8s-2: 16:10:50 DEBUG juju.worker.uniter [AGENT-STATUS] executing: running cluster-relation-created hook
unit-k8s-2: 16:10:50 DEBUG juju.worker.uniter.runner starting jujuc server {unix @/var/lib/juju/agents/unit-k8s-2/agent.socket <nil>}
unit-k8s-2: 16:10:50 DEBUG unit.k8s/2.juju-log cluster:4: ops 2.12.0 up and running.
unit-k8s-2: 16:10:50 DEBUG unit.k8s/2.juju-log cluster:4: Emitting Juju event cluster_relation_created.
unit-k8s-2: 16:10:50 INFO unit.k8s/2.juju-log cluster:4: Applying Proxied Environment Settings
unit-k8s-2: 16:10:50 INFO unit.k8s/2.juju-log cluster:4: Ensuring k8s snap version
unit-k8s-2: 16:10:50 WARNING unit.k8s/2.cluster-relation-created snap "k8s" has no updates available
unit-k8s-2: 16:10:50 INFO unit.k8s/2.juju-log cluster:4: Applying K8s requirements
unit-k8s-2: 16:10:50 INFO unit.k8s/2.juju-log cluster:4: Check if k8ds is ready
unit-k8s-2: 16:10:50 ERROR unit.k8s/2.juju-log cluster:4: Invalid response while checking if cluster is bootstrapped: Error status 500
method=GET
endpoint=/1.0/k8sd/cluster
reason=Internal Server Error
body={"type":"error","status":"","status_code":0,"operation":"","error_code":500,"error":"failed to create k8s client: failed to build Kubernetes REST config: stat /etc/kubernetes/admin.conf: no such file or directory","metadata":null}
unit-k8s-2: 16:10:50 INFO unit.k8s/2.juju-log cluster:4: Using dqlite as datastore
unit-k8s-2: 16:10:50 DEBUG unit.k8s/2.juju-log cluster:4: Found expected exception: Error status 500
method=POST
endpoint=/cluster/control
reason=Internal Server Error
body={"type":"error","status":"","status_code":0,"operation":"","error_code":500,"error":"Failed to initialize local remote entry: A remote with name \"solqa-lab1-server-10\" already exists","metadata":null}
unit-k8s-2: 16:10:50 ERROR unit.k8s/2.juju-log cluster:4: Caught ReconcilerError
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-k8s-2/charm/venv/charms/contextual_status.py", line 101, in on_error
yield
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/var/lib/juju/agents/unit-k8s-2/charm/./src/charm.py", line 238, in _bootstrap_k8s_snap
self.api_manager.bootstrap_k8s_snap(payload)
File "/var/lib/juju/agents/unit-k8s-2/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 671, in bootstrap_k8s_snap
self._send_request(endpoint, "POST", EmptyResponse, body)
File "/var/lib/juju/agents/unit-k8s-2/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 559, in _send_request
raise InvalidResponseError(
charms.k8s.v0.k8sd_api_manager.InvalidResponseError: Error status 500
method=POST
endpoint=/cluster/control
reason=Internal Server Error
body={"type":"error","status":"","status_code":0,"operation":"","error_code":500,"error":"Failed to initialize local remote entry: A remote with name \"solqa-lab1-server-10\" already exists","metadata":null}
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-k8s-2/charm/venv/charms/reconciler.py", line 35, in reconcile
self.reconcile_function(event)
File "/var/lib/juju/agents/unit-k8s-2/charm/./src/charm.py", line 474, in _reconcile
self._bootstrap_k8s_snap()
File "/usr/lib/python3.10/contextlib.py", line 78, in inner
with self._recreate_cm():
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/var/lib/juju/agents/unit-k8s-2/charm/venv/charms/contextual_status.py", line 106, in on_error
raise ReconcilerError(msg) from e
charms.contextual_status.ReconcilerError: Found expected exception: Error status 500
method=POST
endpoint=/cluster/control
reason=Internal Server Error
body={"type":"error","status":"","status_code":0,"operation":"","error_code":500,"error":"Failed to initialize local remote entry: A remote with name \"solqa-lab1-server-10\" already exists","metadata":null}
The remote solqa-lab1-server-10 is the node the k8s charm was deployed to.
The following is the journal output for the k8s snap:
ubuntu@solqa-lab1-server-10:~$ journalctl -u snap.k8s.k8sd.service
Apr 09 16:10:48 solqa-lab1-server-10 systemd[1]: Started Service for snap application k8s.k8sd.
Apr 09 16:10:48 solqa-lab1-server-10 k8s.k8sd[6024]: + exec /snap/k8s/213/bin/k8sd --state-dir=/var/snap/k8s/common/var/lib/k8sd/state
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: time="2024-04-09T16:10:49Z" level=warning msg="Failed to parse new remotes from truststore"
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: time="2024-04-09T16:10:49Z" level=warning msg="microcluster database is uninitialized"
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: time="2024-04-09T16:10:49Z" level=warning msg="Failed to parse new remotes from truststore"
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: time="2024-04-09T16:10:49Z" level=info msg=" - binding control socket" socket=/var/snap/k8s/common/var/lib/k8sd/state/control.socket
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: 2024/04/09 16:10:49 http: panic serving @: runtime error: invalid memory address or nil pointer dereference
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: goroutine 35 [running]:
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: net/http.(*conn).serve.func1()
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /snap/go/10584/src/net/http/server.go:1868 +0xb9
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: panic({0x21fe7a0?, 0x3a978b0?})
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /snap/go/10584/src/runtime/panic.go:920 +0x270
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: github.com/canonical/go-dqlite/app.(*App).clientOptions(...)
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /root/go/pkg/mod/github.com/canonical/go-dqlite@v1.21.0/app/app.go:724
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: github.com/canonical/go-dqlite/app.(*App).Leader(0x28f14d0?, {0x28f1540?, 0xc00027c000?})
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /root/go/pkg/mod/github.com/canonical/go-dqlite@v1.21.0/app/app.go:488 +0x22
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: github.com/canonical/microcluster/internal/db.(*DB).Leader(...)
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /root/go/pkg/mod/github.com/canonical/microcluster@v0.0.0-20240402231055-e8e97999a2eb/internal/db/dqlite.go:176
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: github.com/canonical/microcluster/internal/state.(*State).Leader(0xc00024b960)
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /root/go/pkg/mod/github.com/canonical/microcluster@v0.0.0-20240402231055-e8e97999a2eb/internal/state/state.go:125 +0xa6
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: github.com/canonical/k8s/pkg/k8sd/api/impl.GetClusterMembers({0x28f14d0, 0xc000402ff0}, 0x7f8966947108?)
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /build/k8s/parts/k8s-binaries/build/pkg/k8sd/api/impl/k8sd.go:17 +0x35
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: github.com/canonical/k8s/pkg/k8sd/api.(*Endpoints).getClusterStatus(0x412885?, 0xc00024b960, 0x20a3120?)
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /build/k8s/parts/k8s-binaries/build/pkg/k8sd/api/cluster.go:18 +0x5c
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: github.com/canonical/microcluster/internal/rest.handleAPIRequest({0xc00012d640?, 0xc00012d650?, 0x20?, 0x31?}, 0x28cb760?, {0x2284ce0?, 0x3afff08?}, 0xc00070>
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /root/go/pkg/mod/github.com/canonical/microcluster@v0.0.0-20240402231055-e8e97999a2eb/internal/rest/rest.go:64 +0x1c8
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: github.com/canonical/microcluster/internal/rest.HandleEndpoint.func1({0x28e23a0, 0xc000712000}, 0xc00070c200)
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /root/go/pkg/mod/github.com/canonical/microcluster@v0.0.0-20240402231055-e8e97999a2eb/internal/rest/rest.go:212 +0x64f
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: net/http.HandlerFunc.ServeHTTP(0xc00070c100?, {0x28e23a0?, 0xc000712000?}, 0x7f891fcc6cc0?)
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /snap/go/10584/src/net/http/server.go:2136 +0x29
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: github.com/gorilla/mux.(*Router).ServeHTTP(0xc000001740, {0x28e23a0, 0xc000712000}, 0xc00070c000)
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /root/go/pkg/mod/github.com/gorilla/mux@v1.8.1/mux.go:212 +0x1c5
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: net/http.serverHandler.ServeHTTP({0xc000700060?}, {0x28e23a0?, 0xc000712000?}, 0x6?)
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /snap/go/10584/src/net/http/server.go:2938 +0x8e
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: net/http.(*conn).serve(0xc00020ad80, {0x28f1498, 0xc000700000})
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /snap/go/10584/src/net/http/server.go:2009 +0x5f4
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: created by net/http.(*Server).Serve in goroutine 52
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: /snap/go/10584/src/net/http/server.go:3086 +0x5cb
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: time="2024-04-09T16:10:49Z" level=info msg=" - binding https socket" network="10.246.166.252:6400"
Apr 09 16:10:49 solqa-lab1-server-10 k8s.k8sd[6024]: time="2024-04-09T16:10:49Z" level=info msg="Preparing statements for Go project \"k8s-binaries\""
This looks like the issue we see in https://github.com/canonical/k8s-snap/issues/321. I'll investigate.
@amc94, this should be fixed with https://github.com/canonical/k8s-snap/pull/353. Please try again.
It's fixed, thank you.
I initially deployed 3 units of k8s, then used juju to remove two of them. This left the last remaining k8s unit stuck in waiting status with the message 'waiting on k8sd'.
From systemctl:
snap.k8s.k8sd.service loaded active running Service for snap application k8s.k8sd
From the logs of k8sd: