Closed: phvalguima closed this issue 5 months ago
I am not sure which path we should follow. I see some options:

- a `peer-changed` event that notifies all the peers
- a `k8s-integrator` charm that relates to all the units, listens to Kubernetes events (unrelated to Juju's) and generates a `-changed` event on all the units it is related to

Another alternative is to try forbidding evictions altogether.
We can define a PodDisruptionBudget as follows:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: <name>
  namespace: <model-name>
spec:
  maxUnavailable: 0   ### <<<<-------- the relevant part
  selector:
    matchLabels:
      app.juju.is/created-by: <app-name>
```
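As a sanity check on why `maxUnavailable: 0` blocks every eviction, here is a rough sketch of the PDB accounting for a `maxUnavailable` budget (this is an illustration, not the actual controller code); the numbers match the 3-unit deployment in this issue:

```python
def allowed_disruptions(healthy: int, desired: int, max_unavailable: int) -> int:
    """Sketch of PDB accounting for a maxUnavailable budget: an eviction
    is allowed only while fewer than max_unavailable pods are already
    unavailable."""
    currently_unavailable = desired - healthy
    return max(0, max_unavailable - currently_unavailable)

# 3 desired units, all healthy, maxUnavailable: 0 -> zero allowed
# disruptions, which is exactly the 429 answer shown below.
print(allowed_disruptions(healthy=3, desired=3, max_unavailable=0))  # 0
```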
After applying the above and rerunning the same `curl` command, I now get:

```shell
$ curl -k -H "Authorization: Bearer YYYYYYYYYYYYYYYYY" -H 'Content-type: application/json' https://.../api/v1/namespaces/test/pods/postgresql-k8s-2/eviction -d @evict.json
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Cannot evict pod as it would violate the pod's disruption budget.",
  "reason": "TooManyRequests",
  "details": {
    "causes": [
      {
        "reason": "DisruptionBudget",
        "message": "The disruption budget psql-pdb needs 3 healthy pods and has 3 currently"
      }
    ]
  },
  "code": 429
}
```
That is the standard way of telling k8s that the pods should not be "disrupted". I think it has another advantage besides avoiding eviction: we cannot guarantee that the volume will be available if the pod moves between nodes, so this gives more assurance that we will not cause an unforeseen data copy in the event of an eviction.
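If a charm (or the hypothetical integrator charm mentioned above) ever needs to distinguish "blocked by a PDB" from other eviction failures, the response shown above is machine-readable. A small sketch, using the response from this issue verbatim (abridged):

```python
import json

# Eviction response captured above: HTTP 429 with a DisruptionBudget cause.
response = json.loads("""
{
  "kind": "Status",
  "apiVersion": "v1",
  "status": "Failure",
  "reason": "TooManyRequests",
  "details": {"causes": [{"reason": "DisruptionBudget",
                          "message": "The disruption budget psql-pdb needs 3 healthy pods and has 3 currently"}]},
  "code": 429
}
""")

# An eviction blocked by a PDB surfaces as code 429 plus a cause with
# reason "DisruptionBudget"; anything else is a different failure mode.
blocked_by_pdb = (
    response.get("code") == 429
    and any(c.get("reason") == "DisruptionBudget"
            for c in response.get("details", {}).get("causes", []))
)
print(blocked_by_pdb)  # True
```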
Trying to remove a pod via Juju works fine:

```shell
$ juju remove-unit postgresql-k8s --num-units 1
scaling down to 2 units
$ juju status
Model  Controller          Cloud/Region        Version    SLA          Timestamp
test   test-k8s-localhost  test-k8s/localhost  4.0-beta1  unsupported  18:00:07+02:00

App             Version  Status  Scale  Charm           Channel    Rev  Address        Exposed  Message
postgresql-k8s  14.7     active    3/2  postgresql-k8s  14/stable   73  10.152.183.82  no
s3-integrator            active      1  s3-integrator   stable      13  10.152.183.52  no

Unit               Workload  Agent      Address       Ports  Message
postgresql-k8s/0*  active    executing  10.1.166.153         (config-changed) Primary
postgresql-k8s/1   active    executing  10.1.166.151         (config-changed)
postgresql-k8s/2   active    executing  10.1.166.155
```
Eventually it renders:

```shell
$ juju status
Model  Controller          Cloud/Region        Version    SLA          Timestamp
test   test-k8s-localhost  test-k8s/localhost  4.0-beta1  unsupported  18:03:07+02:00

App             Version  Status  Scale  Charm           Channel    Rev  Address        Exposed  Message
postgresql-k8s  14.7     active      2  postgresql-k8s  14/stable   73  10.152.183.82  no
s3-integrator            active      1  s3-integrator   stable      13  10.152.183.52  no

Unit               Workload  Agent  Address       Ports  Message
postgresql-k8s/0*  active    idle   10.1.166.153
postgresql-k8s/1   active    idle   10.1.166.151
s3-integrator/0*   active    idle   10.1.166.154
```
This also affected the commercial system's production postgresql-k8s deployment (channel=14/edge, rev=198). The latest 14/edge (rev. 241, 242) should fix this.
I am currently running the latest Juju (v4.0-beta, checked out directly from the repo, but this was also seen with Juju 2.9) and PSQL from channel 14/stable.

Environment Setup
A single-node microk8s with 1.27/stable, using classic confinement.
That deployment should render a pgbackrest.conf file. The deployment looks as follows:
Reproducer
To reproduce it, first add a new node; in this case, I used a VM from LXD.
Then, install microk8s v1.27 and cluster it, following: https://microk8s.io/docs/clustering
Confirm the new node is Ready:

```shell
kubectl get nodes
```

Then, confirm all the pods are scheduled on the same node, with:

```shell
kubectl get po -n <model-name> -o wide
```
Select one of the pods to be evicted and generate a query, following: https://kubernetes.io/docs/concepts/scheduling-eviction/api-eviction/#calling-the-eviction-api
Generate a JSON file with the pod name; in my case:
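For reference, a minimal eviction body per the linked eviction-API docs might look like this (the pod name and namespace are taken from the `curl` call earlier in this issue; adjust them to your deployment):

```json
{
  "apiVersion": "policy/v1",
  "kind": "Eviction",
  "metadata": {
    "name": "postgresql-k8s-2",
    "namespace": "test"
  }
}
```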
curl the k8s API using the token and the other details above:
That will evict the pod to a new node and render a wrong pgbackrest.conf:
Early Conclusions
postgresql-k8s/2 in juju status.
Full logs from postgresql/2: https://pastebin.ubuntu.com/p/CqjbhxwZvh/