Kong / gateway-operator

Kubernetes Operator for Kong Gateways
Apache License 2.0

DataPlane: support `PodDisruptionBudget` #142

Closed pmalek closed 3 months ago

pmalek commented 10 months ago

Problem statement

Users might want to specify a disruption budget for their DataPlane workloads, e.g. to limit how many replicas can be down at once during an upgrade.

Proposed

Add a spec field to the DataPlane API which, when set, makes the operator deploy and manage a PodDisruptionBudget targeting the DataPlane instances.
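
For readers less familiar with how a PodDisruptionBudget limits evictions, here is a minimal sketch of the arithmetic, assuming the usual Kubernetes semantics: exactly one of `minAvailable` or `maxUnavailable` is set, and percentages are rounded up. The function names are hypothetical and this is not the operator's code, only an illustration of how many DataPlane pods could be evicted at once for a given budget.

```python
from typing import Optional, Union

# A PDB value is either an absolute pod count (int) or a percentage string like "50%".
IntOrPercent = Union[int, str]


def scaled_value(value: IntOrPercent, total: int) -> int:
    """Resolve an int-or-percent value against `total`, rounding percentages up."""
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    # Integer ceiling division avoids floating point rounding surprises.
    return (percent * total + 99) // 100


def disruptions_allowed(
    expected: int,
    healthy: int,
    min_available: Optional[IntOrPercent] = None,
    max_unavailable: Optional[IntOrPercent] = None,
) -> int:
    """Return how many pods may be voluntarily evicted right now.

    `expected` is the desired replica count and `healthy` is the number of
    currently ready pods (both are inputs assumed for this illustration).
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if max_unavailable is not None:
        desired_healthy = max(expected - scaled_value(max_unavailable, expected), 0)
    else:
        desired_healthy = scaled_value(min_available, expected)
    # Evictions are only allowed while enough healthy pods remain.
    return max(healthy - desired_healthy, 0)


if __name__ == "__main__":
    # Three healthy replicas with maxUnavailable=1: one eviction at a time.
    print(disruptions_allowed(expected=3, healthy=3, max_unavailable=1))
    # One replica is already down, so further evictions are blocked.
    print(disruptions_allowed(expected=3, healthy=2, max_unavailable=1))
```

With a budget like this in place, a node drain that goes through the eviction API has to wait for a replacement pod to become ready before evicting the next one, instead of taking several DataPlane pods down together.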

Acceptance criteria

sentinelleader commented 9 months ago

Will this be implemented in the 1.3 release?

We recycle pods every week (as we recycle nodes every week), and sometimes we see that AWS Karpenter can get aggressive due to the lack of a PDB. They recommend a PDB to avoid too much disruption.

kubectl get events -n c1b32c25-8557-410c-9ea9-a3c2ca174835
LAST SEEN   TYPE      REASON                   OBJECT                                                                       MESSAGE
13m         Normal    SuccessfulCreate         replicaset/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf544c8f   Created pod: dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5pd2p6
11m         Normal    SuccessfulCreate         replicaset/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf544c8f   Created pod: dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5kwl49
11m         Normal    Evicted                  pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf57rbg6          Evicted pod
11m         Normal    Killing                  pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf57rbg6          Stopping container proxy
9m33s       Warning   FailedPreStopHook        pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf57rbg6          PreStopHook failed
11m         Normal    Scheduled                pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5kwl49          Successfully assigned c1b32c25-8557-410c-9ea9-a3c2ca174835/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5kwl49 to ip-172-31-169-129.eu-west-2.compute.internal
10m         Normal    Pulling                  pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5kwl49          Pulling image "kong/kong-gateway:3.5-ubuntu"
10m         Normal    Pulled                   pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5kwl49          Successfully pulled image "kong/kong-gateway:3.5-ubuntu" in 5.448527909s (5.448537218s including waiting)
10m         Normal    Created                  pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5kwl49          Created container proxy
10m         Normal    Started                  pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5kwl49          Started container proxy
13m         Normal    Scheduled                pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5pd2p6          Successfully assigned c1b32c25-8557-410c-9ea9-a3c2ca174835/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5pd2p6 to ip-172-31-117-74.eu-west-2.compute.internal
13m         Normal    Pulling                  pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5pd2p6          Pulling image "kong/kong-gateway:3.5-ubuntu"
13m         Normal    TaintManagerEviction     pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5pd2p6          Cancelling deletion of Pod c1b32c25-8557-410c-9ea9-a3c2ca174835/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5pd2p6
13m         Normal    Pulled                   pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5pd2p6          Successfully pulled image "kong/kong-gateway:3.5-ubuntu" in 5.349953019s (5.349973311s including waiting)
13m         Normal    Created                  pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5pd2p6          Created container proxy
13m         Normal    Started                  pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5pd2p6          Started container proxy
13m         Normal    Evicted                  pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5vhcqp          Evicted pod
13m         Normal    Killing                  pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5vhcqp          Stopping container proxy
11m         Warning   FailedPreStopHook        pod/dataplane-7ae39c6e-3999-4b72-be5c-f996c01c2c3b-5qfvv-7bcf5vhcqp          PreStopHook failed
7m37s       Normal    SuccessfullyReconciled   targetgroupbinding/k8s-c1b32c25-dataplan-1e62c79245                          Successfully reconciled
7m37s       Normal    SuccessfullyReconciled   targetgroupbinding/k8s-c1b32c25-dataplan-9bdadb86a4                          Successfully reconciled
kubectl get events -n 04332fa3-6402-403e-9229-478e806b7d17
LAST SEEN   TYPE      REASON                   OBJECT                                                                       MESSAGE
15m         Normal    SuccessfulCreate         replicaset/dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-558696fd57   Created pod: dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-55869lcbdb
15m         Normal    Scheduled                pod/dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-55869lcbdb          Successfully assigned 04332fa3-6402-403e-9229-478e806b7d17/dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-55869lcbdb to ip-172-31-66-158.eu-west-2.compute.internal
15m         Normal    Pulling                  pod/dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-55869lcbdb          Pulling image "kong/kong-gateway:3.5-ubuntu"
15m         Normal    Pulled                   pod/dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-55869lcbdb          Successfully pulled image "kong/kong-gateway:3.5-ubuntu" in 6.267026563s (6.267046123s including waiting)
15m         Normal    Created                  pod/dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-55869lcbdb          Created container proxy
15m         Normal    Started                  pod/dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-55869lcbdb          Started container proxy
15m         Normal    Evicted                  pod/dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-55869smrgh          Evicted pod
15m         Normal    Killing                  pod/dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-55869smrgh          Stopping container proxy
14m         Warning   FailedPreStopHook        pod/dataplane-c495bef5-76b1-4f6d-ac55-0e5b7e5245ec-tmmgc-55869smrgh          PreStopHook failed
12m         Normal    SuccessfullyReconciled   targetgroupbinding/k8s-04332fa3-dataplan-9019bdead5                          Successfully reconciled
12m         Normal    SuccessfullyReconciled   targetgroupbinding/k8s-04332fa3-dataplan-e740979ad0                          Successfully reconciled
pmalek commented 9 months ago

CloudGateways would ideally have this for April's GA. Hence we either ship this in 1.2 or right after in 1.3.

Slack thread: https://kongstrong.slack.com/archives/C04D2Q757RU/p1707998741598919

pmalek commented 3 months ago

@sentinelleader The PR with proposed API changes has been created: https://github.com/Kong/gateway-operator/pull/441.

The implementation will follow after this is merged.