Open nprokopic opened 3 months ago
Sorry, I'm AFK for a while, and reviewing this from my phone, so there's even more chances than usual that I miss something obvious.
But, how can an admission policy check for the size of an emptydir? Isn't it supposed to be empty at creation?
Sorry, I'm AFK for a while, and reviewing this from my phone, so there's even more chances than usual that I miss something obvious.
Enjoy your time off :)
But, how can an admission policy check for the size of an emptydir? Isn't it supposed to be empty at creation?
Admission policy would check if emptyDir
has sizeLimit
set. emptyDir
is created regularly, but it has max size to which it can grow.
e.g. from k8s docs, see sizeLimit
in emptyDir
volume:
apiVersion: v1
kind: Pod
metadata:
name: test-pd
spec:
containers:
- image: registry.k8s.io/test-webserver
name: test-container
volumeMounts:
- mountPath: /cache
name: cache-volume
volumes:
- name: cache-volume
emptyDir:
sizeLimit: 500Mi
This would not be bulletproof, as the disk can still get filled from other sources. Still, I believe this would help to improve the current state, as we could prevent issues where a single pod unexpectedly takes 50% or even 90% of kubelet disk space.
There's an issue about best practices for Kyverno policies in WCs: https://github.com/giantswarm/giantswarm/issues/27203
We should align here.
I will also create a separate issue to look into local ephemeral storage that @mcharriere mentioned here https://github.com/giantswarm/giantswarm/issues/27019#issuecomment-1554179079.
But, how can an admission policy check for the size of an emptydir? Isn't it supposed to be empty at creation?
Admission policy would check if
emptyDir
hassizeLimit
set.emptyDir
is created regularly, but it has max size to which it can grow.
Oh, I see.
That's probably not a blocker for using sizeLimit, but rather something we should be aware of when testing and even after.
Motivation
We still have a lot of alerts because pods are using
emptyDir
that fills up the disk space.In every workload cluster we should include a ValidationAdmissionPolicy that enforces size limits for emptyDir volumes in new Pods.
ValidatingAdmissionPolicy
is beta starting from Kubernetes v1.28 (beta also in v1.29), and it is a stable feature starting from Kubernetes v1.30.Proposal
The proposal on how to include in our product is the following.
Phase 1 - optional feature behind a flag, warnings by default
v1beta1
API version in k8s 1.29.Audit
andWarn
being the default values. More details here in k8s v1.29 docs about ValidatingAdmissionPolicy.Warn
action toDeny
on selected Giant Swarm clusters (e.g. all testing and stable-testing clusters).Phase 2 - default required feature, warnings by default
ValidatingAdmissionPolicy
by default.v1
API version in k8s 1.30, which is different from k8s v1.29.Audit
andWarn
being the default values. More details here in k8s v1.30 docs about ValidatingAdmissionPolicy.Audit
andDeny
.Phase 3 - default required feature, deny Pods by default
ValidatingAdmissionPolicy
by default.Audit
andDeny
being the default values.A bit slower and less disruptive change rollout
If the above plan is too disruptive for customer workload, prolong it so Phase 1 is spread across v29 and v30 releases, Phase 2 is v31, and then final Phase 3 is starting from v32 release.
TODO
Outcome
For Customer
emptyDir
usage is more under control.Internal
KubeletVolumeSpaceTooLow
alerts, which reduces cognitive load that on-call work is adding to our daily work.