giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0
3 stars 0 forks source link

Add ValidationAdmissionPolicy to enforce size limits for emptyDir volumes in new Pods #3635

Open nprokopic opened 3 months ago

nprokopic commented 3 months ago

Motivation

We still have a lot of alerts because pods are using emptyDir that fills up the disk space.

In every workload cluster we should include a ValidationAdmissionPolicy that enforces size limits for emptyDir volumes in new Pods.

ValidatingAdmissionPolicy is beta starting from Kubernetes v1.28 (beta also in v1.29), and it is a stable feature starting from Kubernetes v1.30.

Proposal

The proposal on how to include in our product is the following.

Phase 1 - optional feature behind a flag, warnings by default

Phase 2 - default required feature, warnings by default

Phase 3 - default required feature, deny Pods by default

A bit slower and less disruptive change rollout

If the above plan is too disruptive for customer workload, prolong it so Phase 1 is spread across v29 and v30 releases, Phase 2 is v31, and then final Phase 3 is starting from v32 release.

TODO

Outcome

For Customer

Internal

hervenicol commented 3 months ago

Sorry, I'm AFK for a while, and reviewing this from my phone, so there's even more chances than usual that I miss something obvious.

But, how can an admission policy check for the size of an emptydir? Isn't it supposed to be empty at creation?

nprokopic commented 3 months ago

Sorry, I'm AFK for a while, and reviewing this from my phone, so there's even more chances than usual that I miss something obvious.

Enjoy your time off :)

But, how can an admission policy check for the size of an emptydir? Isn't it supposed to be empty at creation?

Admission policy would check if emptyDir has sizeLimit set. emptyDir is created regularly, but it has max size to which it can grow.

e.g. from k8s docs, see sizeLimit in emptyDir volume:

apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir:
      sizeLimit: 500Mi

This would not be bulletproof, as the disk can still get filled from other sources. Still, I believe this would help to improve the current state, as we could prevent issues where a single pod unexpectedly takes 50% or even 90% of kubelet disk space.

njuettner commented 3 months ago

There's an issue about best practices for Kyverno policies in WCs: https://github.com/giantswarm/giantswarm/issues/27203

We should align here.

nprokopic commented 3 months ago

I will also create a separate issue to look into local ephemeral storage that @mcharriere mentioned here https://github.com/giantswarm/giantswarm/issues/27019#issuecomment-1554179079.

hervenicol commented 3 months ago

But, how can an admission policy check for the size of an emptydir? Isn't it supposed to be empty at creation?

Admission policy would check if emptyDir has sizeLimit set. emptyDir is created regularly, but it has max size to which it can grow.

Oh, I see.

That's probably not a blocker for using sizeLimit, but rather something we should be aware of when testing and even after.