Support for mutable tags (like `latest`)

BernhardGruen commented 11 months ago

Current situation

Kuik saves the current version of an image in its registry and serves this copy as long as it is cached. This is a great feature from an availability perspective and works perfectly for immutable tags.

imagePullPolicy: Never

kuik works as expected (it is not used at all)

imagePullPolicy: IfNotPresent

kuik works as expected for immutable tags
kuik works as expected for mutable tags (no new version ever downloaded)

BUT: IfNotPresent is not useful for mutable tags.

imagePullPolicy: Always

kuik works as expected for immutable tags (as they never ever change again)
kuik does not work as expected for mutable tags (as they are expected to change). After all, `:latest` is mutable too.

Sometimes mutable tags are needed

As written above, mutable tags are useful and sometimes even required in some environments (together with imagePullPolicy: Always). They make it easy to run the latest version of an image (e.g. postgres:15), which might contain relevant security fixes, and pulling that new image is as easy as killing a running pod.

However, if kuik is installed (and configured to be active for such images), the situation is different. Once kuik has cached a version of an image, that image never gets updated (as long as it is in active use and therefore not garbage collected).

So again: from an availability perspective kuik is great, but from a security and a usability perspective there are some pitfalls.

Examples to illustrate the current situation

If one deploys a StatefulSet for an image like postgres:15 (PostgreSQL database), kube-image-keeper caches the postgres:15 image the moment a corresponding pod is created, and this exact image is stored inside the kuik registry. If postgres:15 later gets an update, which might be important for security reasons, and a developer tries to upgrade the pods, the cached version is used instead of the newer, security-fixed version of postgres:15. The developer then has to read the logs carefully to find out that no update happened.

For mutable tags like :latest the situation can be even worse, as a developer relies on imagePullPolicy: Always (which Kubernetes applies by default to :latest images when no policy is set). Unfortunately, the image never gets updated while kube-image-keeper is actively caching it. This is completely different from the expected default behavior of imagePullPolicy: Always.

Avoiding single points of failure

One could argue that using imagePullPolicy: Always is bad anyway, because it introduces a single point of failure (the image registry). But kube-image-keeper is able to remove this SPOF.

Therefore, I would like to present an idea on how to extend kube-image-keeper's capabilities so that it also handles mutable tags correctly.

An idea on how to fix these issues

The proxy component of kube-image-keeper could implement a mechanism that checks the upstream registry for updates and re-downloads an image if an update is available. The basic code already seems to be there, as kuik checks for (and may download) an update if one manually deletes the CachedImage object.

Clearly, there is still a need to use the already cached (and maybe outdated) version in some cases, and this is what makes kuik outstanding:

If the upstream image registry is not reachable (a well-chosen timeout is needed).
If the image is not available (anymore) in the upstream registry.
If the provided imagePullSecrets are currently not working.

Different update modes might be possible: there could be several options to decide whether kuik should check for image updates, which could be configured during the kuik installation:

Always check for updates
Check for updates if the last check was more than a configurable number of minutes, hours or even days ago
Never check for updates (current behavior)

To be clear: these checks should only be made when the container runtime requests that image anyway. This should not be a recurring background job. That way kuik does not make unnecessary calls to the upstream image registry.

Similar issue

A similar request was also made in the Kubernetes issue tracker: https://github.com/kubernetes/kubernetes/issues/111822. There the author proposed a new imagePullPolicy named IfAvailable, which would pull an updated image if the image registry is available and the image itself has been updated, and otherwise use the already present version. I think that kube-image-keeper with these enhancements would solve this problem and also improve availability even further.

BernhardGruen commented 10 months ago

Hey,

what is your opinion on my idea? I really think it could broaden the use cases of kube-image-keeper a lot.

BernhardGruen commented 9 months ago

Hey @Nicolasgouze,

you wrote in your blog post (https://enix.io/en/blog/cache-image-docker-kubernetes/) that you are working on "Improving the behavior around container images versioning management / set into the cache (e.g smoothly allowing to update a “latest” version, in the case it was updated in the source registry)".

It seems you roughly describe a feature similar to the one I wrote about in this issue. Are there any concrete plans?

Nicolasgouze commented 9 months ago

Hi @BernhardGruen ,

Here is the basic information I can provide:

Yes, rest assured that we still plan to better handle the mutable tags scenario! « Unfortunately », we had other priorities until a couple of days ago.
We’re currently in the design phase, doing our best to agree on a « robust » solution while limiting complexity and taking into account the retain policy & GC features (other related areas we want to improve) …
Finally, in the short/medium term, we will not push for the « IfAvailable » option.

I’ll come back to you once we have something !

steffansluis commented 9 months ago

If I understand correctly, the simple workaround for now would be to label any pods that use mutable tags with kube-image-keeper.enix.io/image-caching-policy=ignore

Nicolasgouze commented 9 months ago

You're right @steffansluis. If you control the deployment process to some extent, you should use the approach you describe (using the « re-write exclude » capability of the pod definition). BUT as that is not always the case, AND as we do not consider it convenient or logical to need such a workaround for this scenario, we are working on another solution: a dedicated feature.

felipewnp commented 3 weeks ago

Hi @Nicolasgouze

Any update on this?
