fluxcd / image-reflector-controller

GitOps Toolkit controller that scans container registries
https://fluxcd.io
Apache License 2.0
109 stars 67 forks source link

Admission webhook for ImagePolicy validation #69

Open relu opened 3 years ago

relu commented 3 years ago

We should create an admission webhook for validating the policies. This makes sense in the context of ImagePolicyChoice which should configure one and only one policy type. An image policy that lacks a valid ImagePolicyChoice should be considered invalid and fail to create in this situation.

stefanprodan commented 3 years ago

Kyverno can be used for this, example here: https://github.com/fluxcd/flux2-multi-tenancy/blob/main/infrastructure/kyverno-policies/flux-multi-tenancy.yaml

relu commented 3 years ago

Nice, I'm not familiar with Kyverno, but looks really good in the context of multi-tenancy configurations.

I would argue for doing this validation at the level of the controller codebase, in an actual admission webhook, since it's something I'd consider "core" logic, not sure if that makes sense... What I mean to say is that maybe we shouldn't rely on a 3rd party solution to enforce a validation policy for a resource that would render itself useless if an invalid spec is provided. I see Kyverno's policies as a layer above that "core" logic, meaning the controllers would run according to design just fine without it, lacking only the protections imposed at the organization management level.

stefanprodan commented 3 years ago

I'm for reusing a CNCF project than writing something from scratch. Dealing with admission webhooks is hard, we need to setup TLS certs signed by Kubernetes API or depend on cert-manager to install Flux. Having cert-manager as an install dependency is really messy as it creates a chicken-egg problem.

relu commented 3 years ago

I see your point, indeed it would probably be less of a hassle. I was more inclined in the direction of admission webhooks because of them being native to k8s, however, considering the drawbacks of having do deal with the certificate bundle, be it either through an init container or through cert-manager (which is trivial if you have it, but complicated otherwise), I suppose going the Kyverno route would make more sense, especially if there's already a use case for it in the context of supporting multi-tenancy configurations. Would this make Kyverno a hard dependency for Flux? If so, how do we manage it?

stefanprodan commented 3 years ago

I don't think we should rely on cert-manager for Flux certs, as this means users have to install cert-manager by hand before Flux. The problem with Kyverno is that it doesn't support CRD conversion hooks and we'll need that when we'll release a new API version. I think we should investigate how can we generate certs at Flux install time and how can users rotate those certs before they expire.

relu commented 3 years ago

Does that mean admission webhooks are still on the table?

For reference, I've found this article to answer the question about how certificate management for admission webhooks can be achieved. See the last part regarding certificate rotation (obviously it's a bit of a hurdle).

I agree a hard requirement for cert-manager would not be a good solution, but I think we should at least let users make use of it if they actually have it and maybe even recommend it, or otherwise, by default, rely on our potential built-in solution. ECK takes this approach, see their documentation on its configuration.

stefanprodan commented 3 years ago

but I think we should at least let users make use of it if they actually have it and maybe even recommend it

How could they have cert-manager running if we tell people to do GitOps?

Does that mean admission webhooks are still on the table?

Yes if we can figure out how to generate the certs at Flux install time without requiring users to install things with kubectl beforehand. Both Kyverno and OPA Gatekeeper are generating certs at startup, so there is a way to do this without forcing people into deploying cert-manager.

relu commented 3 years ago

How could they have cert-manager running if we tell people to do GitOps?

Hybrid configurations, such as using the terraform helm provider to install parts of the kubernetes extension components, cert-manager would be a good candidate for that. I'm not saying it's necessarily the right way, neither arguing around the conflicting instructions for users, but it can be a valid use case.

relu commented 3 years ago

I looked a bit into how Kyverno does it and it does seem to come with some overhead. I see they actually implemented means to support certificate expiration validation and re-issuing but it does not seem to be used actively anywhere, so I assume the responsibility lies on the user to rotate the certificate at the moment (delete the secret and restart kyverno?).

I also checked on how ECK is doing it and it seems their method is a bit more simplistic and they actually have covered active renewal using a webhook certificates controller which checks if the certificate needs to be renewed and handles it.

If we plan to make use of the same strategy, I think the right place for this implementation would be under pkg as a generic package that could be integrated and used in all other places.

hiddeco commented 3 years ago

This discussion is becoming more relevant to other controllers, now that I have discovered Artifactory is actively guiding people to use invalid Helm chart names. Something that could be blocked in a UX friendly manner using a validation webhook API in the controller, as it would require rule based validation on the HelmChart object for the Chart field based on the SourceRef.Kind that is configured.

stefanprodan commented 3 years ago

OPA has made available a Go module for generating and rotating webhook certs that works with controller-runtime https://github.com/open-policy-agent/cert-controller

relu commented 3 years ago

Quick update on this: I managed to get to it and wanted to see if cert-controller can be integrated. It looks like it's behind with controller-runtime (v0.6.3) and is incompatible with the current v0.8 version in use by our controllers. Not sure what's the right approach, in this case. Probably developing and maintaining our own shared module in pkg is the way to go. This would mean we have full control over any future controller-runtime updates, essentially preventing potential dep blockers from upstream.

souleb commented 2 years ago

Knative went down the path of having a custom controller to reconcile webhook certificates: https://github.com/knative/pkg/tree/5708c4c442327025bc26eed03813dc10946d5a50/webhook/certificates

stefanprodan commented 2 years ago

I'm running Tekton on my clusters and see that webhook crashing quite often....

Screenshot 2021-11-18 at 10 41 10