Open kingdonb opened 2 years ago
Should incorporate information from:
I was also struggling with it, this seems to work for me, no idea if it's right:
Edit: Maybe it also works without the allow-cert-manager-resolver-reverse policy
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webhook-receiver
  namespace: flux-system
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  rules:
  - host: fluxwebhook.example
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: webhook-receiver
            port:
              number: 80
  tls:
  - hosts:
    - fluxwebhook.example
    secretName: webhook-receiver-https
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-cert-manager-resolver-reverse
  namespace: cert-manager
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: cert-manager
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          acme.cert-manager.io/http01-solver: "true"
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-cert-manager-resolver
  namespace: "flux-system"
spec:
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          app.kubernetes.io/instance: cert-manager
@kingdonb Hi, any plans to update the existing documentation? I've spent a couple of hours today trying to find out why I can't issue a certificate for the webhook receiver ingress. I'm new to all things Kubernetes, and 504 Gateway Time-out is not a very descriptive error, so it took some time before I reached this issue and realized it's about the network configuration.
For those struggling too, I ended up using this network policy (which simply allows all ingress traffic to http-solver pods in the flux-system namespace):
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-cert-manager-http-solver
  namespace: flux-system
spec:
  policyTypes:
  - Ingress
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  ingress:
  - {}
I don’t know how the documentation can be updated here (I’m not a contributor), but a simple remark with a link to this issue would probably be enough.
+1000 for this, got stuck with 504 for hours as well 🤦
Same here, this fixed my unresolved DNS within seconds :heart_eyes: Thanks @Klavionik !
I finally got around to testing the details in this issue today. I was able to make use of all the NetworkPolicies described here, and I think they should go in the docs. I would like to do at least one more test of cert-manager (to reproduce the original issue) before recommending this change, because I have changed a lot of things, and I honestly think the NetworkPolicy is one of the last things I would have needed to check.
For example, if the cert-manager challenge is hosted in a parent vcluster, can this policy still be used, or does it need some modification?
I would like to avoid letting perfect get in the way of making these docs actually good. The issue remains open to signify that we are definitely missing a few important use cases. I will try to do a recap and come back to this issue, so we can close it.
Thanks for your patience, everyone, but it was harder than I thought to get to the point where the network policy ingress was the only thing preventing me from using a TLS protected webhook on my clusters in the home lab.
I used the suggested most-limited networkpolicy from @andi0b above, that limits traffic to only labeled cert-manager pods.
cert-manager   allow-cert-manager-resolver-reverse   app.kubernetes.io/instance=cert-manager   116m
flux-system    allow-cert-manager-resolver           acme.cert-manager.io/http01-solver=true   116m
This seems to do the trick, for me! Thanks very much everyone who contributed something to the report here.
Describe the bug
I wrote parts of these webhook receiver docs, in "Expose the webhook receiver" I mentioned cert-manager and proudly stated that you can use the annotations here, but declared it was out of scope to document the whole setup from end to end. I think this use case is common enough (and well-explored) that we should probably document the whole setup from end to end with at least one ingress controller.
And, well, it turns out that the difficulties you encounter when trying to use cert-manager with Flux receivers are interesting enough to be worth covering in docs 👍
I think that cert-manager is one use case, that may or may not be an issue unique to ingress-nginx, but we could document cert manager, and how to add a network policy permitting an Issuer to manage certificates for Ingress in the flux-system ns. I've been learning Traefik and while I think it might be easier to explain, and can accomplish TLS+LetsEncrypt without cert-manager and without a NetworkPolicy, I don't know if we want to expand this section of the docs any more than necessary.
I know a bit more about this stuff than what I knew when I first wrote this, but I think our community is likely to have better and broader experience than myself, so I started this issue where we can document any other important details that should get a mention in the docs, whether they are for any specific Ingress controller or related to specific CNI implementations that may have quirks one will need to be aware of and work around related to NetworkPolicy resources in Flux. 👍
Steps to reproduce
There should be enough detail in the docs to produce a working, secured public Ingress with respect to Flux's NetworkPolicy strategy and guidance around securing the Flux namespace.
Expected behavior
Let's add a note about how cert-manager changes the network requirements in the namespace as well as explaining how to appropriately permit cert-manager traffic when it's in use for certificate generation and renewal.
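As a rough sketch of what such a note might contain: the policy below allows the HTTP-01 solver pods that cert-manager creates in flux-system to receive traffic only from the ingress controller's namespace. The policy name and the `ingress-nginx` namespace are assumptions for illustration, not values confirmed in this issue. The `kubernetes.io/metadata.name` label is set automatically on namespaces in recent Kubernetes versions. Depending on how cert-manager's self-check reaches the challenge URL in a given cluster, traffic from the cert-manager namespace may need to be allowed as well.

```yaml
# Sketch only: permit HTTP-01 solver pods in flux-system to receive
# traffic from the ingress controller's namespace and nowhere else.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-http01-solver-from-ingress   # hypothetical name
  namespace: flux-system
spec:
  # Solver pods carry this label (the same selector used earlier in this issue).
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Assumption: the ingress controller runs in a namespace
              # named ingress-nginx; adjust to match the cluster.
              kubernetes.io/metadata.name: ingress-nginx
```

This sits between the allow-all policy and the cert-manager-only policy discussed in the comments: it selects the source namespace by name rather than by a custom label, so no extra namespace labeling is needed.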
Screenshots and recordings
No response
OS / Distro
N/A
Flux version
v0.24.1
Flux check
N/A
Git provider
No response
Container Registry provider
No response
Additional context
No response