fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0
6.47k stars, 599 forks

Ingress docs for Flux webhook receiver are missing important details #2240

Open kingdonb opened 2 years ago

kingdonb commented 2 years ago

Describe the bug

I wrote parts of these webhook receiver docs. In "Expose the webhook receiver" I mentioned cert-manager and proudly stated that you can use the annotations here, but declared it out of scope to document the whole setup from end to end. I think this use case is common enough (and well explored) that we should document the whole setup end to end with at least one ingress controller.

And, well, it turns out that the difficulties you encounter when trying to use cert-manager with Flux receivers are interesting enough to be worth covering in docs 👍

I think cert-manager is one use case, and the problem may or may not be unique to ingress-nginx. We could document cert-manager, and how to add a network policy permitting an Issuer to manage certificates for an Ingress in the flux-system namespace. I've been learning Traefik, and while I think it might be easier to explain, and it can accomplish TLS + Let's Encrypt without cert-manager and without a NetworkPolicy, I don't know if we want to expand this section of the docs any more than necessary.
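For readers curious what the Traefik route mentioned above might look like, here is a minimal sketch, not a tested recommendation. It assumes Traefik is installed with its CRDs under the `traefik.io/v1alpha1` API group, that an entry point named `websecure` exists, and that an ACME certificate resolver named `letsencrypt` has been defined in Traefik's static configuration. The hostname is a placeholder. In this setup Traefik answers the ACME challenge itself, so no solver pod is created in flux-system.

```yaml
# Sketch only: the names "websecure" and "letsencrypt" are assumptions about the
# Traefik installation, and fluxwebhook.example is a placeholder host.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: webhook-receiver
  namespace: flux-system
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`fluxwebhook.example`)
      kind: Rule
      services:
        # The Service exposing Flux's notification-controller receiver endpoint
        - name: webhook-receiver
          port: 80
  tls:
    # Ask Traefik's built-in ACME client for the certificate
    certResolver: letsencrypt
```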

I know a bit more about this stuff than what I knew when I first wrote this, but I think our community is likely to have better and broader experience than myself, so I started this issue where we can document any other important details that should get a mention in the docs, whether they are for any specific Ingress controller or related to specific CNI implementations that may have quirks one will need to be aware of and work around related to NetworkPolicy resources in Flux. 👍

Steps to reproduce

There should be enough detail in the docs to produce a working, secured public Ingress with respect to Flux's NetworkPolicy strategy and guidance around securing the Flux namespace.

Expected behavior

Let's add a note about how cert-manager changes the network requirements in the namespace as well as explaining how to appropriately permit cert-manager traffic when it's in use for certificate generation and renewal.

Screenshots and recordings

No response

OS / Distro

N/A

Flux version

v0.24.1

Flux check

N/A

Git provider

No response

Container Registry provider

No response

Additional context

No response

kingdonb commented 2 years ago

Should incorporate information from:

andi0b commented 1 year ago

I was also struggling with it, this seems to work for me, no idea if it's right:

Edit: Maybe it also works without the allow-cert-manager-resolver-reverse policy.

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webhook-receiver
  namespace: flux-system
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  rules:
    - host: fluxwebhook.example
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webhook-receiver
                port:
                  number: 80
  tls:
    - hosts:
        - fluxwebhook.example
      secretName: webhook-receiver-https
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-cert-manager-resolver-reverse
  namespace: cert-manager
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: cert-manager
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              acme.cert-manager.io/http01-solver: "true"

---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-cert-manager-resolver
  namespace: "flux-system"
spec:
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              app.kubernetes.io/instance: cert-manager
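
One detail worth keeping in mind when reading the policies above: a `namespaceSelector` matches labels on Namespace objects, not labels on pods, so these rules only match if the namespaces themselves carry those labels. The following is a hedged sketch, not a tested replacement, of how the flux-system policy could express "only cert-manager pods in the cert-manager namespace". It assumes the namespace is named `cert-manager` and that the cluster sets the automatic `kubernetes.io/metadata.name` namespace label (recent Kubernetes versions do). The policy name is hypothetical.

```yaml
# Sketch only: the namespace name and the policy name are assumptions.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-cert-manager-to-solver
  namespace: flux-system
spec:
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  policyTypes:
    - Ingress
  ingress:
    - from:
        # namespaceSelector and podSelector in the SAME list item are ANDed:
        # pods with this label, in namespaces with this label.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: cert-manager
          podSelector:
            matchLabels:
              app.kubernetes.io/instance: cert-manager
```

Whether traffic from the cert-manager pods is the traffic that actually needs to be allowed depends on how the challenge requests are routed in a given cluster, so treat this as a starting point for experimentation.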

Klavionik commented 1 year ago

@kingdonb Hi, any plans to update the existing documentation? I've spent a couple of hours today trying to find out why I can't issue a certificate for the webhook receiver ingress. I'm new to all things Kubernetes, and 504 Gateway Time-out is not a very descriptive error, so it took some time before I reached this issue and realized it's about the network configuration.

For those struggling too, I ended up using this network policy (which simply allows all ingress traffic to http-solver pods in the flux-system namespace):

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-cert-manager-http-solver
  namespace: flux-system
spec:
  policyTypes:
    - Ingress
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  ingress:
    - {}
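
If allowing all ingress to the solver pods feels too broad, a narrower variant might look like the sketch below. It is untested here and rests on several assumptions: that the ingress controller runs in a namespace named `ingress-nginx`, that the cluster sets the automatic `kubernetes.io/metadata.name` namespace label, and that the solver pods listen on cert-manager's default HTTP-01 solver port of 8089. The policy name is hypothetical.

```yaml
# Sketch only: the namespace name, port, and policy name are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-http-solver-from-ingress
  namespace: flux-system
spec:
  policyTypes:
    - Ingress
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  ingress:
    - from:
        # Only pods in the ingress controller's namespace may connect
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        # Assumed default listen port of the HTTP-01 solver pod
        - protocol: TCP
          port: 8089
```

Adjust the namespace label and port to match the cluster before relying on it.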

andi0b commented 1 year ago

I don’t know how the documentation can be updated here (I’m not a contributor), but a simple remark with a link to this issue would probably be enough.

fabn commented 1 year ago

+1000 for this, got stuck with 504 for hours as well 🤦

Sierra1011 commented 8 months ago

Same here, this fixed my unresolved DNS within seconds 😍 Thanks @Klavionik!

kingdonb commented 5 months ago

I finally got around to testing the details in this issue today. I was able to make use of all the NetworkPolicies described here, and I think they should go in the docs. Before recommending this change, I would like to do at least one more test of cert-manager (to reproduce the original issue), because I have changed a lot of things and I honestly think the NetworkPolicy is one of the last things I would have needed to check.

For example, if the cert-manager challenge is hosted in a parent vcluster, can this policy still be used, or does it need some modification?

I would like to avoid letting perfect get in the way of making these docs actually good. The issue remains open to signify that we are definitely missing a few important use cases. I will try to do a recap and come back to this issue so we can close it.

Thanks for your patience, everyone. It was harder than I thought to get to the point where the network policy ingress was the only thing preventing me from using a TLS-protected webhook on my clusters in the home lab.

I used the most limited NetworkPolicy suggested by @andi0b above, which limits traffic to only labeled cert-manager pods.

cert-manager   allow-cert-manager-resolver-reverse   app.kubernetes.io/instance=cert-manager   116m
flux-system    allow-cert-manager-resolver           acme.cert-manager.io/http01-solver=true   116m

This seems to do the trick for me! Thanks very much to everyone who contributed to the report here.