canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Cert-manager not working on MicroK8s - HTTP-01 challenges returning 404 #4702

Open johngrabner opened 2 hours ago

johngrabner commented 2 hours ago

Summary

I am facing issues with cert-manager on a clean MicroK8s installation. I'm trying to issue a Let's Encrypt certificate using HTTP-01 validation, but the validation process fails with a 404 error instead of the expected 200. Additionally, there are discrepancies in the documentation and ingress class usage, leading to confusion about proper configurations.

What Should Happen Instead?

Reproduction Steps

Ingress and routing:

k get ingressclass
NAME     CONTROLLER             PARAMETERS   AGE
nginx    k8s.io/ingress-nginx   <none>       113s
public   k8s.io/ingress-nginx   <none>       113s

Question: Why 2 class names for the same controller? Where can I find documentation of its purpose?

By default, the ingress controller pod uses an internal IP address (10.x):

microk8s kubectl -n ingress get pod -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP           NODE         NOMINATED NODE   READINESS GATES
nginx-ingress-microk8s-controller-cnhcw   1/1     Running   0          15m   10.1.96.68   webserver1   <none>           <none>

Since I am deploying a simple single-node cluster, I will use the host network as per https://kubernetes.github.io/ingress-nginx/deploy/baremetal/#via-the-host-network

I ran microk8s kubectl edit daemonset nginx-ingress-microk8s-controller -n ingress to add hostNetwork: true under the pod template's spec section (spec.template.spec).

Now the ingress is using my host's IP address.

microk8s kubectl -n ingress get pod -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
nginx-ingress-microk8s-controller-hbnp5   0/1     Running   0          1s    192.168.10.12   webserver1   <none>           <none>

The class appears to be public, but I am not sure what this loopback IP is:

k get ingress
NAME                 CLASS    HOSTS                                                                      ADDRESS     PORTS     AGE
ingress-before-ttl   public   ancient-script.org,www.ancient-script.org,ancient-script.org + 5 more...   127.0.0.1   80, 443   62m

I apply my ingress consistent with https://microk8s.io/docs/addon-cert-manager, i.e. the top-level "/" is Exact while deeper routes are Prefix, leaving "/.well-known" free.

microk8s kubectl apply -f k8s/production/ingress-before-ttl-issued.yaml where this is:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-before-ttl
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-issuer"
spec:
  tls:
    - hosts:
        - ancient-script.org
        - www.ancient-script.org
      secretName: ancient-script-org-crt-secret 
  rules:
  - host: ancient-script.org
    http:
      paths:
      - path: /
        pathType: Exact
        backend:
          service:
            name: express-service
            port:
              number: 3000
  - host: www.ancient-script.org
    http:
      paths:
      - path: /
        pathType: Exact
        backend:
          service:
            name: express-service
            port:
              number: 3000
  - host: www.ancient-script.org
    http:
      paths:
      - path: /file
        pathType: Prefix
        backend:
          service:
            name: express-service
            port:
              number: 3000
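
As a sanity check that these rules really leave the ACME challenge path unclaimed, here is a minimal Python sketch of the Ingress pathType semantics as described in the Kubernetes documentation (Exact compares the whole path; Prefix compares "/"-separated path elements). The helper names `path_matches` and `claimed` are hypothetical, and ingress-nginx may differ in edge cases, so treat this as an approximation rather than the controller's actual logic.

```python
def path_matches(rule_path: str, path_type: str, request_path: str) -> bool:
    """Approximate Kubernetes Ingress path matching for Exact and Prefix."""
    if path_type == "Exact":
        # Exact is a case-sensitive match on the full URL path.
        return request_path == rule_path
    if path_type == "Prefix":
        # Prefix matches element by element on "/"-separated segments,
        # so "/file" matches "/file/a" but not "/files".
        rule_parts = [p for p in rule_path.split("/") if p]
        request_parts = [p for p in request_path.split("/") if p]
        return request_parts[: len(rule_parts)] == rule_parts
    raise ValueError(f"unsupported pathType: {path_type}")


# (host, path, pathType) rules copied from the Ingress above.
RULES = [
    ("ancient-script.org", "/", "Exact"),
    ("www.ancient-script.org", "/", "Exact"),
    ("www.ancient-script.org", "/file", "Prefix"),
]


def claimed(host: str, request_path: str) -> bool:
    """Return True if any rule of the application Ingress matches the request."""
    return any(
        rule_host == host and path_matches(rule_path, path_type, request_path)
        for rule_host, rule_path, path_type in RULES
    )


if __name__ == "__main__":
    print(claimed("ancient-script.org", "/"))  # True
    print(claimed("ancient-script.org", "/.well-known/acme-challenge/token"))  # False
```

Under these assumptions the application Ingress does not match anything under "/.well-known", so a request for a challenge token should only be served if a solver Ingress is picked up by the controller.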

My DNS points to a public IP address. This public address is on my pfSense router, which forwards ports 80 and 443 to my internal LAN address of 192.168.10.12.

I deploy a simple service: microk8s kubectl apply -f k8s/production/express-deployment.yaml

From my cell phone, not connected to Wi-Fi (i.e. routing from the public internet), I can see http://ancient-script.org. If I go to http://ancient-script.org/.well-known/123 I get a 404 splash screen from nginx. I assume this splash screen is from the nginx ingress and not pfSense.
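
The same check can be scripted instead of done by hand from a phone. Below is a minimal Python sketch using only the standard library; the function name `http_status` is hypothetical. Note that `urllib` follows redirects, so an HTTP-to-HTTPS redirect by the ingress would be followed before the status is reported, and the script has to run from outside the LAN to exercise the same route as Let's Encrypt.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is carried on the exception.
        return err.code
```

For example, `http_status("http://ancient-script.org/.well-known/123")` would be expected to return 404 in the situation described above, while the self-check that cert-manager performs against the real challenge URL needs a 200.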

Cert-manager:

Following instructions at https://microk8s.io/docs/addon-cert-manager

Before doing anything, I validate that everything is empty:

k describe certificates
error: the server doesn't have a resource type "certificates"

I run microk8s enable cert-manager. The output includes a message that conflicts with https://microk8s.io/docs/addon-cert-manager: it uses kind: Issuer rather than kind: ClusterIssuer, and ingressClassName: nginx rather than class: public.

==============

Cert-manager is installed. As a next step, try creating an Issuer for Let's Encrypt by creating the following resource:

$ microk8s kubectl apply -f - <<EOF
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: me@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: letsencrypt-account-key
    # Add a single challenge solver, HTTP01 using nginx
    solvers:
    - http01:
        ingress:
          ingressClassName: nginx
EOF

=====

While https://microk8s.io/docs/addon-cert-manager shows:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
 name: lets-encrypt
spec:
 acme:
   email: microk8s@example.com
   server: https://acme-v02.api.letsencrypt.org/directory
   privateKeySecretRef:
     # Secret resource that will be used to store the account's private key.
     name: lets-encrypt-priviate-key
   # Add a single challenge solver, HTTP01 using nginx
   solvers:
   - http01:
       ingress:
         class: public

=====

Now, I apply the following ClusterIssuer consistent with https://microk8s.io/docs/addon-cert-manager.

=====

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-issuer
spec:
  acme:
    email: xxxxxx@hotmail.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: jjg-issuer-account-key
    solvers:


The logs appear to indicate that the solver is not getting called:

=====

k get pods --namespace cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-77fb85564-m6fg8               1/1     Running   0          14m
cert-manager-cainjector-857964b486-m7bmx   1/1     Running   0          14m
cert-manager-webhook-755d476bb8-mnp5m      1/1     Running   0          14m

john@webserver1:/Disk3/Documents/GitHub/help-me-transcribe$ k logs cert-manager-77fb85564-m6fg8 -n cert-manager

I1011 00:19:41.793254 1 ingress.go:99] "found one existing HTTP01 solver ingress" logger="cert-manager.challenges.http01.selfCheck.http01.ensureIngress" resource_name="ancient-script-org-crt-secret-1-1348440633-112979109" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="ancient-script.org" type="HTTP-01" related_resource_name="cm-acme-http-solver-r924k" related_resource_namespace="default" related_resource_kind="" related_resource_version=""
E1011 00:19:41.805132 1 sync.go:190] "propagation check failed" err="wrong status code '404', expected '200'" logger="cert-manager.challenges" resource_name="ancient-script-org-crt-secret-1-1348440633-112979109" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="ancient-script.org" type="HTTP-01"
I1011 00:19:44.254110 1 pod.go:59] "found one existing HTTP01 solver pod" logger="cert-manager.challenges.http01.selfCheck.http01.ensurePod" resource_name="ancient-script-org-crt-secret-1-1348440633-1471904646" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="www.ancient-script.org" type="HTTP-01" related_resource_name="cm-acme-http-solver-qqwt6" related_resource_namespace="default" related_resource_kind="" related_resource_version=""
I1011 00:19:44.254167 1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.challenges.http01.selfCheck.http01.ensureService" resource_name="ancient-script-org-crt-secret-1-1348440633-1471904646" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="www.ancient-script.org" type="HTTP-01" related_resource_name="cm-acme-http-solver-tcwct" related_resource_namespace="default" related_resource_kind="" related_resource_version=""
I1011 00:19:44.254214 1 ingress.go:99] "found one existing HTTP01 solver ingress" logger="cert-manager.challenges.http01.selfCheck.http01.ensureIngress" resource_name="ancient-script-org-crt-secret-1-1348440633-1471904646" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="www.ancient-script.org" type="HTTP-01" related_resource_name="cm-acme-http-solver-wq5fd" related_resource_namespace="default" related_resource_kind="" related_resource_version=""

===============

It looks like the solver ingresses do not have a class that matches the primary ingress.
Is this a problem, or are these ingresses independent?

k describe ingress cm-acme-http-solver-r924k
Name:             cm-acme-http-solver-r924k
Labels:           acme.cert-manager.io/http-domain=1134561051
                  acme.cert-manager.io/http-token=783748635
                  acme.cert-manager.io/http01-solver=true
Namespace:        default
Address:          127.0.0.1
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host                Path  Backends
  ----                ----  --------
  ancient-script.org  
                      /.well-known/acme-challenge/Fr9Ys7h-k-zD_4mIy36H7Ga2f8eDaJnRyjCYDpqXiKU   cm-acme-http-solver-l4x42:8089 (10.1.96.75:8089)
Annotations:          kubernetes.io/ingress.class: public
                      nginx.ingress.kubernetes.io/whitelist-source-range: 0.0.0.0/0,::/0
Events:               <none>

k get ingress
NAME                        CLASS    HOSTS                                                                      ADDRESS     PORTS     AGE
cm-acme-http-solver-r924k   <none>   ancient-script.org                                                         127.0.0.1   80        18m
cm-acme-http-solver-wq5fd   <none>   www.ancient-script.org                                                     127.0.0.1   80        18m
ingress-before-ttl          public   ancient-script.org,www.ancient-script.org,ancient-script.org + 5 more...   127.0.0.1   80, 443   105m

=======

I have no idea if this is a good idea, but I patched the 2 solver ingresses:

k patch ingress cm-acme-http-solver-r924k -p '{"spec": {"ingressClassName": "public"}}'
k patch ingress cm-acme-http-solver-wq5fd -p '{"spec": {"ingressClassName": "public"}}'

k get ingress
NAME                        CLASS    HOSTS                                                                      ADDRESS     PORTS     AGE
cm-acme-http-solver-r924k   public   ancient-script.org                                                         127.0.0.1   80        22m
cm-acme-http-solver-wq5fd   public   www.ancient-script.org                                                     127.0.0.1   80        22m
ingress-before-ttl          public   ancient-script.org,www.ancient-script.org,ancient-script.org + 5 more...   127.0.0.1   80, 443   108m


=====
But the logs still indicate that the solver is not receiving the request:

I1011 00:38:08.084476       1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.challenges.http01.selfCheck.http01.ensureService" resource_name="ancient-script-org-crt-secret-1-1348440633-112979109" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="ancient-script.org" type="HTTP-01" related_resource_name="cm-acme-http-solver-l4x42" related_resource_namespace="default" related_resource_kind="" related_resource_version=""
I1011 00:38:08.084519       1 ingress.go:99] "found one existing HTTP01 solver ingress" logger="cert-manager.challenges.http01.selfCheck.http01.ensureIngress" resource_name="ancient-script-org-crt-secret-1-1348440633-112979109" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="ancient-script.org" type="HTTP-01" related_resource_name="cm-acme-http-solver-r924k" related_resource_namespace="default" related_resource_kind="" related_resource_version=""
E1011 00:38:08.100611       1 sync.go:190] "propagation check failed" err="wrong status code '404', expected '200'" logger="cert-manager.challenges" resource_name="ancient-script-org-crt-secret-1-1348440633-112979109" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="ancient-script.org" type="HTTP-01"
I1011 00:38:09.722905       1 pod.go:59] "found one existing HTTP01 solver pod" logger="cert-manager.challenges.http01.selfCheck.http01.ensurePod" resource_name="ancient-script-org-crt-secret-1-1348440633-1471904646" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="www.ancient-script.org" type="HTTP-01" related_resource_name="cm-acme-http-solver-qqwt6" related_resource_namespace="default" related_resource_kind="" related_resource_version=""
I1011 00:38:09.722966       1 service.go:45] "found one existing HTTP01 solver Service for challenge resource" logger="cert-manager.challenges.http01.selfCheck.http01.ensureService" resource_name="ancient-script-org-crt-secret-1-1348440633-1471904646" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="www.ancient-script.org" type="HTTP-01" related_resource_name="cm-acme-http-solver-tcwct" related_resource_namespace="default" related_resource_kind="" related_resource_version=""
I1011 00:38:09.723007       1 ingress.go:99] "found one existing HTTP01 solver ingress" logger="cert-manager.challenges.http01.selfCheck.http01.ensureIngress" resource_name="ancient-script-org-crt-secret-1-1348440633-1471904646" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="www.ancient-script.org" type="HTTP-01" related_resource_name="cm-acme-http-solver-wq5fd" related_resource_namespace="default" related_resource_kind="" related_resource_version=""
E1011 00:38:09.833811       1 sync.go:190] "propagation check failed" err="wrong status code '404', expected '200'" logger="cert-manager.challenges" resource_name="ancient-script-org-crt-secret-1-1348440633-1471904646" resource_namespace="default" resource_kind="Challenge" resource_version="v1" dnsName="www.ancient-script.org" type="HTTP-01"
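
To pull just the failures out of logs like these, here is a rough Python sketch that extracts the `dnsName` and `err` fields from the "propagation check failed" lines. It assumes one log entry per line and the key="value" layout seen above; the function name `failed_checks` is hypothetical and this is not part of cert-manager.

```python
import re
from collections import Counter

# Matches key="value" pairs as they appear in the cert-manager log lines above.
FIELD = re.compile(r'(\w+)="([^"]*)"')


def failed_checks(log_text: str) -> Counter:
    """Count failed HTTP-01 self-checks per (dnsName, err) pair."""
    failures = Counter()
    for line in log_text.splitlines():
        if '"propagation check failed"' not in line:
            continue
        fields = dict(FIELD.findall(line))
        failures[(fields.get("dnsName"), fields.get("err"))] += 1
    return failures


# Two shortened entries taken from the log output above.
SAMPLE = (
    'E1011 00:38:08.100611 1 sync.go:190] "propagation check failed" '
    'err="wrong status code \'404\', expected \'200\'" '
    'logger="cert-manager.challenges" dnsName="ancient-script.org" type="HTTP-01"\n'
    'E1011 00:38:09.833811 1 sync.go:190] "propagation check failed" '
    'err="wrong status code \'404\', expected \'200\'" '
    'logger="cert-manager.challenges" dnsName="www.ancient-script.org" type="HTTP-01"\n'
)

if __name__ == "__main__":
    for (dns_name, err), count in failed_checks(SAMPLE).items():
        print(dns_name, err, count)
```

With the output of k logs saved to a file, passing the file contents to `failed_checks` would show whether both hostnames keep failing with the same 404.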

Introspection Report

Can you suggest a fix?

Are you interested in contributing with a fix?

johngrabner commented 2 hours ago

inspection-report-20241010_195408.tar.gz