Kubernetes 1.22 Challenge stuck at pending : Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200'

jelel-fliss commented 2 years ago

I am using Kubernetes 1.22 provided by Scaleway (Kapsule, https://www.scaleway.com/fr/kubernetes-kapsule/) with CertManager v1.6.1. For several consecutive days I have been trying to generate SSL certificates with Let's Encrypt, but the HTTP-01 challenge always gets stuck at 'pending' with a 404 error. Following the CertManager documentation (https://cert-manager.io/docs/faq/acme/#got-404-status-code), I made sure that the domain is working and accessible through the internet: I opened the path storek8s.igesa.it//.well-known/acme-challenge/ in my browser and got a response containing the thumbprint code. The ACME solver pod is also running smoothly.

This may help: when I curl the acme-challenge path, I get an empty response (still a 200 status code, with no error), unlike in the browser.
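For readers comparing browser and curl results, here is a minimal sketch of repeating the check by hand. It assumes a plain HTTP GET on port 80; `challenge_url` and `check_challenge` are hypothetical helper names, and the token is whatever `kubectl describe challenges` reports. The "propagation check failed" log further down comes from the cert-manager controller, so running this from a pod inside the cluster is probably closer to what cert-manager sees than running it from a workstation.

```shell
#!/bin/sh
# Build the HTTP-01 challenge URL for a domain ($1) and token ($2).
challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Print the response body followed by the HTTP status code.
# curl does not follow redirects unless -L is given, so a redirect to
# HTTPS shows up as a 3xx status instead of being followed silently.
check_challenge() {
  curl -sS -w '\nstatus=%{http_code}\n' "$(challenge_url "$1" "$2")"
}
```

Usage would look like `check_challenge storek8s.igesa.it <token>`; an empty body with a 200 status suggests that something other than the solver pod answered the request.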

These are the issuer and the ingress configuration :

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress
  annotations:
    #certmanager.k8s.io/issuer: letsencrypt-prod
    cert-manager.io/issuer: letsencrypt-prod
    acme.cert-manager.io/http01-edit-in-place: "true"
    kubernetes.io/tls-acme: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: storek8s.igesa.it
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ecom-fo-svc-populus
                port:
                  number: 80

  tls:
  - hosts:
    - storek8s.igesa.it
    secretName: storek8s-igesa-it

Issuer

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: myemail@example.com
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: scaleway-acme-secret
    # Add a single challenge solver, HTTP01 using nginx
    solvers:
    - http01:
        ingress:
          class: "nginx"

Cert Manager pods running in the cert-manager namespace

kubectl get pods -n cert-manager
NAME                                      READY   STATUS    RESTARTS   AGE
cert-manager-55658cdf68-v2978             1/1     Running   0          3h54m
cert-manager-cainjector-967788869-lwvjr   1/1     Running   0          3h54m
cert-manager-webhook-6668fbb57d-j9j8x     1/1     Running   0          3h54m

Logs of the CertManager pod :

kubectl logs pod/cert-manager-55658cdf68-v2978 -n cert-manager
 "dnsName"="storek8s.igesa.it" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-tz6hj" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="storek8s-igesa-it-secret-h7w7s-3629794593-1072885531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E1206 20:23:33.487254       1 sync.go:186] cert-manager/controller/challenges "msg"="propagation check failed" "error"="wrong status code '404', expected '200'" "dnsName"="storek8s.igesa.it" "resource_kind"="Challenge" "resource_name"="storek8s-igesa-it-secret-h7w7s-3629794593-1072885531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01

Description of the certificate resource generated by CertManager :

kubectl describe certs
Name:         storek8s-igesa-it
Namespace:    default
Labels:       <none>
Annotations:  acme.cert-manager.io/http01-override-ingress-name: nginx-ingress
              cert-manager.io/issue-temporary-certificate: true
API Version:  cert-manager.io/v1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2021-12-06T20:35:46Z
  Generation:          1
  Managed Fields:
    API Version:  cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:acme.cert-manager.io/http01-override-ingress-name:
          f:cert-manager.io/issue-temporary-certificate:
        f:ownerReferences:
          .:
          k:{"uid":"7af0141c-f1d6-4c5b-92c0-7a2d544109f9"}:
      f:spec:
        .:
        f:dnsNames:
        f:issuerRef:
          .:
          f:group:
          f:kind:
          f:name:
        f:secretName:
        f:usages:
    Manager:      controller
    Operation:    Update
    Time:         2021-12-06T20:35:46Z
    API Version:  cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:nextPrivateKeySecretName:
        f:notAfter:
        f:notBefore:
        f:renewalTime:
    Manager:      controller
    Operation:    Update
    Subresource:  status
    Time:         2021-12-06T20:35:47Z
  Owner References:
    API Version:           networking.k8s.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Ingress
    Name:                  nginx-ingress
    UID:                   7af0141c-f1d6-4c5b-92c0-7a2d544109f9
  Resource Version:        8706956582
  UID:                     9fc0d6e8-5ec8-400d-907b-4790d3d8cee5
Spec:
  Dns Names:
    storek8s.igesa.it
  Issuer Ref:
    Group:      cert-manager.io
    Kind:       Issuer
    Name:       letsencrypt-prod
  Secret Name:  storek8s-igesa-it
  Usages:
    digital signature
    key encipherment
Status:
  Conditions:
    Last Transition Time:        2021-12-06T20:35:46Z
    Message:                     Issuing certificate as Secret does not exist
    Observed Generation:         1
    Reason:                      DoesNotExist
    Status:                      True
    Type:                        Issuing
    Last Transition Time:        2021-12-06T20:35:47Z
    Message:                     Certificate is up to date and has not expired
    Observed Generation:         1
    Reason:                      Ready
    Status:                      True
    Type:                        Ready
  Next Private Key Secret Name:  storek8s-igesa-it-257jg
  Not After:                     2022-03-06T20:35:47Z
  Not Before:                    2021-12-06T20:35:47Z
  Renewal Time:                  2022-02-04T20:35:47Z
Events:
  Type    Reason     Age   From          Message
  ----    ------     ----  ----          -------
  Normal  Issuing    11m   cert-manager  Issuing certificate as Secret does not exist
  Normal  Generated  11m   cert-manager  Stored new private key in temporary Secret resource "storek8s-igesa-it-257jg"
  Normal  Requested  11m   cert-manager  Created new CertificateRequest resource "storek8s-igesa-it-t6dtt"
  Normal  Issuing    11m   cert-manager  Issued temporary certificate

Description of the challenge that keeps failing with the 404 status

kubectl describe challenges
Name:         storek8s-igesa-it-t6dtt-3629794593-1072885531
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  acme.cert-manager.io/v1
Kind:         Challenge
Metadata:
  Creation Timestamp:  2021-12-06T20:35:48Z
  Finalizers:
    finalizer.acme.cert-manager.io
  Generation:  1
  Managed Fields:
    API Version:  acme.cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizer.acme.cert-manager.io":
        f:ownerReferences:
          .:
          k:{"uid":"49a1a731-e949-4fb8-9e39-6efce1e8c403"}:
      f:spec:
        .:
        f:authorizationURL:
        f:dnsName:
        f:issuerRef:
          .:
          f:group:
          f:kind:
          f:name:
        f:key:
        f:solver:
          .:
          f:http01:
            .:
            f:ingress:
              .:
              f:name:
        f:token:
        f:type:
        f:url:
        f:wildcard:
    Manager:      controller
    Operation:    Update
    Time:         2021-12-06T20:35:48Z
    API Version:  acme.cert-manager.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:presented:
        f:processing:
        f:reason:
        f:state:
    Manager:      controller
    Operation:    Update
    Subresource:  status
    Time:         2021-12-06T20:35:49Z
  Owner References:
    API Version:           acme.cert-manager.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Order
    Name:                  storek8s-igesa-it-t6dtt-3629794593
    UID:                   49a1a731-e949-4fb8-9e39-6efce1e8c403
  Resource Version:        8706957042
  UID:                     a86b3061-ea15-4044-85c5-ae9dd9ee6710
Spec:
  Authorization URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/1109653358
  Dns Name:           storek8s.igesa.it
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   Issuer
    Name:   letsencrypt-prod
  Key:      8XRP9hEe9uyIDUzQjBRSI2Ee0kO_n-LnkjnuPufLriw.LEz4NLDi6OSEcosv_N9ic7NSIIMEJk9DWuXl8h-IEWk
  Solver:
    http01:
      Ingress:
        Name:  nginx-ingress
  Token:       8XRP9hEe9uyIDUzQjBRSI2Ee0kO_n-LnkjnuPufLriw
  Type:        HTTP-01
  URL:         https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/1109653358/Rl-9PQ
  Wildcard:    false
Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200'
  State:       pending
Events:
  Type    Reason     Age   From          Message
  ----    ------     ----  ----          -------
  Normal  Started    13m   cert-manager  Challenge scheduled for processing
  Normal  Presented  13m   cert-manager  Presented challenge using HTTP-01 challenge mechanism

Description of the Ingress resource.

kubectl get ingress
NAME            CLASS   HOSTS               ADDRESS                          PORTS     AGE
nginx-ingress   nginx   storek8s.igesa.it   163.172.151.251,212.47.232.218   80, 443   14m
kubectl describe ingress
Name:             nginx-ingress
Namespace:        default
Address:          163.172.151.251,212.47.232.218
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
  storek8s-igesa-it terminates storek8s.igesa.it
Rules:
  Host               Path  Backends
  ----               ----  --------
  storek8s.igesa.it
                     /.well-known/acme-challenge/8XRP9hEe9uyIDUzQjBRSI2Ee0kO_n-LnkjnuPufLriw   cm-acme-http-solver-xw76j:8089 (100.64.4.43:8089)
                     /                                                                         ecom-fo-svc-populus:80 (100.64.3.116:4000,100.64.3.244:4000)
Annotations:         acme.cert-manager.io/http01-edit-in-place: true
                     cert-manager.io/issuer: letsencrypt-prod
                     kubernetes.io/tls-acme: true
Events:
  Type    Reason             Age                From                      Message
  ----    ------             ----               ----                      -------
  Normal  CreateCertificate  14m                cert-manager              Successfully created Certificate "storek8s-igesa-it.cert"
  Normal  CreateCertificate  13m                cert-manager              Successfully created Certificate "storek8s-igesa-it"
  Normal  DeleteCertificate  13m                cert-manager              Successfully deleted unrequired Certificate "storek8s-igesa-it.cert"
  Normal  Sync               13m (x4 over 14m)  nginx-ingress-controller  Scheduled for sync
  Normal  Sync               13m (x4 over 14m)  nginx-ingress-controller  Scheduled for sync

Making sure that the acme http solver pod is running :

kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
cm-acme-http-solver-467wd           1/1     Running   0          21m
fo-populus-56c4db7c4c-cdnm5         1/1     Running   0          3d5h
fo-populus-56c4db7c4c-q2rrk         1/1     Running   0          3d5h

It looks like the ACME HTTP solver pod is not being reached, as its log never gets past this single line:

kubectl logs cm-acme-http-solver-467wd
I1206 20:35:50.602838       1 solver.go:39] cert-manager/acmesolver "msg"="starting listener"  "expected_domain"="storek8s.igesa.it" "expected_key"="..." "expected_token"="..." "listen_port"=8089
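One way to tell a solver problem from an ingress routing problem is to query the solver pod directly, bypassing the ingress controller. The sketch below is an assumption-laden illustration, not an official procedure: `probe_solver` is a hypothetical helper, port 8089 is taken from `listen_port` in the log line above, and the `Host` header is set on the assumption that the solver compares it with `expected_domain`.

```shell
#!/bin/sh
# Query a solver pod directly through a temporary port-forward.
# Arguments: pod name, expected domain, challenge token.
probe_solver() {
  pod=$1; domain=$2; token=$3
  kubectl port-forward "pod/$pod" 8089:8089 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2   # give the port-forward a moment to come up
  curl -sS -H "Host: $domain" -w '\nstatus=%{http_code}\n' \
    "http://127.0.0.1:8089/.well-known/acme-challenge/$token"
  # Stop the port-forward; ignore the error if it already exited.
  kill "$pf_pid" 2>/dev/null || true
}
```

If this returns the key with status 200 while the public URL returns 404, the solver is fine and the request is being lost between the ingress controller and the solver Service.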

Expected behaviour: CertManager generating SSL certificates and enabling HTTPS communication with my website.

Steps to reproduce the bug:

Preparing a cluster with the same Kubernetes version and CertManager version.
Creating the issuer and the ingress through the files I provided with a DNS pointing to your ingress Load Balancer

Anything else we need to know?: I tried deleting and reinstalling CertManager, modifying the annotations of the ingress resource, and going through different forums. Although this bug is encountered often, clear troubleshooting details and steps are lacking.

Environment details:

Kubernetes version:
Cloud-provider/provisioner: Scaleway
cert-manager version: 1.6.1
Install method: static manifests ( kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.yaml )

/kind bug

marstud31 commented 2 years ago

I had the same issue with a ClusterIssuer and traefik. Have you tried removing the 'solver spec' (as in the example below), since it's optional? I have no clue why this property is set in all of the sample code, or why it isn't necessary.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: myemail@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress: { }

pcfens commented 2 years ago

I've typically found this means that your ingress controller isn't reading the Ingress object, or that some other Ingress takes precedence on a colliding path. I most recently ran into this issue because I didn't have an IngressClass object. I've also run into it when the annotations weren't set up correctly in the (Cluster)Issuer (the equivalent of an IngressClass before the Ingress spec was marked as stable).
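To check for the missing-IngressClass case, something like the following sketch may help. `ingressclass_manifest` is a hypothetical helper that only prints a manifest; the controller value `k8s.io/ingress-nginx` in the usage comment is the identifier used by the ingress-nginx controller and would differ for other controllers.

```shell
#!/bin/sh
# Print an IngressClass manifest.
# $1 is the class name, $2 the controller identifier.
ingressclass_manifest() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: $1
spec:
  controller: $2
EOF
}

# List the classes the cluster already has, then apply one if it is missing:
#   kubectl get ingressclass
#   ingressclass_manifest nginx k8s.io/ingress-nginx | kubectl apply -f -
```

The class name should match what the Issuer's http01 solver and the Ingress refer to, otherwise the controller may keep ignoring the solver route.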

sureshkachwa commented 2 years ago

@jelel-fliss I have the same issue too. The cert-manager version is 1.6.1 and the k8s cluster version is 1.20.1. For the first deployment a certificate was issued, but now for all new deployments the challenge and the order are both in pending state.

jelel-fliss commented 2 years ago

Somehow, I was able to resolve this. I could not find the real cause of this problem but I tried restarting the nginx service in the ingress controller containers and suddenly, the certificates were issued successfully.

kubectl exec -it <ingress-controller-pod> -- nginx -s reload

sureshkachwa commented 2 years ago

I did the same but it didn't solve my problem. I still suspect it has something to do with the ingress controller version or the cert-manager version, although the same versions were able to issue an SSL cert for one of the namespaces 10 days ago, and in the same cluster today the challenges for SSL are pending.

jelel-fliss commented 2 years ago

@sureshkachwa did you try a different version of cert-manager? Before reloading the nginx-controller, I downgraded to version 1.5.0. It still did not work after recreating the issuer and the ingress object. I also made some changes to the Ingress that may have helped solve the issue. Still, only reloading NGINX in the ingress controller triggered the successful issuance of the certificates.

metadata:
  name: nginx-ingress
  annotations:
    kubernetes.io/ingress.allow-http: "true" # I added this
    kubernetes.io/ingress.class: nginx # I added this
    cert-manager.io/issuer: letsencrypt-prod
    acme.cert-manager.io/http01-edit-in-place: "true"
    kubernetes.io/tls-acme: "true"
spec:
  #ingressClassName: nginx # I commented this

I removed the ingressClassName field in spec

sureshkachwa commented 2 years ago

@jelel-fliss, we are using Helm to install the ingress controller, and it is getting deployed with a "webhook validation error". I also installed cert-manager 1.6.0 and nginx controller 4.0.8, but it's still the same issue: the challenge is still pending. There is something we need to work on; I'll see if I get a solution and share it here.

sureshkachwa commented 2 years ago

@jelel-fliss still the same issue. I'll give it a try with a new k8s cluster and cert-manager version 1.5.0, edit the Ingress object with the above annotations, and see if it helps.

jelel-fliss commented 2 years ago

@sureshkachwa can you share the logs of the ingress, certmanager pods and the events of the challenge ?

sureshkachwa commented 2 years ago

@jelel-fliss I created a new cluster with cert-manager 1.6.0 and ingress controller 1.0.5. Right now I see the following in the cluster:

NAMESPACE   NAME                        CLASS   HOSTS                        ADDRESS           PORTS     AGE
devops17    cm-acme-http-solver-886vl   nginx   devops17.nasty.orbitbi.com   129.154.226.113   80        6m45s
devops17    devops17-orbit-ing                  devops17.nasty.orbitbi.com   129.154.226.113   80, 443   6m50s

Will wait for 15 more min and share all the logs here

sureshkachwa commented 2 years ago

@jelel-fliss the logs for the Ingress, cert-manager pods, ACME pod and nginx controller make a huge file. Can you share your email ID so I can send them?

sureshkachwa commented 2 years ago

Now I see a different error:

cert-manager/controller/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://devops17.nasty.orbitbi.com/.well-known/acme-challenge/Qbso0LFq4LbjK_YUV7Rtyvpon7a6I_NIESsLjd_AfJc': Get \"http://devops17.nasty.orbitbi.com/.well-known/acme-challenge/Qbso0LFq4LbjK_YUV7Rtyvpon7a6I_NIESsLjd_AfJc\": dial tcp: lookup devops17.nasty.orbitbi.com on 10.96.5.5:53: no such host" "dnsName"="devops17.nasty.orbitbi.com" "resource_kind"="Challenge" "resource_name"="wildcard-orbit-prod-tls-mtr5s-3076010998-709024210" "resource_namespace"="devops17" "resource_version"="v1" "type"="HTTP-01"

jhbae200 commented 2 years ago

If the issuer http01 class is istio, the istio ingressClass resource is required.

IngressClass YAML example

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: istio
  namespace: istio-system
spec:
  controller: istio.io/ingress-controller

sureshkachwa commented 2 years ago

The issue was resolved once the pods could resolve the FQDN of the domain the certificate is issued for. We had an earlier setup in Oracle Cloud (OCI) where resources in a specific VCN were meant to use the VCN DNS for resolution instead of public DNS. Removing that specific zone setting in OCI helped.
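For the "no such host" variant of this problem, a quick in-cluster lookup can confirm whether pods are able to resolve the domain at all. This is a minimal sketch: `dns_check_in_cluster` is a hypothetical helper, `busybox:1.36` is an assumed image (any image that ships `nslookup` should do), and the throwaway pod runs in the current namespace, so its DNS configuration may differ from that of the cert-manager pod.

```shell
#!/bin/sh
# Resolve a hostname ($1) from a throwaway pod, using the cluster's DNS.
# The pod is removed again once nslookup exits.
dns_check_in_cluster() {
  kubectl run dns-check --rm -i --restart=Never --image=busybox:1.36 \
    -- nslookup "$1"
}
```

For example, `dns_check_in_cluster devops17.nasty.orbitbi.com` failing inside the cluster while the same lookup works from a workstation would point at the cluster or VCN DNS setup, not at cert-manager.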

ikewabo commented 2 years ago

The IngressClass example from @jhbae200 above worked exceptionally! Resolved my issue.
Thank YOU!!!

egavard commented 2 years ago

Had the same problem with ingressClassName: traefik-cert-manager.

I looked at the IngressClass on my cluster and found it was just traefik.

I replaced it in the ClusterIssuer and the certificate has finally been issued properly!

HerrmannHinz commented 2 years ago

Running on Kubernetes 1.21.x on Azure here. Same issue, BUT only when I was trying to install Grafana on a specific path like monitoring.<FQDN>.io/grafana.

cert-manager:

  chart:
    spec:
      chart: cert-manager
      sourceRef:
        kind: HelmRepository
        name: jetstack
        namespace: unstable
      version: v1.7.0

nginx:

  chart:
    spec:
      chart: ingress-nginx
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
        namespace: unstable
      version: 4.0.16

Here is how I solved the issue for the moment:

I edited the .well-known Ingress created by cert-manager and added ingressClassName: nginx-external to the .spec section of the cm-* Ingress, then saved. Et voilà, the 404 changed to a 200.

It's a temporary fix, but at least I could create the cert for the moment. Just wanted to share my finding.

muka commented 2 years ago

In my case I had a duplicate Ingress for foo.it which overlapped with the one created by cert-manager.

This is my Ingress excerpt:

# ...
 spec:
   ingressClassName: nginx
   rules:
     - host: foo.it # <-- remove
     - host: www.foo.it
   tls:
     - hosts:
         - foo.it # <-- remove
         - www.foo.it
       secretName: foo-tls
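To spot this kind of overlap, one option is to list every host together with the Ingresses that claim it. The sketch below is only an illustration: `duplicate_hosts` is a hypothetical helper that works on plain text, and the jsonpath template in the usage comment is an assumption about how to feed it. A `cm-acme-http-solver-*` Ingress sharing the host during a challenge is expected; other Ingresses claiming it are the ones worth a look.

```shell
#!/bin/sh
# Read lines of "namespace/name<TAB>host host ..." on stdin and print every
# host that appears more than once, followed by the Ingresses claiming it.
# A host listed in two rules of the same Ingress is also counted twice.
duplicate_hosts() {
  awk -F '\t' '
    {
      n = split($2, hosts, " ")
      for (i = 1; i <= n; i++) {
        count[hosts[i]]++
        owners[hosts[i]] = owners[hosts[i]] " " $1
      }
    }
    END {
      for (h in count) if (count[h] > 1) print h ":" owners[h]
    }'
}

# Feeding it from the cluster:
#   kubectl get ingress -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{range .spec.rules[*]}{.host}{" "}{end}{"\n"}{end}' | duplicate_hosts
```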

sgrzemski commented 2 years ago

@jhbae200, regarding the IngressClass example above: could you please elaborate on that? I have created this resource and started HTTP-01 verification, but the requests are still routed to the virtual service instead of the cert-manager service. I have set both ingressClassName and the annotation manually, but had no luck getting it to pass.

jetstack-bot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle stale

jetstack-bot commented 2 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle rotten /remove-lifecycle stale

jetstack-bot commented 2 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to jetstack. /close

jetstack-bot commented 2 years ago

@jetstack-bot: Closing this issue.


gfragi commented 1 year ago

The Ingress changes from @jelel-fliss above (adding the kubernetes.io/ingress.class: nginx annotation and removing ingressClassName from the spec) worked for me, while we have a ClusterIssuer with nginx. Thanks a lot!!

cert-manager / cert-manager

Kubernetes 1.22 Challenge stuck at pending : Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200' #4648