cert-manager / cert-manager

Automatically provision and manage TLS certificates in Kubernetes
https://cert-manager.io
Apache License 2.0

Controller can't handle hitting request rate limits of zerossl ACME API #5867

Open hnicke opened 1 year ago

hnicke commented 1 year ago

Describe the bug:

We've been using cert-manager with zerossl as ACME provider using http01 challenges for several months now, very successfully. However, a couple of weeks ago zerossl must have changed their ACME API: they have introduced a quite strict request rate limit. Whenever we issue a new certificate containing 3 or more domains using the http01 challenge, we run into 429 responses from their API, which completely bricks the cert issue flow. Note: The problem does not occur when issuing a cert containing <=2 domains.

Expected behaviour: The controller should respect 429 responses and try again later. In my case, retrying 2-3 seconds later would already solve the issue.
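
The expected behaviour can be sketched as a small retry wrapper. This is a minimal illustration in Python, not cert-manager code: `call_with_retry`, its parameters and the `(status, headers)` response shape are hypothetical names chosen for the example, and whether the zerossl endpoint sends a `Retry-After` header at all is an assumption.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers); hypothetical shape


def call_with_retry(
    call: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Invoke `call`, retrying while it answers 429.

    Waits for the Retry-After value when the server sends one as a number
    of seconds (the HTTP-date form is not handled here), otherwise doubles
    `base_delay` on every retry. The last response is returned even if it
    is still a 429, so the caller can decide what to do next.
    """
    response = call()
    for attempt in range(1, max_attempts):
        status, headers = response
        if status != 429:
            return response
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** (attempt - 1)
        sleep(delay)
        response = call()
    return response
```

With the defaults, a request that keeps failing is tried five times with waits of 2, 4, 8 and 16 seconds, which would cover the "2-3 seconds later" case described above.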

Steps to reproduce the bug: This is the certificate resource:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    service: tls-cert
  labels:
    service: tls-cert
  name: tls-cert
spec:
  dnsNames:
  - xxx
  - xxx
  - xxx
  - xxx
  - xxx
  - xxx
  - xxx
  - xxx
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: zerossl
  secretName: tls-cert
  usages:
  - digital signature
  - key encipherment

And this is the ClusterIssuer resource:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: zerossl
spec:
  acme:
    externalAccountBinding:
      keyID: xxxxx
      keySecretRef:
        key: eab-hmac-key
        name: zerossl
    privateKeySecretRef:
      name: zerossl-account
    server: https://acme.zerossl.com/v2/DV90
    solvers:
    - http01:
        ingress:
          class: nginx

After applying the certificate to the cluster, the corresponding CertificateRequest, Order, and Challenge resources are created as expected. However, during processing of the challenges, the ACME client hits the request limit of the zerossl API:

# failed challenge status:
status:
  presented: false
  processing: false
  reason: 'Failed to retrieve Order resource: 429 : 429 Too Many Requests'
  state: errored

Once the first challenge fails, the error state is propagated to the Order and Certificate resource:

# Order status:
status:
  authorizations:
    ....
  failureTime: "2023-03-16T10:26:15Z"
  finalizeURL: https://acme.zerossl.com/v2/DV90/order/xxxxx/finalize
  reason: "Failed to retrieve Order resource: 429 : <html>\r\n<head><title>429 Too
    Many Requests</title></head>\r\n<body>\r\n<center><h1>429 Too Many Requests</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n"
  state: errored
  url: https://acme.zerossl.com/v2/DV90/order/xxxxx

# Certificate status:
status:
  conditions:
  - lastTransitionTime: "2023-03-16T10:26:08Z"
    message: Issuing certificate as Secret does not exist
    observedGeneration: 1
    reason: DoesNotExist
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-03-16T10:26:15Z"
    message: "The certificate request has failed to complete and will be retried:
      Failed to wait for order resource \"tls-cert-twhmq-1698200363\" to become ready:
      order is in \"errored\" state: Failed to retrieve Order resource: 429 : <html>\r\n<head><title>429
      Too Many Requests</title></head>\r\n<body>\r\n<center><h1>429 Too Many Requests</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n"
    observedGeneration: 1
    reason: Failed
    status: "False"
    type: Issuing
  failedIssuanceAttempts: 1
  lastFailureTime: "2023-03-16T10:26:15Z"

Anything else we need to know?:

It seems that for every challenge, the order is retrieved from the acme API. The more domains in the certificate, the more challenges are being spawned, and thus the more requests to fetch the order object are being made.

I see two technical issues here:

I have informed the technical support of zerossl about this issue. Their suggestion was to throttle the requests and/or implement a retry.

Environment details::

irbekrm commented 1 year ago

we are running in 429 responses from their API, which completely bricks the cert issue flow

It does not stop the issuance flow; it retries with exponential backoff (https://cert-manager.io/docs/release-notes/release-notes-1.8/#exponential-backoff-after-a-failed-issuance). We can't retry immediately, because we want to avoid pinging the ACME server too much, both to avoid overwhelming it and to help with user rate limits. The recommendation from the ACME server (LE) is to retry with backoff in these cases.
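
To make the backoff concrete, here is a rough sketch of how such a delay could be computed from the number of failed issuance attempts (the `failedIssuanceAttempts` field in the Certificate status above). The one-hour initial delay and the 32-hour cap are assumptions based on a reading of the linked release notes, and `backoff_delay` is a hypothetical helper, not a cert-manager function.

```python
from datetime import timedelta


def backoff_delay(
    failed_attempts: int,
    initial: timedelta = timedelta(hours=1),   # assumed initial delay
    maximum: timedelta = timedelta(hours=32),  # assumed upper bound
) -> timedelta:
    """Delay before the next issuance attempt after consecutive failures."""
    if failed_attempts < 1:
        return timedelta(0)
    delay = initial
    # Double once per additional failure, stopping as soon as the cap is hit.
    for _ in range(failed_attempts - 1):
        if delay >= maximum:
            break
        delay *= 2
    return min(delay, maximum)
```

Under these assumptions, a Certificate with `failedIssuanceAttempts: 1` would be retried after about an hour, not after a few seconds.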

hnicke commented 1 year ago

Thank you for the clarification. However, it seems that the exponential backoff does not help in this case: when the controller retries, it will definitely fail again due to 429 responses. This matches my observation that our failed certificates do not recover by themselves, not even after multiple days. The root problem seems to be the number of requests that are being sent to the ACME server at once.

One way to solve this would be to cache the order entity instead of requesting it multiple times at the same time from the acme server.
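
The caching idea could look roughly like the following. It is a sketch only: `TTLCache`, the five-second lifetime and the `fetch` callback are hypothetical, it is not thread-safe, and whether it would actually reduce the 429 responses depends on which calls trigger the limit.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Remember the result of a fetch for `ttl` seconds, keyed by URL."""

    def __init__(
        self,
        fetch: Callable[[str], T],
        ttl: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, T]] = {}

    def get(self, url: str) -> T:
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Only reached on a miss or when the cached entry has expired.
        value = self._fetch(url)
        self._entries[url] = (now, value)
        return value
```

If eight challenges asked for the same order URL within the lifetime, only the first request would reach the server. The trade-off is that callers may see an order status that is up to `ttl` seconds old.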

irbekrm commented 1 year ago

Failed to retrieve Order resource: 429 : 429 Too Many Requests

In my experience this should mean something along the lines of 'too many certificate requests (have been created)', not 'the existing order has been requested too many times'. I don't believe that there would be a limit on how many times a simple GET request can be sent to the ACME server if it does not create new resources.

irbekrm commented 1 year ago

I cannot find any information about rate limits for ZeroSSL, so you might want to reach out to them. Here are the LetsEncrypt rate limits for example https://letsencrypt.org/docs/rate-limits/

hnicke commented 1 year ago

Thank you for your response. I am well aware of the letsencrypt rate limits. They are unsuitable for our use case, hence we moved to zerossl. In their basic plan (which we use), the number of certificates is not limited in any way.

In my experience this should mean something along the lines of 'too many certificate requests (have been created)' not 'the existing order has been requested too many times'.

I have reached out to the zerossl support, they have confirmed that a) our account is by no means limited in terms of certificates b) they have implemented general rate limiting on their API.

Therefore I am 100% sure the error message does not refer to the number of certificates in general, but to the number of requests being sent per second. FYI, as mentioned earlier, I have no problem issuing new certs with <=2 domains, even directly after hitting the rate limit with a certificate which has >2 domains.

I have played around with the failed resources and manually reset their .status.state field to pending. Directly afterwards the challenge was conducted successfully.

I have invested some time to build a workaround using shell-operator to reset the .status.state fields automatically whenever the state is errored and the message contains 429. While this works most of the time, monkey-patching the cert-manager resources seems to be bad practice. I failed to get it working smoothly, since obviously cert-manager should be the only controller altering the resource state. However, I derive from this experiment that my assumption is right: if cert-manager simply retried fetching the order resource, or issued the requests in a slightly staggered fashion, or used a cached response, the problem would be solved. The new request rate limit in the zerossl API seems to be set up in a way which blocks request spikes to the same resource over a short period of time (i.e., a couple of seconds).
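
The "slightly staggered" option can be illustrated with a minimal client-side throttle that spaces calls a fixed interval apart. `Throttle` and the 0.5-second interval are hypothetical and only show the idea; the actual limit enforced by zerossl is not known.

```python
import time
from typing import Callable


class Throttle:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._min_interval
```

A caller would invoke `throttle.wait()` before every request to the ACME server. With an interval of 0.5 seconds, a burst of eight challenges would be spread over roughly 3.5 seconds instead of being sent at once.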

irbekrm commented 1 year ago

I have reached out to the zerossl support, they have confirmed that

Could you then confirm with them what limits there actually are and for what requests?

irbekrm commented 1 year ago

Ideally they should have documented it somewhere.

hnicke commented 1 year ago

I'll reach out to them and ask for specifics :+1:

irbekrm commented 1 year ago

Thank you!

hnicke commented 1 year ago

Update from zerossl support: They are looking deeper into this. It might be a technical issue on their side, after all.

irbekrm commented 1 year ago

Thanks for reaching out to them and for the heads up! Keen to hear if this gets resolved.

afeiszli commented 1 year ago

@hnicke any updates?

hnicke commented 1 year ago

I've been writing back and forth with the zerossl support. Unfortunately it looks like they are not interested in understanding the problem nor helping me with the issue.

They keep telling me to take a look at their documentation. The only related information given is not specific, so it doesn't help at all:

Configure your scripts and clients to use our free of charge ACME API in a meaningful way. We want to provide a reliable and stable service to all our customers, malicious users can be limited or even blocked.

I've been specifically asking for more information about rate limiting.

The gist of their answer, from oldest to newest:

Regarding this, there are no rate limits for your account per se but there are some limits we have to adhere to on our end, to prevent flooding / too many requests. In this case, I'd advise staggering your requests over a longer period of time as well as retrying these if they keep falling when they fail after 15 minutes or so.

[...] as the limit is for the whole endpoint, you may get these [429 responses] from time to time when our service is under particularly heavy load. Retrying is the only way to go for now. I am discussing the matter with our developers so I'll let you know if we have any news from our side.

Some news on the matter - we are investigating it more closely with our developers. It seems that there might be a problem we were not aware of on our side as well.

We have looked into the issue with our developers. The HTTP 429 code is a response from our infrastructure, indicating that our ACME endpoint is receiving more requests than we can currently process. The limit for the whole endpoint is variable and we adjust it periodically, to keep up with demand. As the ACME API is provided free of charge - some users unfortunately abuse the endpoint in order to issue huge amounts of certificates, which unfortunately has an effect on legitimate users such as you. Abusive users are of course blocked to free up capacity. In practical terms, the way to deal with this is to retry later. If you are having issues over an extended period of time (Few days or a week) and have retried it lots of times without success, please let us know.

At this point, I have given up on the zerossl support. I don't think they will fix the issue on their side. It's a shame though: zerossl is otherwise a perfect match for cert-manager. I wanted it to be the go-to provider whenever the cert rate limits of letsencrypt don't suffice. In case anyone knows a viable alternative to zerossl, please let me know.

I'm wondering whether using DNS challenges instead of HTTP challenges would help. Does anyone know if using DNS challenges would send fewer requests to the order endpoint?

FischlerA commented 1 year ago

I'm wondering whether using DNS challenges instead of HTTP-challenges would help. Does anyone know if using DNS challenges would send less requests to the order endpoint?

We are using DNS challenges and are facing the same issue.

irbekrm commented 1 year ago

Thank you for the thorough feedback and checking with ZeroSSL @hnicke

I had another look at the cert-manager controller that needs to retrieve orders. Generally, we try not to ping the ACME server where it's not needed, so the GET call is at the point where we are about to perform some actions that actually need the latest ACME order resource status, see here. But those actions are all conditional, so we could add an if-statement just before retrieving the order and return early if none of those conditions is true. That would help if it somehow is the case that you hit this point multiple times and that's where the majority of the GET calls come from.

To verify whether this would help, would it be possible to get some cert-manager controller logs from when the issuance starts (before you hit the 429 errors till you start hitting them) with debug log level (--v=5 flag to cert-manager controller)?

hnicke commented 1 year ago

Here are the logs with logLevel: 5: cert-manager.log. During this specific attempt, the order itself and one of the challenges failed due to 429.

No action taken was hit only once.

irbekrm commented 1 year ago

Thank you for the logs.

Just to clarify, this is for a Certificate with a single DNS name?

I see that the order is only attempted to be retrieved twice (see "msg"="Calling GetOrder" log lines). The first attempt is actually unnecessary (you can see the No action taken for it) and the second results in a 429 error code. So I guess the limit applies to any call to the ACME server (from a particular client?), not just to calls that GET the order resource, unless that is limited to 1 call, which seems too strict.

irbekrm commented 1 year ago

FWIW these are all the calls to ACME from that log

irbe@cert-manager$ cat cert-manager.log | grep "Calling"
I0329 12:56:55.690444       1 logger.go:45] cert-manager/acme-middleware "msg"="Calling AuthorizeOrder"
I0329 12:56:56.105084       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:56.229605       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:56.387722       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:56.497522       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:56.615755       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:56.733963       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:56.834454       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:56.928337       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:57.033450       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.033525       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.033583       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.034307       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.034535       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.034672       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.034822       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.035029       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.107530       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:57.178587       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.178725       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.178828       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.178895       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.178943       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.178994       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.179051       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.179475       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.181082       1 logger.go:51] cert-manager/acme-middleware "msg"="Calling GetOrder"
I0329 12:56:57.272425       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.272540       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.272624       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.272810       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.272924       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.273083       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.273196       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.273390       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 12:56:57.273550       1 logger.go:51] cert-manager/acme-middleware "msg"="Calling GetOrder"
I0329 12:56:58.128626       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:58.153202       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:58.180086       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:58.212308       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:58.259463       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:58.345306       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:56:58.466327       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 12:57:27.645644       1 logger.go:81] cert-manager/acme-middleware "msg"="Calling Accept"
I0329 12:57:27.759014       1 logger.go:99] cert-manager/acme-middleware "msg"="Calling WaitAuthorization"
I0329 12:57:28.958473       1 logger.go:81] cert-manager/acme-middleware "msg"="Calling Accept"
I0329 12:57:29.060989       1 logger.go:99] cert-manager/acme-middleware "msg"="Calling WaitAuthorization"
I0329 12:57:29.071487       1 logger.go:81] cert-manager/acme-middleware "msg"="Calling Accept"
I0329 12:57:29.115818       1 logger.go:81] cert-manager/acme-middleware "msg"="Calling Accept"
I0329 12:57:29.131838       1 logger.go:81] cert-manager/acme-middleware "msg"="Calling Accept"
I0329 12:57:29.317900       1 logger.go:99] cert-manager/acme-middleware "msg"="Calling WaitAuthorization"
I0329 12:57:29.318998       1 logger.go:99] cert-manager/acme-middleware "msg"="Calling WaitAuthorization"
I0329 12:57:29.447969       1 logger.go:99] cert-manager/acme-middleware "msg"="Calling WaitAuthorization"
I0329 12:57:43.133586       1 logger.go:81] cert-manager/acme-middleware "msg"="Calling Accept"
I0329 12:57:43.257753       1 logger.go:99] cert-manager/acme-middleware "msg"="Calling WaitAuthorization"
I0329 12:57:44.702882       1 logger.go:81] cert-manager/acme-middleware "msg"="Calling Accept"
I0329 12:57:44.904816       1 logger.go:99] cert-manager/acme-middleware "msg"="Calling WaitAuthorization"

irbekrm commented 1 year ago

Out of interest, would you be able to share that Certificate and Order resource?

hnicke commented 1 year ago

The applied certificate related to the above log file:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    service: tls-cert
  labels:
    service: tls-cert
  name: tls-cert
spec:
  dnsNames:
  - platform.hnicke.example.com
  - bom.hnicke.example.com
  - app.hnicke.example.com
  - admin.api.hnicke.example.com
  - client.api.hnicke.example.com
  - hooks.api.hnicke.example.com
  - bex-os.api.hnicke.example.com
  - kafka.hnicke.example.com
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: zerossl
  secretName: tls-cert
  usages:
  - digital signature
  - key encipherment

(I have replaced the original domain with example.com in the above cert definition as well as in the logs for privacy reasons.)

You are most probably right; it's not only about calls to orders, but other calls as well. I was misled by the Failed to retrieve Order resource status.reason of the failed challenges. It turns out this reason is set in a quite generic handler and should probably be reworded.

hnicke commented 1 year ago

Out of interest, would you be able to share that Certificate and Order resource?

Unfortunately I have already cleaned up the resources.

Therefore I've rerun the issuing attempt once again - please bear with me. cert-manager-2.log

Calls to acme:

I0329 13:48:50.387799       1 logger.go:45] cert-manager/acme-middleware "msg"="Calling AuthorizeOrder"
I0329 13:48:51.401597       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:51.487777       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:51.625177       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:51.748974       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:51.884927       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:52.031337       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:52.204002       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:52.357503       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:52.494362       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.495448       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.495515       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.495578       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.495787       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.496006       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.496205       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.496325       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.637217       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.637478       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.637691       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.637843       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.638015       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.638545       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.638742       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.638865       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:52.639534       1 logger.go:51] cert-manager/acme-middleware "msg"="Calling GetOrder"
I0329 13:48:52.863366       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:52.885424       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:52.902747       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:52.925592       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:52.964412       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:53.103981       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:53.104164       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:53.104322       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:53.105230       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:53.105748       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:53.106890       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:53.107416       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:53.107622       1 logger.go:117] cert-manager/acme-middleware "msg"="Calling HTTP01ChallengeResponse"
I0329 13:48:53.108095       1 logger.go:51] cert-manager/acme-middleware "msg"="Calling GetOrder"
I0329 13:48:53.396806       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:53.438184       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:48:53.469846       1 logger.go:93] cert-manager/acme-middleware "msg"="Calling GetAuthorization"
I0329 13:49:24.070503       1 logger.go:81] cert-manager/acme-middleware "msg"="Calling Accept"
I0329 13:49:24.073221       1 logger.go:81] cert-manager/acme-middleware "msg"="Calling Accept"
I0329 13:49:24.186702       1 logger.go:81] cert-manager/acme-middleware "msg"="Calling Accept"
I0329 13:49:24.302485       1 logger.go:99] cert-manager/acme-middleware "msg"="Calling WaitAuthorization"
I0329 13:49:24.395204       1 logger.go:99] cert-manager/acme-middleware "msg"="Calling WaitAuthorization"
I0329 13:49:24.424769       1 logger.go:99] cert-manager/acme-middleware "msg"="Calling WaitAuthorization"
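
To see how bursty these calls are, the log lines can be grouped by second. The following is a small sketch that assumes the klog-style format shown above; `calls_per_second` and `CALL_RE` are hypothetical names. Note that a logged "Calling ..." line is not necessarily one HTTP request to the ACME server, so the counts are an upper bound.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
# I0329 13:48:52.639534       1 logger.go:51] cert-manager/acme-middleware "msg"="Calling GetOrder"
CALL_RE = re.compile(r'^\w\d{4} (\d{2}:\d{2}:\d{2})\.\d+ .*"msg"="Calling (\w+)"')


def calls_per_second(lines: Iterable[str]) -> Counter:
    """Count logged ACME middleware calls per wall-clock second."""
    counts: Counter = Counter()
    for line in lines:
        match = CALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage: `calls_per_second(open("cert-manager-2.log"))` returns a `Counter` keyed by `HH:MM:SS`, and `counts.most_common(3)` lists the busiest seconds.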

order:

apiVersion: acme.cert-manager.io/v1
kind: Order
metadata:
  annotations:
    cert-manager.io/certificate-name: tls-cert
    cert-manager.io/certificate-revision: "1"
    cert-manager.io/private-key-secret-name: tls-cert-8smsk
    service: tls-cert
  creationTimestamp: "2023-03-29T13:48:50Z"
  generation: 1
  labels:
    service: tls-cert
  name: tls-cert-z5ggl-1698200363
  namespace: cert-test-182
  ownerReferences:
  - apiVersion: cert-manager.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CertificateRequest
    name: tls-cert-z5ggl
    uid: 8c58d6d6-929c-4aed-80e4-26109b8a83cd
  resourceVersion: "180360269"
  uid: 6adebe02-7290-4b93-b5aa-8e69b17b75a9
spec:
  dnsNames:
  - platform.hnicke.example.com
  - bom.hnicke.example.com
  - app.hnicke.example.com
  - admin.api.hnicke.example.com
  - client.api.hnicke.example.com
  - hooks.api.hnicke.example.com
  - bex-os.api.hnicke.example.com
  - kafka.hnicke.example.com
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: zerossl
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJRFVEQ0NBamdDQVFBd0FEQ0NBU0l3RFFZSktvWklodmNOQVFFQkJRQURnZ0VQQURDQ0FRb0NnZ0VCQUwweAo3TG8rV05QMU4xTHhaekJNWlI5aGVVTUtJL0tTWVhIUG9FMFRqMzY0dHlLakU2ODA4a0pzRk5LRnh3bU0xSnluCjVjdklHUjlxOG1FTnAzMGlQWGFWdTc3eVY0NkhSZm9NUXF1aWxhRE94Slp0eWxxNHFCR01zNFg4cFBNZlBsTWgKNVIxbjJNR0JvMFdoY0ZMbGFtcXhPdlFsSHZBZkNJNEFpZFdiempwdXN6VzVKczFpc21UOWVISUFYQU1Lb3RlRAorMXArQ3lRbnpJMVNRekdIMlhRbmE0NUY3ajhXd2g2Z0ZzbDhUNzUvU1d1WWU1ZWl0eEdobGFFbENJYVM5Umx2CkRzWmJiMW4wWU9GZ00remdJWGhSTXdMaU11VVEwWUFNaHdwQVd4UG9EcWY5WDZ2Sm93YWxnOW5iaFh2RElhTmIKQXNpSDdOdjQvQVF5Sy8zMkpOY0NBd0VBQWFDQ0FRa3dnZ0VGQmdrcWhraUc5dzBCQ1E0eGdmY3dnZlF3Z2VRRwpBMVVkRVFTQjNEQ0IyWUlhY0d4aGRHWnZjbTB1YUc1cFkydGxMbUpsZUhSbGMzUXVaR1dDRldKdmJTNW9ibWxqCmEyVXVZbVY0ZEdWemRDNWtaWUlWWVhCd0xtaHVhV05yWlM1aVpYaDBaWE4wTG1SbGdodGhaRzFwYmk1aGNHa3UKYUc1cFkydGxMbUpsZUhSbGMzUXVaR1dDSEdOc2FXVnVkQzVoY0drdWFHNXBZMnRsTG1KbGVIUmxjM1F1WkdXQwpHMmh2YjJ0ekxtRndhUzVvYm1samEyVXVZbVY0ZEdWemRDNWtaWUljWW1WNExXOXpMbUZ3YVM1b2JtbGphMlV1ClltVjRkR1Z6ZEM1a1pZSVhhMkZtYTJFdWFHNXBZMnRsTG1KbGVIUmxjM1F1WkdVd0N3WURWUjBQQkFRREFnV2cKTUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFCRGwxWmtUeVFnRzdDa2ZmbGlnTUFMZ3gvUFZkeC8vWWtSRk9VbQpEclRIc1BDWmE1K2RWWXM4cXhUZ1F2RDI2RCs3K1lNL1ppUWdpVHJQdmlpbjJ4aU9McVBhWVQwU2NHeTNWZDhtCi95T3drRDZPRDg2K0lmbUdpaXQwUDdlRnJ5dXZ2T3cxQWIva1pSSDJEN0pUczR2djlsM1N6TVpBMGt2alNLZHEKNzQ1aDdCdWg3c05zd2dtclpUeVB2eHFYVmMxOHk0bWRxSXZOL0R0QjcrbmNmZHhjMFBpMTdRaUFoQzVPN1hrdwpXaGF3bld5VEg0NmtkYUVFSDNaK3VjUWE2OU03WG9RQ3FMTGtpYnBYQXVEdDlOVFA5d3k3N2lram1uazJOZWFhCkhzQXYvVjZlbzhBRDM3a3pVMjVGTHJueXJDNGZXZCtzMzNkNUZsSThMb255QUVjUwotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
status:
  authorizations:
  - challenges:
    - token: d0pEer1UvIllfUEXWo8nU_lsrHJZFWOE0_LbLDGVIRY
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/wBqAM3YwQWwfuTdwLDA-NQ
    - token: lWbZUOHIm-BJnKQSC21QvhkFsO3hidJqCDVuk7aAdAI
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/BKnJ7uX9iZJ-zXy8ISld5w
    identifier: admin.api.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/GxucS6VHBO1381RicLzMVA
    wildcard: false
  - challenges:
    - token: vTrV0ItGeI8JHu6970wwPEFo9RqB7b6QIi41EqVEBak
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/gzAEZGspcDTCnQVzZOV6_w
    - token: 8FeE1UOHjs0oEd-M-urcAVQShY2dCay8uybFjbxYZI0
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/AWrU7OqojlKWfUjNMnVuAw
    identifier: app.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/Mi5vVYx7KCrYp678lXwOOg
    wildcard: false
  - challenges:
    - token: 2tfP7UmP5L66Ilm965xvGKi_8SJq9T8Jmd9HokVhKdE
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/3HO9RmuPpfwoNhX3cGkZ8A
    - token: Pd9btc9zFl2kbCzRzzhBcLKX3rZmgMoMBmrM1_FSPD8
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/QjZrvSCTe-afn4mNYZRmPA
    identifier: bex-os.api.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/-ACvd9fDwy_LynIkH_FZeA
    wildcard: false
  - challenges:
    - token: wvDWMWeMs00dMBgjMlr9c7k18RnGOyPZJhHeUAGwSxA
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/tQxqrVBkPO3HKA6JhaDeWw
    - token: pE9RsxxpFSnb8jnqVYyqwTIhqEzKRuz1TjV5AJW2mBk
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/mE5StuzrvI11sXuuoFikcA
    identifier: bom.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/j34H9Kvs1r6cZalEzAmCKg
    wildcard: false
  - challenges:
    - token: nm52ANs3BIoJTvqr7WHU27CpiPIyJY5upl8zYT3IM3k
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/v7YsPtZTmm5PbMg16iw2TA
    - token: GgmDxF4Z_XKVNXf-DQ21-gx_Rbw9Ki6hvlufOLMRI9E
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/ijD3vf92SPDAd_m3lauqNg
    identifier: client.api.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/UP2zcs5x-r2xmFJCeSzUBQ
    wildcard: false
  - challenges:
    - token: aAUTcmrylThy6zNEzDy5DvskZyvfyA-DLJ44m_ocUEI
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/n0pik-suLaAeFc1krF3J_A
    - token: FmAGe1IhsSQT4dHQSLJTtrkxfoxO292CLYZ8WLO9_bY
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/asba81XGsZYfoPgLv5wxlQ
    identifier: hooks.api.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/TM9EdKmmhMmUQ5oQZYGobg
    wildcard: false
  - challenges:
    - token: CfnwWt0qyG8fkuKw5fehQ1FBAskCX7vCtShA9dEhvbY
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/AQralLA_b7aWH1BiDDV-zQ
    - token: MbEl0c-qgDW9FEr7p6PQxpm5zLsKg9-dwAeGvJoTLso
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/ETd2Xa62FyQtk2VE1i5SFA
    identifier: kafka.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/IyK462TIPswkJzbs7jF8xw
    wildcard: false
  - challenges:
    - token: 3hUcuKc0D3wbqiVrr_3jdRvr48utpsPAKOH8ntaGexM
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/oR5HRVRAHAbnu_TfyesDnw
    - token: aKRH-qHEp9piS_j_cB2OEqQsEM_OZWd2TdCuIDvoV8M
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/Zl0J3jose2lGXD4IcCkLgg
    identifier: platform.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/QpJEK_ddmhnu3TXBd7e0ww
    wildcard: false
  failureTime: "2023-03-29T13:48:53Z"
  finalizeURL: https://acme.zerossl.com/v2/DV90/order/CKKIEdMBPUQ8BKN9hB95Tw/finalize
  reason: 'Failed to retrieve Order resource: 429 : 429 Too Many Requests'
  state: errored
  url: https://acme.zerossl.com/v2/DV90/order/CKKIEdMBPUQ8BKN9hB95Tw

certificate:

apiVersion: acme.cert-manager.io/v1
kind: Order
metadata:
  annotations:
    cert-manager.io/certificate-name: tls-cert
    cert-manager.io/certificate-revision: "1"
    cert-manager.io/private-key-secret-name: tls-cert-8smsk
    service: tls-cert
  creationTimestamp: "2023-03-29T13:48:50Z"
  generation: 1
  labels:
    service: tls-cert
  name: tls-cert-z5ggl-1698200363
  namespace: cert-test-182
  ownerReferences:
  - apiVersion: cert-manager.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CertificateRequest
    name: tls-cert-z5ggl
    uid: 8c58d6d6-929c-4aed-80e4-26109b8a83cd
  resourceVersion: "180360269"
  uid: 6adebe02-7290-4b93-b5aa-8e69b17b75a9
spec:
  dnsNames:
  - platform.hnicke.example.com
  - bom.hnicke.example.com
  - app.hnicke.example.com
  - admin.api.hnicke.example.com
  - client.api.hnicke.example.com
  - hooks.api.hnicke.example.com
  - bex-os.api.hnicke.example.com
  - kafka.hnicke.example.com
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: zerossl
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJRFVEQ0NBamdDQVFBd0FEQ0NBU0l3RFFZSktvWklodmNOQVFFQkJRQURnZ0VQQURDQ0FRb0NnZ0VCQUwweAo3TG8rV05QMU4xTHhaekJNWlI5aGVVTUtJL0tTWVhIUG9FMFRqMzY0dHlLakU2ODA4a0pzRk5LRnh3bU0xSnluCjVjdklHUjlxOG1FTnAzMGlQWGFWdTc3eVY0NkhSZm9NUXF1aWxhRE94Slp0eWxxNHFCR01zNFg4cFBNZlBsTWgKNVIxbjJNR0JvMFdoY0ZMbGFtcXhPdlFsSHZBZkNJNEFpZFdiempwdXN6VzVKczFpc21UOWVISUFYQU1Lb3RlRAorMXArQ3lRbnpJMVNRekdIMlhRbmE0NUY3ajhXd2g2Z0ZzbDhUNzUvU1d1WWU1ZWl0eEdobGFFbENJYVM5Umx2CkRzWmJiMW4wWU9GZ00remdJWGhSTXdMaU11VVEwWUFNaHdwQVd4UG9EcWY5WDZ2Sm93YWxnOW5iaFh2RElhTmIKQXNpSDdOdjQvQVF5Sy8zMkpOY0NBd0VBQWFDQ0FRa3dnZ0VGQmdrcWhraUc5dzBCQ1E0eGdmY3dnZlF3Z2VRRwpBMVVkRVFTQjNEQ0IyWUlhY0d4aGRHWnZjbTB1YUc1cFkydGxMbUpsZUhSbGMzUXVaR1dDRldKdmJTNW9ibWxqCmEyVXVZbVY0ZEdWemRDNWtaWUlWWVhCd0xtaHVhV05yWlM1aVpYaDBaWE4wTG1SbGdodGhaRzFwYmk1aGNHa3UKYUc1cFkydGxMbUpsZUhSbGMzUXVaR1dDSEdOc2FXVnVkQzVoY0drdWFHNXBZMnRsTG1KbGVIUmxjM1F1WkdXQwpHMmh2YjJ0ekxtRndhUzVvYm1samEyVXVZbVY0ZEdWemRDNWtaWUljWW1WNExXOXpMbUZ3YVM1b2JtbGphMlV1ClltVjRkR1Z6ZEM1a1pZSVhhMkZtYTJFdWFHNXBZMnRsTG1KbGVIUmxjM1F1WkdVd0N3WURWUjBQQkFRREFnV2cKTUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFCRGwxWmtUeVFnRzdDa2ZmbGlnTUFMZ3gvUFZkeC8vWWtSRk9VbQpEclRIc1BDWmE1K2RWWXM4cXhUZ1F2RDI2RCs3K1lNL1ppUWdpVHJQdmlpbjJ4aU9McVBhWVQwU2NHeTNWZDhtCi95T3drRDZPRDg2K0lmbUdpaXQwUDdlRnJ5dXZ2T3cxQWIva1pSSDJEN0pUczR2djlsM1N6TVpBMGt2alNLZHEKNzQ1aDdCdWg3c05zd2dtclpUeVB2eHFYVmMxOHk0bWRxSXZOL0R0QjcrbmNmZHhjMFBpMTdRaUFoQzVPN1hrdwpXaGF3bld5VEg0NmtkYUVFSDNaK3VjUWE2OU03WG9RQ3FMTGtpYnBYQXVEdDlOVFA5d3k3N2lram1uazJOZWFhCkhzQXYvVjZlbzhBRDM3a3pVMjVGTHJueXJDNGZXZCtzMzNkNUZsSThMb255QUVjUwotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
status:
  authorizations:
  - challenges:
    - token: d0pEer1UvIllfUEXWo8nU_lsrHJZFWOE0_LbLDGVIRY
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/wBqAM3YwQWwfuTdwLDA-NQ
    - token: lWbZUOHIm-BJnKQSC21QvhkFsO3hidJqCDVuk7aAdAI
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/BKnJ7uX9iZJ-zXy8ISld5w
    identifier: admin.api.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/GxucS6VHBO1381RicLzMVA
    wildcard: false
  - challenges:
    - token: vTrV0ItGeI8JHu6970wwPEFo9RqB7b6QIi41EqVEBak
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/gzAEZGspcDTCnQVzZOV6_w
    - token: 8FeE1UOHjs0oEd-M-urcAVQShY2dCay8uybFjbxYZI0
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/AWrU7OqojlKWfUjNMnVuAw
    identifier: app.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/Mi5vVYx7KCrYp678lXwOOg
    wildcard: false
  - challenges:
    - token: 2tfP7UmP5L66Ilm965xvGKi_8SJq9T8Jmd9HokVhKdE
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/3HO9RmuPpfwoNhX3cGkZ8A
    - token: Pd9btc9zFl2kbCzRzzhBcLKX3rZmgMoMBmrM1_FSPD8
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/QjZrvSCTe-afn4mNYZRmPA
    identifier: bex-os.api.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/-ACvd9fDwy_LynIkH_FZeA
    wildcard: false
  - challenges:
    - token: wvDWMWeMs00dMBgjMlr9c7k18RnGOyPZJhHeUAGwSxA
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/tQxqrVBkPO3HKA6JhaDeWw
    - token: pE9RsxxpFSnb8jnqVYyqwTIhqEzKRuz1TjV5AJW2mBk
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/mE5StuzrvI11sXuuoFikcA
    identifier: bom.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/j34H9Kvs1r6cZalEzAmCKg
    wildcard: false
  - challenges:
    - token: nm52ANs3BIoJTvqr7WHU27CpiPIyJY5upl8zYT3IM3k
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/v7YsPtZTmm5PbMg16iw2TA
    - token: GgmDxF4Z_XKVNXf-DQ21-gx_Rbw9Ki6hvlufOLMRI9E
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/ijD3vf92SPDAd_m3lauqNg
    identifier: client.api.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/UP2zcs5x-r2xmFJCeSzUBQ
    wildcard: false
  - challenges:
    - token: aAUTcmrylThy6zNEzDy5DvskZyvfyA-DLJ44m_ocUEI
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/n0pik-suLaAeFc1krF3J_A
    - token: FmAGe1IhsSQT4dHQSLJTtrkxfoxO292CLYZ8WLO9_bY
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/asba81XGsZYfoPgLv5wxlQ
    identifier: hooks.api.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/TM9EdKmmhMmUQ5oQZYGobg
    wildcard: false
  - challenges:
    - token: CfnwWt0qyG8fkuKw5fehQ1FBAskCX7vCtShA9dEhvbY
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/AQralLA_b7aWH1BiDDV-zQ
    - token: MbEl0c-qgDW9FEr7p6PQxpm5zLsKg9-dwAeGvJoTLso
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/ETd2Xa62FyQtk2VE1i5SFA
    identifier: kafka.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/IyK462TIPswkJzbs7jF8xw
    wildcard: false
  - challenges:
    - token: 3hUcuKc0D3wbqiVrr_3jdRvr48utpsPAKOH8ntaGexM
      type: http-01
      url: https://acme.zerossl.com/v2/DV90/chall/oR5HRVRAHAbnu_TfyesDnw
    - token: aKRH-qHEp9piS_j_cB2OEqQsEM_OZWd2TdCuIDvoV8M
      type: dns-01
      url: https://acme.zerossl.com/v2/DV90/chall/Zl0J3jose2lGXD4IcCkLgg
    identifier: platform.hnicke.example.com
    initialState: pending
    url: https://acme.zerossl.com/v2/DV90/authz/QpJEK_ddmhnu3TXBd7e0ww
    wildcard: false
  failureTime: "2023-03-29T13:48:53Z"
  finalizeURL: https://acme.zerossl.com/v2/DV90/order/CKKIEdMBPUQ8BKN9hB95Tw/finalize
  reason: 'Failed to retrieve Order resource: 429 : 429 Too Many Requests'
  state: errored
  url: https://acme.zerossl.com/v2/DV90/order/CKKIEdMBPUQ8BKN9hB95Tw

challenges:

apiVersion: v1
items:
- apiVersion: acme.cert-manager.io/v1
  kind: Challenge
  metadata:
    creationTimestamp: "2023-03-29T13:48:52Z"
    finalizers:
    - finalizer.acme.cert-manager.io
    generation: 1
    name: tls-cert-z5ggl-1698200363-1068479022
    namespace: cert-test-182
    ownerReferences:
    - apiVersion: acme.cert-manager.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Order
      name: tls-cert-z5ggl-1698200363
      uid: 6adebe02-7290-4b93-b5aa-8e69b17b75a9
    resourceVersion: "180360283"
    uid: 602d4ef2-952c-4877-97ed-d785fb35d629
  spec:
    authorizationURL: https://acme.zerossl.com/v2/DV90/authz/TM9EdKmmhMmUQ5oQZYGobg
    dnsName: hooks.api.hnicke.example.com
    issuerRef:
      group: cert-manager.io
      kind: ClusterIssuer
      name: zerossl
    key: aAUTcmrylThy6zNEzDy5DvskZyvfyA-DLJ44m_ocUEI.rM0JAzGSRwasVEu6LLQVjGZGXO4R3A8xBHlulgd-CHo
    solver:
      http01:
        ingress:
          class: nginx
    token: aAUTcmrylThy6zNEzDy5DvskZyvfyA-DLJ44m_ocUEI
    type: HTTP-01
    url: https://acme.zerossl.com/v2/DV90/chall/n0pik-suLaAeFc1krF3J_A
    wildcard: false
  status:
    presented: false
    processing: false
    reason: "Failed to retrieve Order resource: 429 : <html>\r\n<head><title>429 Too
      Many Requests</title></head>\r\n<body>\r\n<center><h1>429 Too Many Requests</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n"
    state: errored
- apiVersion: acme.cert-manager.io/v1
  kind: Challenge
  metadata:
    creationTimestamp: "2023-03-29T13:48:52Z"
    finalizers:
    - finalizer.acme.cert-manager.io
    generation: 1
    name: tls-cert-z5ggl-1698200363-111491859
    namespace: cert-test-182
    ownerReferences:
    - apiVersion: acme.cert-manager.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Order
      name: tls-cert-z5ggl-1698200363
      uid: 6adebe02-7290-4b93-b5aa-8e69b17b75a9
    resourceVersion: "180360272"
    uid: a9b6dd18-8ab5-4a85-92dc-309365bdb6a8
  spec:
    authorizationURL: https://acme.zerossl.com/v2/DV90/authz/j34H9Kvs1r6cZalEzAmCKg
    dnsName: bom.hnicke.example.com
    issuerRef:
      group: cert-manager.io
      kind: ClusterIssuer
      name: zerossl
    key: wvDWMWeMs00dMBgjMlr9c7k18RnGOyPZJhHeUAGwSxA.rM0JAzGSRwasVEu6LLQVjGZGXO4R3A8xBHlulgd-CHo
    solver:
      http01:
        ingress:
          class: nginx
    token: wvDWMWeMs00dMBgjMlr9c7k18RnGOyPZJhHeUAGwSxA
    type: HTTP-01
    url: https://acme.zerossl.com/v2/DV90/chall/tQxqrVBkPO3HKA6JhaDeWw
    wildcard: false
  status:
    presented: false
    processing: false
    reason: "Failed to retrieve Order resource: 429 : <html>\r\n<head><title>429 Too
      Many Requests</title></head>\r\n<body>\r\n<center><h1>429 Too Many Requests</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n"
    state: errored
- apiVersion: acme.cert-manager.io/v1
  kind: Challenge
  metadata:
    creationTimestamp: "2023-03-29T13:48:52Z"
    finalizers:
    - finalizer.acme.cert-manager.io
    generation: 1
    name: tls-cert-z5ggl-1698200363-1396966348
    namespace: cert-test-182
    ownerReferences:
    - apiVersion: acme.cert-manager.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Order
      name: tls-cert-z5ggl-1698200363
      uid: 6adebe02-7290-4b93-b5aa-8e69b17b75a9
    resourceVersion: "180360691"
    uid: 99041276-b7a1-40f4-8db7-6b39578ee40e
  spec:
    authorizationURL: https://acme.zerossl.com/v2/DV90/authz/Mi5vVYx7KCrYp678lXwOOg
    dnsName: app.hnicke.example.com
    issuerRef:
      group: cert-manager.io
      kind: ClusterIssuer
      name: zerossl
    key: vTrV0ItGeI8JHu6970wwPEFo9RqB7b6QIi41EqVEBak.rM0JAzGSRwasVEu6LLQVjGZGXO4R3A8xBHlulgd-CHo
    solver:
      http01:
        ingress:
          class: nginx
    token: vTrV0ItGeI8JHu6970wwPEFo9RqB7b6QIi41EqVEBak
    type: HTTP-01
    url: https://acme.zerossl.com/v2/DV90/chall/gzAEZGspcDTCnQVzZOV6_w
    wildcard: false
  status:
    presented: false
    processing: false
    reason: Successfully authorized domain
    state: valid
- apiVersion: acme.cert-manager.io/v1
  kind: Challenge
  metadata:
    creationTimestamp: "2023-03-29T13:48:52Z"
    finalizers:
    - finalizer.acme.cert-manager.io
    generation: 1
    name: tls-cert-z5ggl-1698200363-2081144393
    namespace: cert-test-182
    ownerReferences:
    - apiVersion: acme.cert-manager.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Order
      name: tls-cert-z5ggl-1698200363
      uid: 6adebe02-7290-4b93-b5aa-8e69b17b75a9
    resourceVersion: "180360288"
    uid: 75ed5828-cfbb-4ae1-ba71-a409101544a4
  spec:
    authorizationURL: https://acme.zerossl.com/v2/DV90/authz/IyK462TIPswkJzbs7jF8xw
    dnsName: kafka.hnicke.example.com
    issuerRef:
      group: cert-manager.io
      kind: ClusterIssuer
      name: zerossl
    key: CfnwWt0qyG8fkuKw5fehQ1FBAskCX7vCtShA9dEhvbY.rM0JAzGSRwasVEu6LLQVjGZGXO4R3A8xBHlulgd-CHo
    solver:
      http01:
        ingress:
          class: nginx
    token: CfnwWt0qyG8fkuKw5fehQ1FBAskCX7vCtShA9dEhvbY
    type: HTTP-01
    url: https://acme.zerossl.com/v2/DV90/chall/AQralLA_b7aWH1BiDDV-zQ
    wildcard: false
  status:
    presented: false
    processing: false
    reason: "Failed to retrieve Order resource: 429 : <html>\r\n<head><title>429 Too
      Many Requests</title></head>\r\n<body>\r\n<center><h1>429 Too Many Requests</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n"
    state: errored
- apiVersion: acme.cert-manager.io/v1
  kind: Challenge
  metadata:
    creationTimestamp: "2023-03-29T13:48:52Z"
    finalizers:
    - finalizer.acme.cert-manager.io
    generation: 1
    name: tls-cert-z5ggl-1698200363-2173148795
    namespace: cert-test-182
    ownerReferences:
    - apiVersion: acme.cert-manager.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Order
      name: tls-cert-z5ggl-1698200363
      uid: 6adebe02-7290-4b93-b5aa-8e69b17b75a9
    resourceVersion: "180360271"
    uid: a00228d0-7519-4b10-9ef4-17b1aa04f81a
  spec:
    authorizationURL: https://acme.zerossl.com/v2/DV90/authz/UP2zcs5x-r2xmFJCeSzUBQ
    dnsName: client.api.hnicke.example.com
    issuerRef:
      group: cert-manager.io
      kind: ClusterIssuer
      name: zerossl
    key: nm52ANs3BIoJTvqr7WHU27CpiPIyJY5upl8zYT3IM3k.rM0JAzGSRwasVEu6LLQVjGZGXO4R3A8xBHlulgd-CHo
    solver:
      http01:
        ingress:
          class: nginx
    token: nm52ANs3BIoJTvqr7WHU27CpiPIyJY5upl8zYT3IM3k
    type: HTTP-01
    url: https://acme.zerossl.com/v2/DV90/chall/v7YsPtZTmm5PbMg16iw2TA
    wildcard: false
  status:
    presented: false
    processing: false
    reason: 'Failed to retrieve Order resource: 429 : 429 Too Many Requests'
    state: errored
- apiVersion: acme.cert-manager.io/v1
  kind: Challenge
  metadata:
    creationTimestamp: "2023-03-29T13:48:52Z"
    finalizers:
    - finalizer.acme.cert-manager.io
    generation: 1
    name: tls-cert-z5ggl-1698200363-3093598564
    namespace: cert-test-182
    ownerReferences:
    - apiVersion: acme.cert-manager.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Order
      name: tls-cert-z5ggl-1698200363
      uid: 6adebe02-7290-4b93-b5aa-8e69b17b75a9
    resourceVersion: "180360677"
    uid: e2cc8994-a282-4a8c-b093-91c68aba2fa6
  spec:
    authorizationURL: https://acme.zerossl.com/v2/DV90/authz/-ACvd9fDwy_LynIkH_FZeA
    dnsName: bex-os.api.hnicke.example.com
    issuerRef:
      group: cert-manager.io
      kind: ClusterIssuer
      name: zerossl
    key: 2tfP7UmP5L66Ilm965xvGKi_8SJq9T8Jmd9HokVhKdE.rM0JAzGSRwasVEu6LLQVjGZGXO4R3A8xBHlulgd-CHo
    solver:
      http01:
        ingress:
          class: nginx
    token: 2tfP7UmP5L66Ilm965xvGKi_8SJq9T8Jmd9HokVhKdE
    type: HTTP-01
    url: https://acme.zerossl.com/v2/DV90/chall/3HO9RmuPpfwoNhX3cGkZ8A
    wildcard: false
  status:
    presented: false
    processing: false
    reason: Successfully authorized domain
    state: valid
- apiVersion: acme.cert-manager.io/v1
  kind: Challenge
  metadata:
    creationTimestamp: "2023-03-29T13:48:52Z"
    finalizers:
    - finalizer.acme.cert-manager.io
    generation: 1
    name: tls-cert-z5ggl-1698200363-3341290473
    namespace: cert-test-182
    ownerReferences:
    - apiVersion: acme.cert-manager.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Order
      name: tls-cert-z5ggl-1698200363
      uid: 6adebe02-7290-4b93-b5aa-8e69b17b75a9
    resourceVersion: "180360306"
    uid: 2d8f804d-b176-460c-a313-cd6581d1afd6
  spec:
    authorizationURL: https://acme.zerossl.com/v2/DV90/authz/QpJEK_ddmhnu3TXBd7e0ww
    dnsName: platform.hnicke.example.com
    issuerRef:
      group: cert-manager.io
      kind: ClusterIssuer
      name: zerossl
    key: 3hUcuKc0D3wbqiVrr_3jdRvr48utpsPAKOH8ntaGexM.rM0JAzGSRwasVEu6LLQVjGZGXO4R3A8xBHlulgd-CHo
    solver:
      http01:
        ingress:
          class: nginx
    token: 3hUcuKc0D3wbqiVrr_3jdRvr48utpsPAKOH8ntaGexM
    type: HTTP-01
    url: https://acme.zerossl.com/v2/DV90/chall/oR5HRVRAHAbnu_TfyesDnw
    wildcard: false
  status:
    presented: false
    processing: false
    reason: "Failed to retrieve Order resource: 429 : <html>\r\n<head><title>429 Too
      Many Requests</title></head>\r\n<body>\r\n<center><h1>429 Too Many Requests</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n"
    state: errored
- apiVersion: acme.cert-manager.io/v1
  kind: Challenge
  metadata:
    creationTimestamp: "2023-03-29T13:48:52Z"
    finalizers:
    - finalizer.acme.cert-manager.io
    generation: 1
    name: tls-cert-z5ggl-1698200363-503165461
    namespace: cert-test-182
    ownerReferences:
    - apiVersion: acme.cert-manager.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Order
      name: tls-cert-z5ggl-1698200363
      uid: 6adebe02-7290-4b93-b5aa-8e69b17b75a9
    resourceVersion: "180360663"
    uid: 51d9f425-e480-49a6-82cc-361d205a8088
  spec:
    authorizationURL: https://acme.zerossl.com/v2/DV90/authz/GxucS6VHBO1381RicLzMVA
    dnsName: admin.api.hnicke.example.com
    issuerRef:
      group: cert-manager.io
      kind: ClusterIssuer
      name: zerossl
    key: d0pEer1UvIllfUEXWo8nU_lsrHJZFWOE0_LbLDGVIRY.rM0JAzGSRwasVEu6LLQVjGZGXO4R3A8xBHlulgd-CHo
    solver:
      http01:
        ingress:
          class: nginx
    token: d0pEer1UvIllfUEXWo8nU_lsrHJZFWOE0_LbLDGVIRY
    type: HTTP-01
    url: https://acme.zerossl.com/v2/DV90/chall/wBqAM3YwQWwfuTdwLDA-NQ
    wildcard: false
  status:
    presented: false
    processing: false
    reason: Successfully authorized domain
    state: valid
kind: List
metadata:
  resourceVersion: ""

irbekrm commented 1 year ago

Thank you for the logs and resources @hnicke .

I am looking through the codebase to see whether some of the calls could be eliminated; so far it does not look like it. The majority of calls are made to get the authorization (challenge status/requirements) for each challenge and to check the challenge status.

I am wondering whether it would help to limit the number of concurrent challenges (perhaps to 1) with the --max-concurrent-challenges flag of the cert-manager controller. That would effectively mean that only one challenge is processed at a time, so the calls would be spread over a longer period.

irbekrm commented 1 year ago

whether some of the calls could be reduced

We might be able to cut down the number of HTTP01ChallengeResponse calls to match the number of required authorizations (basically the number of DNS names) for the success path. I'll give that a go, though that's not guaranteed to make it work with ZeroSSL.

hnicke commented 1 year ago

Thank you for looking into the issue.

I didn't know the --max-concurrent-challenges flag existed. This sounds like it could really help.

I have deployed cert-manager with --max-concurrent-challenges=1. The first 6 challenges succeed. However, the last two fail due to 429.

At least this time, the order is not in state failed but still in pending. This makes it easier to 'reset' the challenges. I've written a workaround script which sets the status of the challenges back to pending. cert-manager then picks up processing the challenges again. In case someone has use for this:

#!/bin/bash
set -eu

failed_challenges=$(
    kubectl get challenge \
        --all-namespaces \
        -o json \
        | jq '.items
            | map(select(.status.state == "errored" and (.status.reason | contains("429"))))
            | map({ name: .metadata.name, namespace: .metadata.namespace })
            | .[]
            ' -c
    )

# Relies on word splitting: jq -c emits one space-free JSON object per challenge.
for challenge in $failed_challenges; do
    echo "fixing up invalid challenge: $challenge"
    name=$(echo "$challenge" | jq '.name' -r)
    namespace=$(echo "$challenge" | jq '.namespace' -r)
    # Reset the status so cert-manager picks the challenge up again.
    kubectl patch challenge \
        -n "$namespace" \
        --type merge \
        --subresource status \
        --patch 'status: {state: pending}' \
        "$name"
done

irbekrm commented 1 year ago

https://github.com/cert-manager/cert-manager/pull/5901 should also hopefully help.

axisofentropy commented 1 year ago

@hnicke Thank you for doing all this research and contending with ZeroSSL's support. We're doing that now too.

Have you found any other providers that don't have this issue?

hnicke commented 1 year ago

That's good to hear. If more people reach out to zerossl it might make them change their minds.

I don't know of any alternative to zerossl.

In the meantime we have switched over from http challenges to dns challenges with wildcard names. In conjunction with limiting the max concurrent challenges (--max-concurrent-challenges) we are hitting the rate limit less often - more often than not issuing a certificate is successful in this configuration.

Despite the improvements, the failure rate we are seeing is still too high for the degree of automation we're striving for. Without knowing the inner workings of cert-manager too closely, I think the best way to tackle the problem at hand is to improve the resilience of the issuing process by working on the existing exponential retry mechanism. It would be handy to be able to configure the backoff strategy on an issuer / clusterissuer basis. With this mechanism in place, I'd be able to configure a low initial backoff value for the zerossl issuer to achieve a timely retry. This would totally fix the problem at hand.

sgsollie commented 1 year ago

Just adding a datapoint: we migrated away from ZeroSSL for this very reason. We're in Google Cloud, so we're now using the GCP Public CA with the ACME issuer and have had no problems since: https://cloud.google.com/certificate-manager/docs/public-ca

baszalmstra commented 1 year ago

@sgsollie That's pretty cool! Would you be able to share how you set that up? What does your issuer configuration look like?

irbekrm commented 1 year ago

@axisofentropy I think this is a ZeroSSL specific issue. I am not aware of users who would experience this with any other ACME implementation.

It would be handy to be able to configure the backoff strategy on an issuer / clusterissuer basis.

So, there has been a discussion about lowering the initial backoff period, but it still needs some use cases (https://github.com/cert-manager/cert-manager/issues/4786). I don't know if just the initial backoff would solve your problem, and generally we do not want to make the backoff too configurable by users, the reason being that it is also our responsibility to ensure that cert-manager installations do not hit public ACME servers too hard.

There's a couple more alternatives that need some extra thinking:

tenequm commented 1 year ago

@baszalmstra, here is an example of how it worked for me:

  1. Followed this guide to create EAB secret details through gcloud CLI locally: https://cloud.google.com/certificate-manager/docs/public-ca-tutorial
  2. Created secret resource:
    apiVersion: v1
    kind: Secret
    metadata:
      name: gcp-cm-eabsecret
    data:
      secret: {{ .Values.gcpCmEabsecret | b64enc }}
  3. Created ClusterIssuer resource:

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: gcp-cm
    spec:
      acme:
        # Google Certificate Manager Public CA ACME server
        server: https://dv.acme-v02.api.pki.goog/directory
        email: <your_email>

        # name of a secret used to store the ACME account private key
        privateKeySecretRef:
          name: gcp-cm

        # for each cert-manager new EAB credentials are required
        externalAccountBinding:
          keyID: <your_key_id>
          keySecretRef:
            name: gcp-cm-eabsecret
            key: secret
          keyAlgorithm: HS256

        # ACME DNS-01 provider configurations to verify domain
        solvers:
          - dns01:
              cloudDNS:
                project: <your_project_id>

irbekrm commented 1 year ago

We've released cert-manager v1.12.0-alpha.2 that includes the PR that reduces the number of GET calls to ACME https://github.com/cert-manager/cert-manager/pull/5901

If someone were able to test this and verify whether it (potentially in combination with limiting the number of concurrent challenges) helps for their setup, that would be greatly appreciated 🙏🏼

axisofentropy commented 1 year ago

@sgsollie That's pretty cool! Would you be able to share how you set that up? What does your issuer configuration look like?

I've just published an article showing how we set up cert-manager to use Google's Public CA. https://www.uffizzi.com/blog/ditching-zerossl-for-google-public-certificate-authority-for-ssl-certificates-via-cert-manager-and-acme-protocol

cc @baszalmstra

hnicke commented 1 year ago

@irbekrm thank you! I ran some tests this morning.

The scenario: a certificate with up to 8 DNS names, using http01 challenges. In general it feels like the zerossl rate limiting is less aggressive than the last time I tried.

For both v1.11.0 and v1.12.0-alpha.2, the behavior was more or less the same: without limiting maxConcurrentChallenges, we hit 429 with >=6 DNS names in the cert; with maxConcurrentChallenges = 2, there was no 429 with 8 DNS names.

Compared to two weeks ago, the situation is way better. I'm not sure whether the rate limiting is more generous in general, or whether we sometimes run into their rate limits on some days and not on others, due to the apparently dynamic nature of the rate limiting algorithm.

We are now on v1.12.0-alpha.2 with maxConcurrentChallenges = 2, and have switched over to DNS challenges. With this setup we don't see any 429 responses anymore.

irbekrm commented 1 year ago

Thanks for the confirmation @hnicke, glad it appears to be working for now. We are currently looking at some more changes to the retry mechanism, but they will not land in v1.12.0, so hopefully folks will be able to use either of the approaches above.

I've just published an article showing how we set up cert-manager to use Google's Public CA. https://www.uffizzi.com/blog/ditching-zerossl-for-google-public-certificate-authority-for-ssl-certificates-via-cert-manager-and-acme-protocol

@axisofentropy that looks like a useful blog post for our users, if you'd like to PR it to our tutorials page I'd be happy to approve

ionosphere80 commented 1 year ago

I receive 429s from both ZeroSSL and Google's public CA service when requesting more than a couple of certificates simultaneously. In my case, Google's rate limiter seems even more restrictive than ZeroSSL. I am aware of the retry mechanism, but the same retry intervals apply to all requests, thus causing all of them to simultaneously retry and fail again. How about adding a random number of seconds between request creation and the actual request and then using random exponential retry intervals?
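
The randomized retry idea above can be sketched as follows. This is a minimal Python illustration of exponential backoff with "full jitter", not cert-manager's actual (Go) retry code; the function name `jittered_delays` and the `base`, `cap` and `attempts` parameters are hypothetical and chosen only for the example.

```python
import random


def jittered_delays(base=2.0, cap=300.0, attempts=5, rng=None):
    """Return one randomized delay (in seconds) per retry attempt.

    Hypothetical helper: attempt n draws uniformly from
    [0, min(cap, base * 2**n)], so requests that failed at the same
    moment are unlikely to retry at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles on every attempt until it reaches the cap.
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


if __name__ == "__main__":
    # Two requests that hit a 429 together get different retry schedules.
    print(jittered_delays(rng=random.Random(1)))
    print(jittered_delays(rng=random.Random(2)))
```

The first delay could also serve as the random wait between request creation and the first call; whether such values would be acceptable to a given ACME server is an open question.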

jahantech commented 1 year ago

I can confirm that specifying --max-concurrent-challenges=2 as an Arg solves the issue. In my case we were trying to create 12 challenges at once (issuer zerossl) cert-manager version 1.8.2

hpl002 commented 1 year ago

I can confirm that specifying --max-concurrent-challenges=2 as an Arg solves the issue. In my case we were trying to create 12 challenges at once (issuer zerossl) cert-manager version 1.8.2

Tried this with 1.10.2. It seems to help, but I still run into the same issue.

I even tried with concurrency disabled (--max-concurrent-challenges=1) and still got HTTP 429. The rate limiting on their end seems very strict.

--

Same issue with 1.12.x

pmint93 commented 1 year ago

I suspect that zerossl is using an overall rate limiter, because I have sometimes observed 429 Too Many Requests even when there are only 1-2 certificate requests. If true, then it's unfair, because some users can request a lot of certificates while others are unable to request any.

BAGELreflex commented 1 year ago

We have upgraded to v1.12.2 and have tried using --max-concurrent-challenges=1 and --max-concurrent-challenges=2. We then test deleting 10 TLS certificates and allowing cert-manager to re-issue them. When the challenges are set to 2 it works better, however we are still getting a 429 on one of the responses sometimes. This leaves the setup in an odd state:

The above conditions were true for nearly an hour.

It seems to me like the Challenge has a finalizer that hasn't exited/finished finalizing. I tried editing the Challenge and removing the finalizer, but that did not seem to do anything, so perhaps this is just an observation and not actually part of the issue. I thought it might be important because of the message "Failed to finalize Order".

I tried deleting the Order, which also deleted the Challenge, but this did not trigger a new Order to be created.

Approximately 3-5 minutes went by while I was reviewing the Certificate, and it looks like now a new Order has been generated. That order completed successfully since a 429 wasn't returned.

So it seems to me that good improvements have been made to reduce the likelihood of receiving a 429, and I see in this PR that attempts were made to handle the 429 condition. However, it appears there is still a condition where receiving a 429 can cause a certificate issuance to not fully complete or not be retried automatically.

inteon commented 1 year ago

Looking for someone to create a PR to add ACME backoffs based on the back off header. /lifecycle frozen

rayterrill commented 5 months ago

@inteon I'm willing to take a look at this if it's still needed - we're running into this issue now. I do see https://github.com/cert-manager/cert-manager/pull/6090/files was opened at one point, but never merged. Would this be a good starting point?

inteon commented 5 months ago

> @inteon I'm willing to take a look at this if it's still needed - we're running into this issue now. I do see https://github.com/cert-manager/cert-manager/pull/6090/files was opened at one point, but never merged. Would this be a good starting point?

I think we decided that we should look for Retry-After headers and base the backoff on that; this might require supporting ARI (ACME Renewal Information).

artificial-aidan commented 3 months ago

I've spent some more time playing with this (conversation here as well), and a few things have popped up.

  1. The stdlib acme implementation retries on 429 errors, while cert-manager implements its own retry. If I were to guess, this is a legacy mechanism for handling bad nonces, which is now handled in the standard library. One problem with this is that zerossl would sometimes return a 429 when retrieving an order, which is unrelated to challenges. I tried using the default acme library 429 handler, and that solved a decent number of problems... BUT

Because there is a timeout waiting for a challenge to respond, challenges can still fail. The default timeout is 20 seconds; I tried bumping that up to 2 minutes, which was a big improvement but still not 100%.

So instead I went a different route. Using @hnicke's script for inspiration I created a mini operator that will catch any failed challenge or order if it has 429 in the error message and reset it to pending. See here.

I would like to use this as a proof of concept for handling 429s in a different way: possibly retry them a few times in a default handler (zerossl doesn't respond with a Retry-After header, but others may, and the default ACME handler handles this), and if the request continues to return 429, just mark the resource as pending again. That bumps it back into the processing queue, where it will get tried again. This way we don't have timeouts, and it gets handled at the controller level.
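The decision at the heart of that approach can be sketched in Go as below. This is not the code of the mini operator and does not use cert-manager's API types: the `State` type, its values, the `nextState` function and the `maxResets` cap are all hypothetical, and the cap in particular is an extra assumption that the description above does not mention.

```go
package main

import (
	"fmt"
	"strings"
)

// State is a hypothetical, simplified stand-in for the state of a
// Challenge or Order; it is not cert-manager's real type.
type State string

const (
	Pending State = "pending"
	Errored State = "errored"
	Invalid State = "invalid"
	Valid   State = "valid"
)

// nextState returns Pending when a failed resource looks like it failed
// because of rate limiting, so that it re-enters the processing queue.
// Matching "429" in the message is deliberately crude and mirrors the
// description above. maxResets is an assumed safety cap on resets.
func nextState(current State, reason string, resets, maxResets int) State {
	failed := current == Errored || current == Invalid
	if !failed || !strings.Contains(reason, "429") {
		return current // not a rate-limit failure: leave it alone
	}
	if resets >= maxResets {
		return current // give up resetting after too many attempts
	}
	return Pending
}

func main() {
	fmt.Println(nextState(Errored, "429 Too Many Requests", 0, 5))
	fmt.Println(nextState(Errored, "403 Forbidden", 0, 5))
}
```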

I'm going to run the mini operator on my cluster for a while and see how things go. If other people could test it too, that would be great.