Closed pmkane closed 1 year ago
Hello,
The Retry-After value (5s) it is only used if the challenge does not succeed the first time.
There is no minimum sleep between 2 HTTP-01 challenges.
the value (5s), before the first retry, is not a random value: https://tools.ietf.org/html/rfc8555#section-8.2
Hey there:
Unfortunately, there is most definitely currently a 5 second delay between successful http-01 challenges through that code path. If you set ra = 1000 and try to issue a certificate with more than one hostname and you will see it quite clearly.
The reason is that we are setting bo.InitialInterval to 5 in line 106, so we will always sleep for 5 seconds.
Good to know that in the failure case that 5 seconds is specified.
the InitialInterval
is the Initial Interval before the first retry, neither between challenges nor before the challenge.
func TestExample(t *testing.T) {
initialInterval := 5 * time.Second
bo := backoff.NewExponentialBackOff()
bo.InitialInterval = initialInterval
bo.MaxInterval = 10 * initialInterval
bo.MaxElapsedTime = 100 * initialInterval
operation := func() error {
log.Println("op")
return nil
}
log.Println("be")
err := backoff.Retry(operation, bo)
require.NoError(t, err)
log.Println("end")
}
2020/06/30 23:23:45 be
2020/06/30 23:23:45 op
2020/06/30 23:23:45 end
Thanks for taking the time to put together a test case. Unfortunately, there's definitely something different about your test case compared to how lego is actually using Backoff. I've applied this diff to lego checked out from master:
diff --git a/challenge/resolver/solver_manager.go b/challenge/resolver/solver_manager.go
index bd79a4f..1d43787 100644
--- a/challenge/resolver/solver_manager.go
+++ b/challenge/resolver/solver_manager.go
@@ -98,7 +98,7 @@ func validate(core *api.Core, domain string, chlg acme.Challenge) error {
// If it doesn't, we'll just poll hard.
// Boulder does not implement the ability to retry challenges or the Retry-After header.
// https://github.com/letsencrypt/boulder/blob/master/docs/acme-divergences.md#section-82
- ra = 5
+ ra = 30
}
initialInterval := time.Duration(ra) * time.Second
I then run:
dist/lego -a --http.memcached-host [redacted] --http --email [redacted] --key-type rsa4096 --path /tmp/certs -server=https://acme-staging-v02.api.letsencrypt.org/directory -d [redacted] -d [redacted] -d [redacted] --http run
and as you can see from the output, there is a 30 second sleep between solutions:
2020/06/30 21:50:31 [INFO] [redacted] acme: Trying to solve HTTP-01
2020/06/30 21:50:58 [INFO] [redacted] The server validated our request
2020/06/30 21:50:58 [INFO] [redacted] acme: Trying to solve HTTP-01
2020/06/30 21:51:18 [INFO] [redacted] The server validated our request
2020/06/30 21:51:18 [INFO] [redacted] acme: Trying to solve HTTP-01
2020/06/30 21:51:49 [INFO] [redacted] The server validated our request
2020/06/30 21:51:49 [INFO] [redacted] acme: Validations succeeded; requesting certificates
2020/06/30 21:51:52 [INFO] [redacted] Server responded with a certificate.
If you bump ra up to a very high value, you will be able to see the same behavior.
lego -m test@example.com -a \
-d acme.wtf -d acme.lego.wtf -d lego.wtf \
--http \
run
2020/07/01 00:07:36 [INFO] [acme.wtf, acme.lego.wtf, lego.wtf] acme: Obtaining bundled SAN certificate
2020/07/01 00:07:36 [INFO] [lego.wtf] acme: use http-01 solver
2020/07/01 00:07:36 [INFO] [acme.lego.wtf] acme: use http-01 solver
2020/07/01 00:07:36 [INFO] [acme.wtf] acme: use http-01 solver
2020/07/01 00:07:36 [INFO] [lego.wtf] acme: Trying to solve HTTP-01
2020/07/01 00:07:36 [INFO] [lego.wtf] The server validated our request
2020/07/01 00:07:36 [INFO] [acme.lego.wtf] acme: Trying to solve HTTP-01
2020/07/01 00:07:36 [INFO] [acme.lego.wtf] The server validated our request
2020/07/01 00:07:36 [INFO] [acme.wtf] acme: Trying to solve HTTP-01
2020/07/01 00:07:36 [INFO] [acme.wtf] The server validated our request
2020/07/01 00:07:36 [INFO] [acme.wtf, acme.lego.wtf, lego.wtf] acme: Validations succeeded; requesting certificates
2020/07/01 00:07:36 [INFO] Wait for certificate [timeout: 30s, interval: 500ms]
2020/07/01 00:07:36 [INFO] [acme.wtf] Server responded with a certificate.
Hey all:
Currently, challenge/resolver/solver_manager.go imposes a minimum 5 second sleep between successful http-01 solutions, even if the previous solution was successful. As the comment in the code alludes to, this is because Let's Encrypt/Boudler doesn't return at Retry-After, so it defaults to the hard-coded 5 second RA.
It appears that the actual rate-limit at Let's Encrypt for requests to /acme/* is 40 requests per second.
When creating new certificates with many hostnames, this delay drags things down.
Is this an intentional decision or an unintentional side effect? If it's the latter, happy to submit a PR to be less conservative.