go-acme / lego

Let's Encrypt/ACME client and library written in Go
https://go-acme.github.io/lego/
MIT License
7.84k stars 1.01k forks source link

5 second delays between solutions -- intentional? #1208

Closed pmkane closed 1 year ago

pmkane commented 4 years ago

Hey all:

Currently, challenge/resolver/solver_manager.go imposes a minimum 5 second sleep between successful http-01 solutions, even if the previous solution was successful. As the comment in the code alludes to, this is because Let's Encrypt/Boudler doesn't return at Retry-After, so it defaults to the hard-coded 5 second RA.

It appears that the actual rate-limit at Let's Encrypt for requests to /acme/* is 40 requests per second.

When creating new certificates with many hostnames, this delay drags things down.

Is this an intentional decision or an unintentional side effect? If it's the latter, happy to submit a PR to be less conservative.

ldez commented 4 years ago

Hello,

The Retry-After value (5s) it is only used if the challenge does not succeed the first time.

There is no minimum sleep between 2 HTTP-01 challenges.

https://github.com/go-acme/lego/blob/deb0ffc5d9310b08a7179c61062bec99de791745/challenge/resolver/solver_manager.go#L95-L108

the value (5s), before the first retry, is not a random value: https://tools.ietf.org/html/rfc8555#section-8.2

pmkane commented 4 years ago

Hey there:

Unfortunately, there is most definitely currently a 5 second delay between successful http-01 challenges through that code path. If you set ra = 1000 and try to issue a certificate with more than one hostname and you will see it quite clearly.

The reason is that we are setting bo.InitialInterval to 5 in line 106, so we will always sleep for 5 seconds.

Good to know that in the failure case that 5 seconds is specified.

ldez commented 4 years ago

the InitialInterval is the Initial Interval before the first retry, neither between challenges nor before the challenge.

func TestExample(t *testing.T) {
    initialInterval := 5 * time.Second

    bo := backoff.NewExponentialBackOff()
    bo.InitialInterval = initialInterval
    bo.MaxInterval = 10 * initialInterval
    bo.MaxElapsedTime = 100 * initialInterval

    operation := func() error {
        log.Println("op")
        return nil
    }
    log.Println("be")
    err := backoff.Retry(operation, bo)
    require.NoError(t, err)
    log.Println("end")
}
2020/06/30 23:23:45 be
2020/06/30 23:23:45 op
2020/06/30 23:23:45 end
pmkane commented 4 years ago

Thanks for taking the time to put together a test case. Unfortunately, there's definitely something different about your test case compared to how lego is actually using Backoff. I've applied this diff to lego checked out from master:

diff --git a/challenge/resolver/solver_manager.go b/challenge/resolver/solver_manager.go
index bd79a4f..1d43787 100644
--- a/challenge/resolver/solver_manager.go
+++ b/challenge/resolver/solver_manager.go
@@ -98,7 +98,7 @@ func validate(core *api.Core, domain string, chlg acme.Challenge) error {
                // If it doesn't, we'll just poll hard.
                // Boulder does not implement the ability to retry challenges or the Retry-After header.
                // https://github.com/letsencrypt/boulder/blob/master/docs/acme-divergences.md#section-82
-               ra = 5
+               ra = 30
        }
        initialInterval := time.Duration(ra) * time.Second

I then run:

dist/lego -a --http.memcached-host [redacted] --http --email [redacted] --key-type rsa4096 --path /tmp/certs -server=https://acme-staging-v02.api.letsencrypt.org/directory -d [redacted] -d [redacted] -d [redacted] --http run

and as you can see from the output, there is a 30 second sleep between solutions:

2020/06/30 21:50:31 [INFO] [redacted] acme: Trying to solve HTTP-01
2020/06/30 21:50:58 [INFO] [redacted] The server validated our request
2020/06/30 21:50:58 [INFO] [redacted] acme: Trying to solve HTTP-01
2020/06/30 21:51:18 [INFO] [redacted] The server validated our request
2020/06/30 21:51:18 [INFO] [redacted] acme: Trying to solve HTTP-01
2020/06/30 21:51:49 [INFO] [redacted] The server validated our request
2020/06/30 21:51:49 [INFO] [redacted] acme: Validations succeeded; requesting certificates
2020/06/30 21:51:52 [INFO] [redacted] Server responded with a certificate.

If you bump ra up to a very high value, you will be able to see the same behavior.

ldez commented 4 years ago
lego -m test@example.com -a \
-d acme.wtf -d acme.lego.wtf -d lego.wtf \
--http \
run
2020/07/01 00:07:36 [INFO] [acme.wtf, acme.lego.wtf, lego.wtf] acme: Obtaining bundled SAN certificate
2020/07/01 00:07:36 [INFO] [lego.wtf] acme: use http-01 solver
2020/07/01 00:07:36 [INFO] [acme.lego.wtf] acme: use http-01 solver
2020/07/01 00:07:36 [INFO] [acme.wtf] acme: use http-01 solver
2020/07/01 00:07:36 [INFO] [lego.wtf] acme: Trying to solve HTTP-01
2020/07/01 00:07:36 [INFO] [lego.wtf] The server validated our request
2020/07/01 00:07:36 [INFO] [acme.lego.wtf] acme: Trying to solve HTTP-01
2020/07/01 00:07:36 [INFO] [acme.lego.wtf] The server validated our request
2020/07/01 00:07:36 [INFO] [acme.wtf] acme: Trying to solve HTTP-01
2020/07/01 00:07:36 [INFO] [acme.wtf] The server validated our request
2020/07/01 00:07:36 [INFO] [acme.wtf, acme.lego.wtf, lego.wtf] acme: Validations succeeded; requesting certificates
2020/07/01 00:07:36 [INFO] Wait for certificate [timeout: 30s, interval: 500ms]
2020/07/01 00:07:36 [INFO] [acme.wtf] Server responded with a certificate.