golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

x/crypto/acme: ACME client's internal retry implementation results in hanging retries on 429s #40376

Open viola opened 4 years ago

viola commented 4 years ago

What version of Go are you using (go version)?

$ go version
go version go1.14.4 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN="/Users/viola/go/bin"
GOCACHE="/Users/viola/Library/Caches/go-build"
GOENV="/Users/viola/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/viola/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.14.4/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.14.4/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/viola/crypto/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/0s/nyt41p_j69d8vzq1qfkg5nrh0000gn/T/go-build384678003=/tmp/go-build -gno-record-gcc-switches -fno-common"
### What did you do?

Start off by triggering a rate limit to get a 429 from Let's Encrypt. Here's how I triggered one by going over [the default 5 per week duplicate certificate limit set by Let's Encrypt](https://letsencrypt.org/docs/rate-limits/):

* Call `AuthorizeOrder` with a valid AuthzID identifier.
* Repeat the request 6 times.
* On the 6th attempt, the request hangs and takes much longer to execute. It wound up hitting a 45-second timeout set by an upstream caller.
* Unless [the context is canceled or timed out](https://github.com/golang/crypto/blob/4663e185863a1aee50d0486b326769f0bd22eb30/acme/http.go#L127-L128), or a ["non-retriable error status"](https://github.com/golang/crypto/blob/4663e185863a1aee50d0486b326769f0bd22eb30/acme/http.go#L299-L303) is received, retries are indefinite. In this case the client never returns, because 429 is a retriable error code and this is a somewhat terminal state (for the rest of the week).
* Here's a look at the 429 that the client was receiving and retrying on in this example.

```
429 urn:ietf:params:acme:error:rateLimited: Error creating new order :: too many certificates already issued for exact set of domains: violababola.best: see https://letsencrypt.org/docs/rate-limits/
```

### What did you expect to see?

* Fail fast and do not retry on the [client side error](https://github.com/golang/crypto/blob/master/acme/http.go#L302) => `http.StatusTooManyRequests`, since this is not a recoverable response error code. At the least, provide the facility to configure this.
* Also, some bound on the number of retries.

### What did you see instead?

Retries on the [client side error](https://github.com/golang/crypto/blob/master/acme/http.go#L302) => `http.StatusTooManyRequests`.

It seems like `http.StatusTooManyRequests` should not be considered a retriable response error code, as it's not recoverable.
### Related PR Fix

* This [PR](https://github.com/golang/crypto/pull/149) introduces a custom `ShouldRetry` func option that can be set on the ACME client to allow the default set of retriable response error codes to be overridden. This will keep the current behaviour backwards compatible, but provide more flexible retry configuration.

cagedmantis commented 4 years ago

/cc @FiloSottile @x1ddos

cagedmantis commented 4 years ago

Hi @viola! Thank you for contributing to the Go project and welcome.

viola commented 4 years ago

Hi @cagedmantis, good talking to you again! Please let me know if there is anything else I can elaborate on to help this issue move forward. PS. Go cubs go!

icholy commented 4 years ago

Related issue #40161

andrewloux commented 4 years ago

cc @FiloSottile 👋 Just doing a gentle nudge on this one 👀

andrewloux commented 4 years ago

Hello folks and apologies for the ping @FiloSottile @x1ddos. I just wanted to check in on this and see if we could move it forwards? The PR that fixes this is ready for review: https://github.com/golang/crypto/pull/149

Please do let me know if there's anything I can do to help trudge this along.

alicethorne-ab commented 4 years ago

Hi all. I'm a developer at @1password and we're currently experiencing this exact issue with the library. It'd be a great help if we could get golang/crypto#149 moved along and merged before the upcoming code freeze. Please let me know if any testing is needed to assist with that.

andrewloux commented 4 years ago

cc @FiloSottile if you have some time, would love to pick this one up. Especially since there is more interest now ☝️

ZhiminXiang commented 4 years ago

I also hit this issue. Could the fix be prioritized? :)

viola commented 4 years ago

@icholy @alicethorne-ab @ZhiminXiang thanks for letting me know you've hit the same issue. This further validates that our golang/crypto#149 fix would be really nice to bring over to crypto. @FiloSottile @x1ddos @cagedmantis, any eyes on that PR would be greatly appreciated! Please let me know if there is anything I can do to help. ❤️

ghost commented 4 years ago

I would love to see this fixed since 429 is being returned in a case where you will need to retry for days before it would succeed. That doesn't seem to match the purpose of 429.

rolandshoemaker commented 4 years ago

Hey @viola, sorry it has taken so long to address this. I agree this isn't the correct behavior. As far as I am aware, there is only one class of 429 returned from most ACME servers that is likely to be retry-able in the short term (an overall req/s limit). Rather than expanding the API surface of the client, I think it makes sense to just remove the behavior of retrying on 429 responses in general.

gopherbot commented 4 years ago

Change https://golang.org/cl/272927 mentions this issue: x/crypto/acme: only retry GET requests on 429

viola commented 4 years ago

@rolandshoemaker thank you so much for looking into it for me! While https://golang.org/cl/272927 will address my original issue with retrying on 429 in AuthorizeOrder, which uses POST, I still think there is value in my proposed extension to the API, mainly the ability for the caller to control what to retry on if desired or needed. Given that https://github.com/golang/crypto/pull/149 is not a breaking change, would you still consider it? Once again, thank you for taking the time to review my issue and PR! <3

rolandshoemaker commented 4 years ago

Typically when we extend the public API of a package we go through a proposal process before moving to a CL as once the API is landed it becomes much harder to make any changes (although less so for x/ packages, it is still something we want to follow). If you could write up the proposed API for this change in this issue it would help clear up a few things. It doesn't need to be super detailed, just what the public methods would look like, and what the use case is.

In particular, I'd be interested in what you see as the use cases for this API, beyond the initial 429 one.