go-acme / lego

Let's Encrypt/ACME client and library written in Go
https://go-acme.github.io/lego/
MIT License

route53: aws-sdk-go-v2 broke IAM instance role #2033

nickjmv closed this issue 7 months ago

nickjmv commented 9 months ago


What did you expect to see?

A certificate is generated by using the AWS EC2 instance profile role.

What did you see instead?

An error message about the AWS EC2 IMDS.

How do you use lego?

Docker image

Reproduction steps

Renew an existing certificate with the Docker image, relying on the instance profile of the AWS EC2 machine for credentials.

It works when using role assumption by passing a profile other than 'default' to the Docker image. When relying on the role attached through the instance profile, however, the error occurs. Another workaround is to use lego v4.13.2, which still uses the old AWS SDK.

Version of lego

v4.14.2

Logs

```console
2023/10/12 09:43:05 [INFO] [xxx.sub.domain.com] acme: Trying renewal with -6 hours remaining
2023/10/12 09:43:05 [INFO] renewal: random delay of 1m31.195523715s
2023/10/12 09:44:36 [INFO] [xxx.sub.domain.com] acme: Obtaining bundled SAN certificate
2023/10/12 09:44:37 [INFO] [xxx.sub.domain.com] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/273116278446
2023/10/12 09:44:37 [INFO] [xxx.sub.domain.com] acme: Could not find solver for: tls-alpn-01
2023/10/12 09:44:37 [INFO] [xxx.sub.domain.com] acme: Could not find solver for: http-01
2023/10/12 09:44:37 [INFO] [xxx.sub.domain.com] acme: use dns-01 solver
2023/10/12 09:44:37 [INFO] [xxx.sub.domain.com] acme: Preparing to solve DNS-01
2023/10/12 09:44:42 [INFO] [xxx.sub.domain.com] acme: Cleaning DNS-01 challenge
2023/10/12 09:44:43 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/273116278446
2023/10/12 09:44:43 error: one or more domains had a problem:
[xxx.sub.domain.com] [xxx.sub.domain.com] acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded
```

ldez commented 9 months ago

Hello,

I think this is an internal change in the SDK.

acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded

The error comes from here.

I'm not a specialist in AWS, and the SDK migration guide is really weak.

I don't know if it's an expected behavior for the new SDK, a bug of the SDK, or something else.

nickjmv commented 9 months ago

I read in the AWS documentation that IMDSv1 and IMDSv2 should both work, so I'm puzzled about why we are receiving the error.

Will you do some extra testing on this? Or what actions do you see next? I assume there are multiple users that encounter this.

ldez commented 9 months ago

> I assume there are multiple users that encounter this.

As you can see, it seems you are alone with this problem (no thumbs up, no other reports).

> what actions do you see next?

I don't know: based on the code, I have no idea what the real root cause of the problem is.

triplepoint commented 8 months ago

FWIW, I got here after discovering that my Traefik Let's Encrypt configuration, which had been running fine, apparently picked up the same problem after upgrading from Traefik 2.10.4 to the latest stable container tag, 2.10.5. The initial error in the logs was that the AWS region is a required value. I provided the AWS_REGION environment variable through the Docker Compose file, and now the error I see is:

```console
traefik | time="2023-11-12T00:19:12Z" level=error msg="Error renewing certificate from LE: {redacted.com []}" ACME CA="https://acme-v02.api.letsencrypt.org/directory" error="error: one or more domains had a problem:\n[redacted.com] [redacted.com] acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded\n" providerName=letsencrypt.acme
```

None of my AWS IAM policies have changed, and this machine has been running untouched for years. The only change is the bugfix version bump of the Traefik container, which brought a newer version of this lego library.

ldez commented 7 months ago

https://github.com/go-acme/lego/issues/2067#issuecomment-1845722213