caddy-dns / route53

Caddy module: dns.providers.route53
MIT License

Unable to pass delegated DNS challenge when using caddy dns_challenge_override_domain #24

Open davidkinde opened 1 year ago

davidkinde commented 1 year ago

I am onboarding clients with their custom domains, with the aim of creating a certificate for manage.clientsite.com and a wildcard certificate for *.manage.clientsite.com. To do this the client sets a CNAME record to delegate the DNS challenge as follows: _acme-challenge.manage.clientsite.com > _acme-challenge.appsite.com (also tried _acme-challenge.manage.appsite.com)

This is the caddy config:

```
{
    acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}

*.manage.clientsite.com, manage.clientsite.com {
    reverse_proxy 127.0.0.1:8090

    tls {
        dns route53 {
            aws_profile "my_profile"
            max_retries 1
        }
        dns_challenge_override_domain appsite.com
    }
}
```

The result is a cleanup error, followed by an error claiming that the TXT record already exists (it does not exist before or after this process).

2022/08/08 10:45:13.352 ERROR tls.issuance.acme.acme_client cleaning up solver {"identifier": "manage.clientsite.com", "challenge_type": "dns-01", "error": "no memory of presenting a DNS record for manage.clientsite.com (probably OK if presenting failed)"}

2022/08/08 10:45:13.505 ERROR tls.obtain could not get certificate from issuer {"identifier": "manage.clientsite.com", "issuer": "acme-staging-v02.api.letsencrypt.org-directory", "error": "[manage.clientsite.com] solving challenges: presenting for challenge: adding temporary record for zone appsite.com.: InvalidChangeBatch: operation error Route 53: ChangeResourceRecordSets, https response error StatusCode: 400, RequestID: 582cde5f-a271-487f-92a0-123, InvalidChangeBatch: [Tried to create resource record set [name='appsite.com.', type='TXT'] but it already exists] (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/123/123) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)"}

If I set the dns_challenge_override_domain to _acme-challenge.appsite.com (I've been advised it shouldn't need to be this), then it works as far as the certificates do get generated, but then it fails when it tries to clean up the acme challenge dns record.

2022/08/08 09:41:43.159 ERROR tls.issuance.acme.acme_client cleaning up solver {"identifier": "*.manage.clientsite.com", "challenge_type": "dns-01"}
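The 400 error above names the record `appsite.com.` rather than `_acme-challenge.appsite.com.`, which is consistent with the override value being used verbatim as the TXT record name. A minimal Go sketch of that naming rule follows. It is an illustration under that assumption, not the certmagic implementation, and `challengeRecordName` and `ensureTrailingDot` are hypothetical helpers:

```go
package main

import (
	"fmt"
	"strings"
)

// challengeRecordName returns the fully qualified name at which the dns-01
// TXT record would be created.
// Assumption (inferred from the logs in this thread): a non-empty override
// replaces the whole record name as-is; nothing is prepended to it.
func challengeRecordName(identifier, override string) string {
	if override != "" {
		return ensureTrailingDot(override)
	}
	// Without an override, the wildcard label is dropped and the
	// "_acme-challenge." prefix is added (RFC 8555, dns-01).
	name := strings.TrimPrefix(identifier, "*.")
	return ensureTrailingDot("_acme-challenge." + name)
}

// ensureTrailingDot makes the name absolute, as it appears in the Route 53 error.
func ensureTrailingDot(s string) string {
	if strings.HasSuffix(s, ".") {
		return s
	}
	return s + "."
}

func main() {
	fmt.Println(challengeRecordName("*.manage.clientsite.com", ""))
	fmt.Println(challengeRecordName("manage.clientsite.com", "appsite.com"))
	fmt.Println(challengeRecordName("manage.clientsite.com", "_acme-challenge.appsite.com"))
}
```

Under this reading, `appsite.com` targets the zone apex, where a TXT record set may well exist already, while `_acme-challenge.appsite.com` targets the name the client's CNAME actually points at.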

I haven't found any combination of configuration settings that will allow the dns challenge to be delegated to another domain, and for it to create and delete the dns challenge record.

UPDATE: I have also tested this with a simpler config that uses only manage.clientsite.com and does not request the wildcard certificate. The result is the same.

mholt commented 1 year ago

(as noted in the forum, the cleanup bug is fixed in https://github.com/caddyserver/certmagic/commit/23ca487b74f7b42a414d0442c2f57d95ab90e0a5)

devjack commented 1 year ago

I'm experiencing the same error with route53 in a similar setup. Given the details below, I suspect it's correctly checking the override domain from dns_challenge_override_domain for the certificate order, but it's not able to handle concurrent updates to the TXT record (I think).

My ideal outcome is for Caddy to serve TLS in two scenarios:

To achieve this I have two site blocks, each with different TLS directives. I get the same error(s) that @davidkinde reported, one for each SAN in the DNS-01 challenge. While debugging this, I also get the corresponding errors when I reduce the site block to just the base foo.example.app hostname (no wildcard SAN, still using Route53).

At present I have two site blocks with different TLS directives.

simplified Caddyfile

```
https://*.{$PUBLIC_HOSTNAME} https://{$PUBLIC_HOSTNAME} {
    tls {
        dns route53
        dns_challenge_override_domain {$PUBLIC_HOSTNAME}
        resolvers 8.8.8.8 8.8.4.4
    }
}

# Customer hostnames, assumes CNAME to a *.foo.example.app
https:// {
    tls {
        on_demand
    }
    #removed: several other handlers + reverse proxy
}
```
error for foo.example.app `{"level":"error","ts":1666097175.1712997,"logger":"tls.obtain","msg":"will retry","error":"[foo.example.app] Obtain: [foo.example.app] solving challenges: presenting for challenge: adding temporary record for zone \"foo.example.app.\": InvalidChangeBatch: operation error Route 53: ChangeResourceRecordSets, https response error StatusCode: 400, RequestID: 5502b554-cf50-4757-8989-89f78ab62283, InvalidChangeBatch: [Tried to create resource record set [name='foo.example.app.', type='TXT'] but it already exists] (order=https://acme.zerossl.com/v2/DV90/order/REDACTED) (ca=https://acme.zerossl.com/v2/DV90)","attempt":4,"retrying_in":300,"elapsed":372.282702709,"max_duration":2592000}`
error for *.foo.example.app `{"level":"error","ts":1666097181.9009788,"logger":"tls.obtain","msg":"will retry","error":"[*.foo.example.app] Obtain: [*.foo.example.app] solving challenges: presenting for challenge: adding temporary record for zone \"foo.example.app.\": InvalidChangeBatch: operation error Route 53: ChangeResourceRecordSets, https response error StatusCode: 400, RequestID: 5f66d88c-06cf-487e-bfdf-e69684021cc9, InvalidChangeBatch: [Tried to create resource record set [name='foo.example.app.', type='TXT'] but it already exists] (order=https://acme.zerossl.com/v2/DV90/order/REDACTED) (ca=https://acme.zerossl.com/v2/DV90)","attempt":4,"retrying_in":300,"elapsed":378.985750075,"max_duration":2592000}`

I note they're for the same REDACTED order ID, but these logs land 7 seconds apart, so I assume it's doing the challenges sequentially?
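Both errors above come from trying to create a TXT record set at a name where one already exists. Route 53 keeps every value for a given name and type in a single record set, so a second value has to be merged into that set rather than created separately. The Go sketch below shows only the merge and removal logic. It is an illustration, not what libdns/route53 does, and `mergeTXTValues`, `removeTXTValue`, and the token strings are hypothetical:

```go
package main

import "fmt"

// mergeTXTValues returns the values a record set should hold after adding
// value: the existing values in order, plus value if it is not already there.
// The input slice is never modified.
func mergeTXTValues(existing []string, value string) []string {
	for _, v := range existing {
		if v == value {
			return existing
		}
	}
	merged := make([]string, 0, len(existing)+1)
	merged = append(merged, existing...)
	return append(merged, value)
}

// removeTXTValue returns the values that remain after one challenge is
// cleaned up. An empty result means the whole record set can be deleted.
func removeTXTValue(existing []string, value string) []string {
	kept := make([]string, 0, len(existing))
	for _, v := range existing {
		if v != value {
			kept = append(kept, v)
		}
	}
	return kept
}

func main() {
	var set []string
	set = mergeTXTValues(set, "token-for-base")     // hypothetical challenge value
	set = mergeTXTValues(set, "token-for-wildcard") // hypothetical challenge value
	fmt.Println(set)

	set = removeTXTValue(set, "token-for-base")
	fmt.Println(set)
}
```

A caller would presumably read the current record set, compute the merged list, and submit it as an UPSERT change instead of a CREATE; that part is omitted here because it needs AWS credentials.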

devjack commented 1 year ago

I'm rusty with Go but happy to jump in and attempt a fix. @mholt, are you able to clarify what direction you envision this taking? If the plan is to use the certmagic update, I'm happy to give it a crack. I didn't understand the full impact/context of the cleanup bug well enough to run with it from here.

kinde-engineering commented 1 year ago

@devjack I narrowed the issue down to the fact that route53 was wrapping the challenge value in quotes. I managed to fix the issue (ish) with the following: https://github.com/caddyserver/certmagic/compare/master...kinde-engineering:certmagic:master The only problem was that it took two passes to clean up, and really the issue is with route53 rather than certmagic.

There's since been a fix for this by the route53 contributors. You can read the full details here: https://caddy.community/t/error-cleaning-up-dns-challenge-solver-with-certmagic-fix/17025 and the fix itself is here: https://github.com/libdns/route53/pull/18
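To illustrate the quoting problem described above: if the provider hands TXT values back wrapped in double quotes, a plain string comparison against the unquoted challenge value never matches, so cleanup cannot find the record it presented. The Go sketch below shows one way to compare values regardless of quoting. It is not the code from either linked fix, and `unquoteTXT` and `sameTXTValue` are hypothetical helpers:

```go
package main

import (
	"fmt"
	"strings"
)

// unquoteTXT strips one pair of surrounding double quotes, if present.
// Values without surrounding quotes are returned unchanged.
func unquoteTXT(v string) string {
	if len(v) >= 2 && strings.HasPrefix(v, `"`) && strings.HasSuffix(v, `"`) {
		return v[1 : len(v)-1]
	}
	return v
}

// sameTXTValue reports whether two TXT values are equal once any
// surrounding quotes are ignored.
func sameTXTValue(a, b string) bool {
	return unquoteTXT(a) == unquoteTXT(b)
}

func main() {
	presented := "abc123"  // value the solver asked to publish (made-up token)
	returned := `"abc123"` // same value as a quoting provider might return it

	fmt.Println(presented == returned)
	fmt.Println(sameTXTValue(presented, returned))
}
```

The first comparison fails and the second succeeds, which is the mismatch a cleanup routine has to account for when it looks up the record to delete.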