go-acme / lego

Let's Encrypt/ACME client and library written in Go
https://go-acme.github.io/lego/
MIT License

Failing to issue a wildcard certificate when using OTC DNS provider #2021

Closed volayvaz closed 10 months ago

volayvaz commented 10 months ago


What did you expect to see?

I expect corresponding DNS records to be created, certificates issued, and DNS records to be deleted.

2023/09/20 18:09:09 [INFO] [*.test.example.com, test.example.com] acme: Obtaining bundled SAN certificate
2023/09/20 18:09:11 [INFO] [*.test.example.com] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/8403288954
2023/09/20 18:09:11 [INFO] [test.example.com] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/8403288964
2023/09/20 18:09:11 [INFO] [*.test.example.com] acme: use dns-01 solver
2023/09/20 18:09:11 [INFO] [test.example.com] acme: Could not find solver for: tls-alpn-01
2023/09/20 18:09:11 [INFO] [test.example.com] acme: Could not find solver for: http-01
2023/09/20 18:09:11 [INFO] [test.example.com] acme: use dns-01 solver
2023/09/20 18:09:11 [INFO] [*.test.example.com] acme: Preparing to solve DNS-01
2023/09/20 18:09:12 [INFO] [test.example.com] acme: Preparing to solve DNS-01
2023/09/20 18:09:13 [INFO] [*.test.example.com] acme: Trying to solve DNS-01
2023/09/20 18:09:13 [INFO] [*.test.example.com] acme: Checking DNS record propagation using [ns1.open-telekom-cloud.com:53]
2023/09/20 18:09:15 [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]
2023/09/20 18:09:16 [INFO] [*.test.example.com] acme: Waiting for DNS record propagation.
2023/09/20 18:09:22 [INFO] [*.test.example.com] The server validated our request
2023/09/20 18:09:22 [INFO] [test.example.com] acme: Trying to solve DNS-01
2023/09/20 18:09:22 [INFO] [test.example.com] acme: Checking DNS record propagation using [ns1.open-telekom-cloud.com:53]
2023/09/20 18:09:24 [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]
2023/09/20 18:09:33 [INFO] [test.example.com] The server validated our request
2023/09/20 18:09:33 [INFO] [*.test.example.com] acme: Cleaning DNS-01 challenge
2023/09/20 18:09:34 [INFO] [test.example.com] acme: Cleaning DNS-01 challenge
2023/09/20 18:09:35 [INFO] [*.test.example.com, test.example.com] acme: Validations succeeded; requesting certificates
2023/09/20 18:09:35 [INFO] Wait for certificate [timeout: 30s, interval: 500ms]
2023/09/20 18:09:36 [INFO] [*.test.example.com] Server responded with a certificate.

What did you see instead?

Process finishes with error

2023/09/20 18:57:38 [INFO] [*.test.example.com, test.example.com] acme: Obtaining bundled SAN certificate
2023/09/20 18:57:39 [INFO] [*.test.example.com] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/8403883584
2023/09/20 18:57:39 [INFO] [test.example.com] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/8403883594
2023/09/20 18:57:39 [INFO] [*.test.example.com] acme: use dns-01 solver
2023/09/20 18:57:39 [INFO] [test.example.com] acme: Could not find solver for: tls-alpn-01
2023/09/20 18:57:39 [INFO] [test.example.com] acme: Could not find solver for: http-01
2023/09/20 18:57:39 [INFO] [test.example.com] acme: use dns-01 solver
2023/09/20 18:57:39 [INFO] [*.test.example.com] acme: Preparing to solve DNS-01
2023/09/20 18:57:40 [INFO] [test.example.com] acme: Preparing to solve DNS-01
2023/09/20 18:57:41 [INFO] [*.test.example.com] acme: Trying to solve DNS-01
2023/09/20 18:57:41 [INFO] [*.test.example.com] acme: Checking DNS record propagation using [ns1.open-telekom-cloud.com:53]
2023/09/20 18:57:43 [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]
2023/09/20 18:57:43 [INFO] [*.test.example.com] acme: Waiting for DNS record propagation.
2023/09/20 18:57:45 [INFO] [*.test.example.com] acme: Waiting for DNS record propagation.
2023/09/20 18:57:52 [INFO] [*.test.example.com] The server validated our request
2023/09/20 18:57:52 [INFO] [*.test.example.com] acme: Cleaning DNS-01 challenge
2023/09/20 18:57:54 [INFO] [test.example.com] acme: Cleaning DNS-01 challenge
2023/09/20 18:57:55 [WARN] [test.example.com] acme: cleaning up failed: otc: unable to get record _acme-challenge.test.example.com. for zone test.example.com: record not found 
2023/09/20 18:57:55 [INFO] Skipping deactivating of valid auth: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/8403883584
2023/09/20 18:57:55 [INFO] Deactivating auth: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/8403883594
2023/09/20 18:57:55 Could not obtain certificates:
    error: one or more domains had a problem:
[test.example.com] [test.example.com] acme: error presenting token: otc: unexpected status code: [status code: 400] body: {"code":"DNS.0312","message":"Attribute 'name' conflicts with Record Set '_acme-challenge.test.example.com.' type 'TXT' in line 'default_view'."}
Process 66234 has exited with status 1

How do you use lego?

Binary

Reproduction steps

  1. Set ENV vars:
    "OTC_IDENTITY_ENDPOINT": "https://iam.eu-de.otc.t-systems.com/v3/auth/tokens"
    "OTC_DOMAIN_NAME": "[OTC_DOMAIN]"
    "OTC_USER_NAME": "[USER_NAME]"
    "OTC_PASSWORD": "[PASSWORD]"
    "OTC_PROJECT_NAME": "eu-de"
    "OTC_PROPAGATION_TIMEOUT": "120"
    "OTC_TTL": "300"
  2. Run lego:
    lego -a \
    --dns.resolvers=ns1.open-telekom-cloud.com \
    --server=https://acme-staging-v02.api.letsencrypt.org/directory \
    --email=root@example.com \
    --domains='*.test.example.com' \
    --domains=test.example.com \
    --dns=otc \
    run

Version of lego

lego version v4.14.2 darwin/arm64

Logs

```console
2023/09/20 18:57:55 [WARN] [test.example.com] acme: cleaning up failed: otc: unable to get record _acme-challenge.test.example.com. for zone test.example.com: record not found
2023/09/20 18:57:55 [INFO] Skipping deactivating of valid auth: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/8403883584
2023/09/20 18:57:55 [INFO] Deactivating auth: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/8403883594
2023/09/20 18:57:55 Could not obtain certificates:
    error: one or more domains had a problem:
[test.example.com] [test.example.com] acme: error presenting token: otc: unexpected status code: [status code: 400] body: {"code":"DNS.0312","message":"Attribute 'name' conflicts with Record Set '_acme-challenge.test.example.com.' type 'TXT' in line 'default_view'."}
Process 66234 has exited with status 1
```

volayvaz commented 10 months ago

I've done some research on the issue.

OTC doesn't support creating multiple recordsets with the same name. But to issue a certificate for multiple domains, we must create multiple _acme-challenge.example.com recordsets, each with a different token for its domain. Lego calls the Present method for every domain it challenges, and at the end it also calls CleanUp for each domain. Both calls fail on the second run: Present tries to create a recordset with the same name as in the previous run, and CleanUp tries to delete a recordset that no longer exists.
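To illustrate why the two names collide, here is a minimal sketch of how a DNS-01 record is derived under RFC 8555: the wildcard prefix is dropped from the record name, and the TXT value is the unpadded base64url SHA-256 digest of the key authorization. The function name `challengeRecord` and the key authorization strings are made up for the example; this is not lego's actual code.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// challengeRecord returns the TXT record name and value for a DNS-01 challenge.
// keyAuth is the key authorization (token + "." + account key thumbprint).
// Hypothetical helper for illustration only.
func challengeRecord(domain, keyAuth string) (fqdn, value string) {
	// A wildcard name is validated at its base domain, so "*." is dropped.
	base := strings.TrimPrefix(domain, "*.")
	sum := sha256.Sum256([]byte(keyAuth))
	return "_acme-challenge." + base + ".", base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	// Placeholder key authorizations; real ones come from the ACME server and account key.
	n1, v1 := challengeRecord("*.test.example.com", "token-a.thumbprint")
	n2, v2 := challengeRecord("test.example.com", "token-b.thumbprint")

	fmt.Println("same record name:", n1 == n2)  // both are _acme-challenge.test.example.com.
	fmt.Println("same record value:", v1 == v2) // the values differ
}
```

So `*.test.example.com` and `test.example.com` need two different TXT values under one record name, which is exactly the case OTC rejects with DNS.0312.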

I managed to fix the issue locally and I'm testing my own build now. So far I haven't found any problems. I'll be happy to share my fix with the community.

ldez commented 10 months ago

Hello,

> OTC doesn't support creating multiple recordsets with the same name.

It's surprising, because this provider has been around for a long time.

We have 2 modes for DNS providers: sequential and "parallel". The mode cannot be changed by the user; it is part of the provider implementation.

The default is "parallel".

To change this behavior, you just have to add this: https://github.com/go-acme/lego/blob/113648a36817a5d9667b8dd1f1c6c67eeb914916/providers/dns/duckdns/duckdns.go#L107-L111

volayvaz commented 10 months ago

Good morning!

> it's surprising because this provider has been around for a long time.

Yeah, I guess it hasn't come up earlier because the OTC provider is not as widely used as others. Thank you for pointing me to the sequential mode. I didn't know about it, and the fix I made was solely focused on making the provider compatible with the default operating mode. I'll try the sequential mode and post here whether it works or not!

Thank you!

volayvaz commented 10 months ago

@ldez Yes, the sequential mode works. The only drawback is that obtaining certificates takes longer than in parallel mode. In my case, the minimum SEQUENCE_INTERVAL that worked was 60 seconds, so it takes around three minutes overall for a certificate with two domains. Should I create a PR?

ldez commented 10 months ago

Have you tried a shorter interval?

Yes you can open a PR.

volayvaz commented 10 months ago

> Have you tried a shorter interval?

Yes, I tested several times with 30 and 45-second intervals, and all tests failed.

2023/09/21 14:10:13 [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]
2023/09/21 14:10:13 [INFO] [*.test.example.com] acme: Waiting for DNS record propagation.
2023/09/21 14:10:23 [INFO] [*.test.example.com] The server validated our request
2023/09/21 14:10:23 [INFO] [*.test.example.com] acme: Cleaning DNS-01 challenge
2023/09/21 14:10:24 [INFO] sequence: wait for 45s
2023/09/21 14:11:09 [INFO] [test.example.com] acme: Preparing to solve DNS-01
2023/09/21 14:11:10 [INFO] [test.example.com] acme: Trying to solve DNS-01
2023/09/21 14:11:10 [INFO] [test.example.com] acme: Checking DNS record propagation using [ns1.open-telekom-cloud.com:53]
2023/09/21 14:11:12 [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]
2023/09/21 14:11:12 [INFO] [test.example.com] acme: Waiting for DNS record propagation.
2023/09/21 14:11:22 [INFO] [test.example.com] acme: Cleaning DNS-01 challenge
2023/09/21 14:11:24 [INFO] Skipping deactivating of valid auth: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/8419197944
2023/09/21 14:11:24 [INFO] Deactivating auth: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/8419197954
2023/09/21 14:11:24 Could not obtain certificates:
    error: one or more domains had a problem:
[test.example.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: During secondary validation: Incorrect TXT record "VYtdH8nOdfHHbXgTqlDAtOwMPFfY-z72iCLJqyh4lDg" found at _acme-challenge.test.example.com

During the test, I also monitored DNS propagation with dig (`dig @ns1.open-telekom-cloud.com _acme-challenge.test.example.com txt`). During the second DNS name validation, I could see that a new record with the new token had propagated. Still, lego fails with an "Incorrect TXT record" error that shows the previous token from the first challenge. It looks like a caching issue to me, but I haven't investigated that.
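For anyone who wants to repeat the check without dig, a rough Go equivalent using only the standard library is sketched below. `resolverFor` and `containsValue` are hypothetical helpers and the expected value is a placeholder. Note that this queries a single nameserver, so it only shows that server's view; the error above mentions secondary validation, which may be resolving through a different path.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolverFor returns a resolver that sends every query to the given nameserver (host:port).
func resolverFor(nameserver string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use the Go resolver so the custom Dial is honored
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, nameserver)
		},
	}
}

// containsValue reports whether want is among the returned TXT values.
func containsValue(values []string, want string) bool {
	for _, v := range values {
		if v == want {
			return true
		}
	}
	return false
}

func main() {
	r := resolverFor("ns1.open-telekom-cloud.com:53")

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	values, err := r.LookupTXT(ctx, "_acme-challenge.test.example.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}

	// Placeholder: substitute the TXT value of the challenge currently being solved.
	expected := "expected-txt-value"
	fmt.Println("values:", values)
	fmt.Println("expected value present:", containsValue(values, expected))
}
```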

So far, the lowest interval that gives consistent success results is 60 sec.

> Yes you can open a PR.

On it