adferrand / dnsrobocert

Orchestrate Certbot and Lexicon together to provide Let's Encrypt TLS certificates validated by DNS challenges
https://dnsrobocert.readthedocs.io
MIT License
550 stars 88 forks

DNS operation timeout with DNSroboCert 3.25.0 #1101

Open Vertganti opened 10 months ago

Vertganti commented 10 months ago

On a new host, a fresh setup of DNSroboCert 3.25.0 resulted in a DNS operation timeout error. Here is the full log:

```
dnsrobocert_1 | 2023-11-29 09:24:00 a7a1181186b5 dnsrobocert.core.main[1] INFO Creating missing certificates if needed (~1min for each)
dnsrobocert_1 | 2023-11-29 09:24:01 a7a1181186b5 dnsrobocert.core.certbot[1] INFO Handling the certificate for domain(s): sub.domain.example
dnsrobocert_1 | ----------
dnsrobocert_1 | 2023-11-29 09:24:01 a7a1181186b5 dnsrobocert.core.utils[1] INFO Launching command: /usr/local/bin/python3 -m dnsrobocert.core.certbot certonly -n --user-agent-comment DNSroboCert/3.25.0 --preferred-chain "ISRG Root X1" --config-dir /etc/letsencrypt --work-dir /etc/letsencrypt/workdir --logs-dir /etc/letsencrypt/logs --manual --preferred-challenges=dns --manual-auth-hook "/usr/local/bin/python3 -m dnsrobocert.core.hooks -t auth -c \"/tmp/tmptklzdm7l/dnsrobocert-runtime.yml\" -l \"sub.domain.example\"" --manual-cleanup-hook "/usr/local/bin/python3 -m dnsrobocert.core.hooks -t cleanup -c \"/tmp/tmptklzdm7l/dnsrobocert-runtime.yml\" -l \"sub.domain.example\"" --expand --deploy-hook "/usr/local/bin/python3 -m dnsrobocert.core.hooks -t deploy -c \"/tmp/tmptklzdm7l/dnsrobocert-runtime.yml\" -l \"sub.domain.example\"" --server https://acme-v02.api.letsencrypt.org/directory --cert-name sub.domain.example --force-renew --key-type rsa -d sub.domain.example
dnsrobocert_1 | Saving debug log to /etc/letsencrypt/logs/letsencrypt.log
dnsrobocert_1 | Requesting a certificate for sub.domain.example
dnsrobocert_1 | Hook '--manual-auth-hook' for sub.domain.example reported error code 1
dnsrobocert_1 | Hook '--manual-auth-hook' for sub.domain.example ran with output:
dnsrobocert_1 | Executing auth hook for domain sub.domain.example, lineage sub.domain.example.
dnsrobocert_1 | Hook '--manual-auth-hook' for sub.domain.example ran with error output:
dnsrobocert_1 | Error while executing the auth hook:
dnsrobocert_1 | The resolution lifetime expired after 5.402 seconds: Server Do53:127.0.0.11@53 answered The DNS operation timed out.; Server Do53:127.0.0.11@53 answered The DNS operation timed out.; Server Do53:127.0.0.11@53 answered The DNS operation timed out.
dnsrobocert_1 | Traceback (most recent call last):
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dnsrobocert/core/hooks.py", line 40, in main
dnsrobocert_1 |     globals()[parsed_args.type](dnsrobocert_config, parsed_args.lineage)
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dnsrobocert/core/hooks.py", line 61, in auth
dnsrobocert_1 |     txt_challenge(certificate, profile, token, domain, action="create")
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dnsrobocert/core/challenge.py", line 52, in txt_challenge
dnsrobocert_1 |     with Client(ConfigResolver().with_dict(config_dict)) as operations:
dnsrobocert_1 |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/lexicon/client.py", line 106, in __init__
dnsrobocert_1 |     zone_name = dns.resolver.zone_for_name(domain)
dnsrobocert_1 |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dns/resolver.py", line 1706, in zone_for_name
dnsrobocert_1 |     answer = resolver.resolve(
dnsrobocert_1 |              ^^^^^^^^^^^^^^^^^
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dns/resolver.py", line 1321, in resolve
dnsrobocert_1 |     timeout = self._compute_timeout(start, lifetime, resolution.errors)
dnsrobocert_1 |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dns/resolver.py", line 1075, in _compute_timeout
dnsrobocert_1 |     raise LifetimeTimeout(timeout=duration, errors=errors)
dnsrobocert_1 | dns.resolver.LifetimeTimeout: The resolution lifetime expired after 5.402 seconds: Server Do53:127.0.0.11@53 answered The DNS operation timed out.; Server Do53:127.0.0.11@53 answered The DNS operation timed out.; Server Do53:127.0.0.11@53 answered The DNS operation timed out.
dnsrobocert_1 |
dnsrobocert_1 | Certbot failed to authenticate some domains (authenticator: manual). The Certificate Authority reported these problems:
dnsrobocert_1 | Domain: sub.domain.example
dnsrobocert_1 | Type: dns
dnsrobocert_1 | Detail: DNS problem: NXDOMAIN looking up TXT for _acme-challenge.sub.domain.example - check that a DNS record exists for this domain
dnsrobocert_1 |
dnsrobocert_1 | Hint: The Certificate Authority failed to verify the DNS TXT records created by the --manual-auth-hook. Ensure that this hook is functioning correctly and that it waits a sufficient duration of time for DNS propagation. Refer to "certbot --help manual" and the Certbot User Guide.
dnsrobocert_1 |
dnsrobocert_1 | Hook '--manual-cleanup-hook' for sub.domain.example reported error code 1
dnsrobocert_1 | Hook '--manual-cleanup-hook' for sub.domain.example ran with output:
dnsrobocert_1 | Hook '--manual-cleanup-hook' for sub.domain.example ran with error output:
dnsrobocert_1 | Error while executing the cleanup hook:
dnsrobocert_1 | The resolution lifetime expired after 5.402 seconds: Server Do53:127.0.0.11@53 answered The DNS operation timed out.; Server Do53:127.0.0.11@53 answered The DNS operation timed out.; Server Do53:127.0.0.11@53 answered The DNS operation timed out.
dnsrobocert_1 | Traceback (most recent call last):
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dnsrobocert/core/hooks.py", line 40, in main
dnsrobocert_1 |     globals()[parsed_args.type](dnsrobocert_config, parsed_args.lineage)
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dnsrobocert/core/hooks.py", line 123, in cleanup
dnsrobocert_1 |     txt_challenge(certificate, profile, token, domain, action="delete")
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dnsrobocert/core/challenge.py", line 52, in txt_challenge
dnsrobocert_1 |     with Client(ConfigResolver().with_dict(config_dict)) as operations:
dnsrobocert_1 |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/lexicon/client.py", line 106, in __init__
dnsrobocert_1 |     zone_name = dns.resolver.zone_for_name(domain)
dnsrobocert_1 |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dns/resolver.py", line 1706, in zone_for_name
dnsrobocert_1 |     answer = resolver.resolve(
dnsrobocert_1 |              ^^^^^^^^^^^^^^^^^
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dns/resolver.py", line 1321, in resolve
dnsrobocert_1 |     timeout = self._compute_timeout(start, lifetime, resolution.errors)
dnsrobocert_1 |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dnsrobocert_1 |   File "/usr/local/lib/python3.11/site-packages/dns/resolver.py", line 1075, in _compute_timeout
dnsrobocert_1 |     raise LifetimeTimeout(timeout=duration, errors=errors)
dnsrobocert_1 | dns.resolver.LifetimeTimeout: The resolution lifetime expired after 5.402 seconds: Server Do53:127.0.0.11@53 answered The DNS operation timed out.; Server Do53:127.0.0.11@53 answered The DNS operation timed out.; Server Do53:127.0.0.11@53 answered The DNS operation timed out.
dnsrobocert_1 | Executing cleanup hook for domain sub.domain.example, lineage sub.domain.example.
dnsrobocert_1 | Some challenges have failed.
dnsrobocert_1 | Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /etc/letsencrypt/logs/letsencrypt.log or re-run Certbot with -v for more details.
dnsrobocert_1 | ----------
dnsrobocert_1 | 2023-11-29 09:24:16 a7a1181186b5 dnsrobocert.core.certbot[1] ERROR An error occurred while processing certificate config {'domains': ['sub.domain.example'], 'force_renew': True, 'profile': 'some_profile_name'}:
dnsrobocert_1 | Command '['/usr/local/bin/python3', '-m', 'dnsrobocert.core.certbot', 'certonly', '-n', '--user-agent-comment', 'DNSroboCert/3.25.0', '--preferred-chain', 'ISRG Root X1', '--config-dir', '/etc/letsencrypt', '--work-dir', '/etc/letsencrypt/workdir', '--logs-dir', '/etc/letsencrypt/logs', '--manual', '--preferred-challenges=dns', '--manual-auth-hook', '/usr/local/bin/python3 -m dnsrobocert.core.hooks -t auth -c "/tmp/tmptklzdm7l/dnsrobocert-runtime.yml" -l "sub.domain.example"', '--manual-cleanup-hook', '/usr/local/bin/python3 -m dnsrobocert.core.hooks -t cleanup -c "/tmp/tmptklzdm7l/dnsrobocert-runtime.yml" -l "sub.domain.example"', '--expand', '--deploy-hook', '/usr/local/bin/python3 -m dnsrobocert.core.hooks -t deploy -c "/tmp/tmptklzdm7l/dnsrobocert-runtime.yml" -l "sub.domain.example"', '--server', 'https://acme-v02.api.letsencrypt.org/directory', '--cert-name', 'sub.domain.example', '--force-renew', '--key-type', 'rsa', '-d', 'sub.domain.example']' returned non-zero exit status 1.
dnsrobocert_1 | 2023-11-29 09:24:16 a7a1181186b5 dnsrobocert.core.certbot[1] INFO Revoke and delete certificates if needed
```

Querying dns.hetzner.com (we use Hetzner DNS) and acme-v02.api.letsencrypt.org with dig returned correct results within a few milliseconds, both on the host and inside the container. We restarted the Docker service, which also restarted the container, but the issue persisted. Since all our other hosts work perfectly with DNSroboCert 3.24.2, we downgraded to that version, which fixed the issue.

The issue seems to come from the update to DNS zone name resolution. I looked through the sources a bit, and I assume the error is caused by the `"resolve_zone_name": True` option that was added to the `config_dict` passed to `ConfigResolver().with_dict`, since it leads to a call to `dns.resolver.zone_for_name` later on. However, I can't find any hint as to why the dnspython call to `dns.resolver.zone_for_name` would time out when normal DNS queries work perfectly.
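One way to narrow this down is to replay the kind of queries `zone_for_name` sends. As far as I understand dnspython, `zone_for_name` asks for the SOA record of the name and, if that does not identify the zone, moves up to the parent name, so a successful `dig` of ordinary records does not prove that SOA lookups through Docker's embedded resolver at 127.0.0.11 succeed. Below is a minimal, stdlib-only sketch under that assumption; it is not part of DNSroboCert, Lexicon or dnspython, the helper names `candidate_zones`, `build_soa_query` and `probe_soa` are hypothetical, and it sends a single plain UDP SOA query per name instead of reproducing dnspython's retry and lifetime logic.

```python
import secrets
import socket
import struct

SOA_TYPE = 6  # RR type code for SOA (RFC 1035)
IN_CLASS = 1  # class code for Internet (IN)


def candidate_zones(fqdn):
    """Names an upward zone search would try, from the full name up to the root."""
    labels = [label for label in fqdn.rstrip(".").split(".") if label]
    return [".".join(labels[i:]) + "." for i in range(len(labels))] + ["."]


def build_soa_query(name, query_id):
    """Encode a minimal DNS query (one question, recursion desired) for the SOA of `name`."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte (the root label).
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
        if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", SOA_TYPE, IN_CLASS)


def probe_soa(server, name, port=53, timeout=2.0):
    """Send one SOA query over UDP; return the response RCODE, or None if no reply arrives."""
    query_id = secrets.randbelow(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_soa_query(name, query_id), (server, port))
        try:
            while True:
                data, _ = sock.recvfrom(4096)
                # Skip datagrams that are not a reply to this query (too short or wrong ID).
                if len(data) >= 12 and struct.unpack("!H", data[:2])[0] == query_id:
                    return data[3] & 0x0F  # the low 4 bits of the flags word are the RCODE
        except (socket.timeout, ConnectionRefusedError):
            return None
```

Inside the container, something like `for zone in candidate_zones("sub.domain.example"): print(zone, probe_soa("127.0.0.11", zone))` would show whether each SOA query gets an RCODE back (0 is NOERROR, 3 is NXDOMAIN) or times out (`None`). A comparable manual check is `dig @127.0.0.11 sub.domain.example SOA`.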

adferrand commented 9 months ago

Hello @Vertganti!

Indeed, I changed Lexicon to resolve the actual DNS zone name more intelligently by sending requests to DNS servers. Your analysis is correct: the timeout happens when dnspython sends that request. Sadly, I have no clue either about why this causes a problem here...

While I investigate, I think I should add a way to disable DNS zone name resolution entirely when it is not needed and causes problems. I still think it should be enabled by default, but opting out should be possible.

I will try to work on the issue in the coming days. Sorry for the inconvenience.

Vertganti commented 9 months ago

Thank you, a config option to opt out sounds like a good solution.

DotOnedotNL commented 4 months ago

FWIW, I see the same issue with TransIP: after upgrading to 3.25 I get the same timeout, and downgrading to 3.24 resolved it.

jhomer-hscl commented 4 months ago

I have what I think is the same issue with OVH. I'm running 3.24.1 and it works fine; if I move to anything newer, it all goes wrong.

Happy to test proposed fixes, changes, or future versions.