go-acme / lego

Let's Encrypt/ACME client and library written in Go
https://go-acme.github.io/lego/
MIT License

desec: hangs forever at waiting for DNS record propagation #1406

Open artemislena opened 3 years ago

artemislena commented 3 years ago

What did you expect to see?

I expected that lego would obtain a certificate once the DNS record was updated.

What did you see instead?

lego hangs forever, repeatedly logging "Waiting for DNS record propagation", even though dig -tTXT _acme-challenge.mumble.artemislena.eu shows me that the record has already been updated successfully.

Steps to reproduce

  1. Install NixOS
  2. `git clone https://codeberg.org/FantasyCookie17/nixos-server-configs/ && cd nixos-server-configs && ./apply.sh mumble`

     Note: The relevant config from there is this:

         security.acme = {
           acceptTerms = true;
           email = "fantasycookie17@artemislena.eu";
           certs.mumble = {
             domain = "mumble.artemislena.eu";
             dnsProvider = "desec";
             credentialsFile = "/etc/acme/desec";
             postRun = "systemctl restart murmur";
             group = "murmur";
           };
         };

  3. Change security.acme.certs.mumble.domain to what you have on deSEC
  4. Put something like this in /etc/acme/desec:

         DESEC_TOKEN=secret
         DESEC_HTTP_TIMEOUT=60
         DESEC_PROPAGATION_TIMEOUT=4200
         DESEC_POLLING_INTERVAL=120

  5. `systemctl restart acme-mumble.service`

### Details

<details><summary>Version of lego</summary>

```console
$ lego --version
lego version 4.3.1 linux/arm64
```

</details>

<details><summary>Logs</summary>

```console
May 15 00:57:45 rock64-2g systemd[1]: Starting Renew ACME certificate for mumble...
May 15 00:57:45 rock64-2g acme-mumble-start[985]: + echo ac5ca4447a4f8a2d1d8b
May 15 00:57:45 rock64-2g acme-mumble-start[986]: ++ ls -1 accounts
May 15 00:57:45 rock64-2g acme-mumble-start[985]: + '[' -e certificates/mumble.artemislena.eu.key -a -e certificates/mumble.artemislena.eu.crt -a -n acme-v02.api.letsencrypt.org ']'
May 15 00:57:45 rock64-2g acme-mumble-start[985]: + lego --accept-tos --path . -d mumble.artemislena.eu --email fantasycookie17@artemislena.eu --key-type ec256 --dns desec run
May 15 00:57:46 rock64-2g acme-mumble-start[987]: 2021/05/15 00:57:46 [INFO] [mumble.artemislena.eu] acme: Obtaining bundled SAN certificate
May 15 00:57:47 rock64-2g acme-mumble-start[987]: 2021/05/15 00:57:47 [INFO] [mumble.artemislena.eu] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/13121589978
May 15 00:57:47 rock64-2g acme-mumble-start[987]: 2021/05/15 00:57:47 [INFO] [mumble.artemislena.eu] acme: Could not find solver for: tls-alpn-01
May 15 00:57:47 rock64-2g acme-mumble-start[987]: 2021/05/15 00:57:47 [INFO] [mumble.artemislena.eu] acme: Could not find solver for: http-01
May 15 00:57:47 rock64-2g acme-mumble-start[987]: 2021/05/15 00:57:47 [INFO] [mumble.artemislena.eu] acme: use dns-01 solver
May 15 00:57:47 rock64-2g acme-mumble-start[987]: 2021/05/15 00:57:47 [INFO] [mumble.artemislena.eu] acme: Preparing to solve DNS-01
May 15 00:57:47 rock64-2g acme-mumble-start[987]: 2021/05/15 00:57:47 [INFO] [mumble.artemislena.eu] acme: Trying to solve DNS-01
May 15 00:57:47 rock64-2g acme-mumble-start[987]: 2021/05/15 00:57:47 [INFO] [mumble.artemislena.eu] acme: Checking DNS record propagation using [192.168.1.1:53 [fd6c:337f:780f::1]:53]
May 15 00:59:47 rock64-2g acme-mumble-start[987]: 2021/05/15 00:59:47 [INFO] Wait for propagation [timeout: 1h10m0s, interval: 2m0s]
May 15 00:59:47 rock64-2g acme-mumble-start[987]: 2021/05/15 00:59:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:01:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:01:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:03:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:03:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:05:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:05:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:07:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:07:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:09:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:09:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:11:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:11:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:13:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:13:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:15:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:15:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:17:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:17:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:19:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:19:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
May 15 01:21:47 rock64-2g acme-mumble-start[987]: 2021/05/15 01:21:47 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
```

</details>

ldez commented 3 years ago

Hello,

I will not install NixOS just to reproduce your use case, sorry. I recommend using the standalone binary and checking your network configuration and (if you have one) your local DNS. You could also try the --dns.resolvers option: https://go-acme.github.io/lego/usage/cli/

artemislena commented 3 years ago

> I will not install NixOS just to reproduce your use case, sorry.

Sure. I don't think it's really necessary anyway.

> I recommend using the standalone binary and checking your network configuration and (if you have one) your local DNS. You could also try the --dns.resolvers option: https://go-acme.github.io/lego/usage/cli/

The DNS server used by the machine is the same as the one used by the machine I ran dig on to see whether the record propagated; I even ran that on the same machine… Anyway, I tried with --dns.resolvers ns1.desec.io:53 now. I'll see what happens.

artemislena commented 3 years ago

root@rock64-2g> lego --accept-tos --path . -d mumble.artemislena.eu --email fantasycookie17@artemislena.eu --key-type ec256 --dns.resolvers ns1.desec.io:53 --dns desec run                                 /var/lib/acme/.lego
2021/05/15 02:06:52 No key found for account fantasycookie17@artemislena.eu. Generating a P256 key.
2021/05/15 02:06:52 Saved key to accounts/acme-v02.api.letsencrypt.org/fantasycookie17@artemislena.eu/keys/fantasycookie17@artemislena.eu.key
2021/05/15 02:06:53 [INFO] acme: Registering account for fantasycookie17@artemislena.eu
!!!! HEADS UP !!!!

Your account credentials have been saved in your Let's Encrypt
configuration directory at "accounts".

You should make a secure backup of this folder now. This
configuration directory will also contain certificates and
private keys obtained from Let's Encrypt so making regular
backups of this folder is ideal.
2021/05/15 02:06:53 [INFO] [mumble.artemislena.eu] acme: Obtaining bundled SAN certificate
2021/05/15 02:06:54 [INFO] [mumble.artemislena.eu] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/13123699550
2021/05/15 02:06:54 [INFO] [mumble.artemislena.eu] acme: Could not find solver for: tls-alpn-01
2021/05/15 02:06:54 [INFO] [mumble.artemislena.eu] acme: Could not find solver for: http-01
2021/05/15 02:06:54 [INFO] [mumble.artemislena.eu] acme: use dns-01 solver
2021/05/15 02:06:54 [INFO] [mumble.artemislena.eu] acme: Preparing to solve DNS-01
2021/05/15 02:06:54 [INFO] [mumble.artemislena.eu] acme: Trying to solve DNS-01
2021/05/15 02:06:54 [INFO] [mumble.artemislena.eu] acme: Checking DNS record propagation using [ns1.desec.io:53]
2021/05/15 02:08:54 [INFO] Wait for propagation [timeout: 1h10m0s, interval: 2m0s]
2021/05/15 02:08:54 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
2021/05/15 02:10:54 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.
2021/05/15 02:12:54 [INFO] [mumble.artemislena.eu] acme: Waiting for DNS record propagation.

So, this is what I got. The records were updated and accessible via dig from the machine before the last message appeared; I checked.

artemislena commented 3 years ago

I now added --dns.disable-cp to the flags, and it seems to work just fine (Mumble stopped complaining about the validity of the cert of my server), convincing me that there is indeed some kind of bug with how record propagation is checked for.

chrisnovakovic commented 3 years ago

@ldez I've reproduced this with a domain name whose DNS is hosted by he.net, and also found that --dns.disable-cp solved the infinite wait problem.

This appears to happen whenever there are no SOA records when lookupNameservers queries for them: checkDNSPropagation returns an error (could not determine the zone: could not find the start of authority for _acme-challenge.[...].: NOERROR) that isn't printed until the call to wait.For in challenge/dns01/dns_challenge.go times out. (@FantasyCookie17 wouldn't have seen the eventual reporting of the error because in the example they gave they set the propagation timeout to 1h10m and presumably killed lego before that time had elapsed.)

In this circumstance, I don't think it makes sense to keep retrying until the timeout: if there weren't any SOA records in the first answer, there surely won't be any in the later answers. ~I also think it makes more sense to query the NS records rather than the SOA records here if the intention is to get a list of all of the nameservers that could answer with the challenge token, although I imagine it's been designed this way for a reason.~ Okay, I see what it's doing now: it's looking for an SOA record to determine where the zone starts, then querying for NS records in that zone. Maybe it could just recursively query for the NS records instead of trying to find an SOA record first?

ldez commented 3 years ago

The fact that --dns.disable-cp solves your problem is a sign that you have a network/DNS problem.

artemislena commented 3 years ago

How come lookups with other tools work just fine, then?

ldez commented 3 years ago

Because they don't check the propagation.

lego works for a lot of people; why does it work just fine for them?

maccident commented 3 years ago

FWIW, I had the same issue with lego failing while checking for propagation using Cloudflare as my registrar and 1.1.1.1 as the nameservers via DNS_RESOLVERS='1.1.1.1'

I'm running it as part of the udm-le integration, and could also verify manually that the TXT records (for two hosts) had been set properly. I verified via Cloudflare's management UI, via CLI nslookup / dig from my UDMP, and from several hosts behind the UDMP.

I was able to work around the issue via --dns.disable-cp

If the issue is a DNS/network issue, I could only assume that it is an issue with Cloudflare.

aep commented 3 years ago

how is propagation checked in lego?

i've been waiting for a full day and https://www.whatsmydns.net/ tells me the record is available everywhere. however, lego still says "acme: Waiting for DNS record propagation." which is suspiciously different from "Checking DNS record propagation using [8.8.8.8:53]" so i assume these are different, and the second isn't documented.

looking at the code, it seems to just query the authoritative nameservers, which in my case is gandi, all of which have the record.

aep commented 3 years ago

in my case the issue was that lego is probing ipv6 nameservers on a machine that doesn't have an ipv6 router. dns.disable-cp worked.

shawnhank commented 3 years ago

So glad I found this issue because I was running into the same thing when trying to gin up a cert using Gandi V5.

dns.disable-cp is a lifesaver!

sepbot commented 2 years ago

Thanks @aep that's exactly what it was for me as well. I just enabled IPv6 on my ISP end and it started automagically working again.

pjones commented 2 years ago

I ran into this IPv6 issue as well. I can't figure out why, but all of the hosts I have access to across multiple networks can't make DNS queries over IPv6; they always time out.

The authoritative name servers for my domain have multiple IPv4 and IPv6 addresses. It seems to me that lego doesn't try the next IP address for the authoritative name server when receiving a timeout.

I had to disable IPv6 on the host running lego in order to renew a cert with dns-01.