Closed · OnFreund closed this issue 6 years ago
Did you get it to work? The default timeout is 120 seconds for cloudflare I believe which is changeable for users of the CLI and library. I don't know if caddy allows you to do so as well. Cloudflare is known to have some hiccups in terms of DNS propagation so it could take some time.
@xenolf I couldn't get it to work. For increasing the timeout, is there a better way than rebuilding caddy from source?
As for propagation - the thing is I see the record right away with dig. I have one terminal open where I start caddy, and another where I just run dig a couple of times. It only takes a few seconds for the record to be picked up in dig, yet caddy times out.
Use the `-catimeout` flag with Caddy: https://caddyserver.com/docs/cli - I'm not sure if it will help with a DNS timeout though.
@mholt Doesn't seem to work - still times out after 2 minutes.
Any other suggestions?
@xenolf @mholt really appreciate the help so far. Any other ideas? `-catimeout` has no effect on the DNS timeout, and running dig at the same time as the challenge, from the same system, does show the correct entry being picked up in a matter of seconds.
Hey @OnFreund. What does your DNS setup look like? Is it just a straight A record or something more elaborate? Also, is `laura.ns.cloudflare.com` actually one of the two DNS servers you were assigned by Cloudflare?
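For anyone following along, one way to check this is to ask the public DNS tree which nameservers the zone delegates to (the domain below is a placeholder; substitute your own and compare the answer against the pair shown in the Cloudflare dashboard):

```shell
# Placeholder domain; replace with your real second-level domain.
dig +short NS mydomain.com
```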
@xenolf a regular A record for a subdomain. Yes, `laura.ns.cloudflare.com` is indeed one of the two assigned DNS servers.
Is cloudflare your only DNS provider or is your second level domain handled by another provider?
Cloudflare is the only DNS provider. The domain is registered with GoDaddy, and has the nameservers point to CloudFlare.
Is it possible for you to run the latest release of lego against the staging endpoint of LE (https://acme-staging.api.letsencrypt.org/directory) to test it without caddy? It should give you more log output as well, and you can try to increase the DNS timeout directly.
Sure. I'll give it a try next week and will update here.
Here's the output from using lego directly (without Caddy). `10.10.1.1` is my local gateway, which has DNS caching. However, as always, running dig at the same time does return the correct record.
```
2017/10/07 12:33:57 [INFO][myhost.mydomain.com] acme: Obtaining bundled SAN certificate
2017/10/07 12:33:58 [INFO][myhost.mydomain.com] AuthURL: https://acme-staging.api.letsencrypt.org/acme/authz/Y08-yhPTom6VNHoHjm9v5IrnpjSQQd0UEkUufIWL6aw
2017/10/07 12:33:58 [INFO][myhost.mydomain.com] acme: Could not find solver for: http-01
2017/10/07 12:33:58 [INFO][myhost.mydomain.com] acme: Trying to solve DNS-01
2017/10/07 12:34:00 [INFO][myhost.mydomain.com] Checking DNS record propagation using [10.10.1.1:53 127.0.0.53:53]
2017/10/07 12:36:02 [myhost.mydomain.com] Could not obtain certificates
Time limit exceeded. Last error: NS laura.ns.cloudflare.com. did not return the expected TXT record
```
@xenolf any ideas? thanks again
@OnFreund could you include in your output the lego command and options used?
```shell
./lego_linux_amd64 --dns cloudflare -d myhost.mydomain.com -m myemail@domain.com -s https://acme-staging.api.letsencrypt.org/directory run
```
As always, `myhost.mydomain.com` is replaced with the real domain, and `myemail@domain.com` with my real email.
The environment contains `CLOUDFLARE_EMAIL` and `CLOUDFLARE_API_KEY` (and the record is indeed added to Cloudflare; it's just lego that's not picking it up).
@OnFreund, I figured you probably missed the bit xenolf mentioned about "you can try to increase the DNS timeout directly" and was about to recommend using `--dns-timeout` in your command, but the conversation in #253 indicates there is no way to override this timeout except in the provider, while a comment two months prior indicates `--dns-timeout` should override the propagation timeout. I'm a bit confused now.
One comment suggests the Cloudflare propagation timeout should be 5 minutes [2]. The only thing I can think of to try at this time is to clone the repo, update the timeout [3] to 5 (maybe 10) minutes, recompile, and retest. There was a related discussion in #167 that was closed after Cloudflare resolved inconsistencies on their end.
[2]: https://github.com/xenolf/lego/issues/241#issuecomment-241517204
[3]: https://github.com/xenolf/lego/blob/master/providers/dns/cloudflare/cloudflare.go#L52
@boxofrox thanks for the detailed answer. I mentioned above that increasing the timeout had no effect, and the code corroborates it: the timeout is hard-coded to two minutes.
However, the record changes propagate immediately, and dig picks them up right away, so I doubt that it has anything to do with timeouts or propagations.
What could be other causes, other than propagation?
While I'm not using Cloudflare, but rather bind9 on a local network, I'm running into a similar problem: `watch -n 1 -- dig @nameserver _acme-challenge.subdomain.example.com txt` updates immediately and shows the new challenge TXT record, but caddy/lego is still waiting for DNS to "propagate".
```
caddy-dns_1 | 2017/11/06 20:55:26 [INFO][subdomain.example.com] acme: Obtaining bundled SAN certificate
caddy-dns_1 | 2017/11/06 20:55:26 [INFO][subdomain.example.com] AuthURL: https://acme-staging.api.letsencrypt.org/acme/authz/ITJQFwkr0oCKqhDf2b87U_INO2-cvjNlP4pdoge-vpU
caddy-dns_1 | 2017/11/06 20:55:26 [INFO][subdomain.example.com] acme: Trying to solve DNS-01
caddy-dns_1 | 2017/11/06 20:55:27 [INFO][subdomain.example.com] Checking DNS record propagation using [127.0.0.11:53]
caddy-dns_1 | 2017/11/06 21:05:27 [subdomain.example.com] failed to get certificate: Time limit exceeded. Last error: dns: failed to unpack truncated message
```
Currently, I suspect my problem has something to do with lego checking record propagation using `127.0.0.11:53`. The logs don't give me much to work with. I don't know where it's pulling that address from, but it's not in my hosts file, and if it's a bogus endpoint, then lego is talking to a resolver that isn't there.
Given you have two IPs in your logs for the propagation check, I'd expect 10.10.1.1 to work, but maybe lego requires both IPs to resolve the challenge.
@boxofrox are you using docker by any chance?
@xenolf I am using docker. `127.0.0.11:53` is the docker DNS server.
@boxofrox You seem to have quite a different problem. While both are timeout errors, the one @OnFreund has is an actual propagation timeout, while yours seems to stem from an invalid response given by the docker DNS resolver, as indicated by `Last error: dns: failed to unpack truncated message`.
Ah, makes sense. I'll open my own issue to discuss further then. Thanks for clarifying.
@xenolf, OnFreund's logs show [10.10.1.1:53, 127.0.0.53:53] for the propagation check. Do you know whether one or both must succeed to pass the propagation check? If I'm reading the code [1] right, all nameservers in the list are checked until one succeeds.
[1]: https://github.com/xenolf/lego/blob/master/acme/dns_challenge.go#L200
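If that reading is right, the behavior can be sketched roughly as follows (a hypothetical shell analogue, not lego's actual implementation; `query_txt` and `check_propagation` are illustrative names):

```shell
# query_txt is a stand-in for a real DNS lookup such as:
#   dig @"$1" "$2" txt +short
query_txt() {
  echo ""
}

# Try each resolver in order; succeed as soon as one of them
# returns the expected TXT record.
check_propagation() {
  expected=$1; shift
  for ns in "$@"; do
    if [ "$(query_txt "$ns" "_acme-challenge.myhost.mydomain.com")" = "$expected" ]; then
      echo "propagated via $ns"
      return 0
    fi
  done
  echo "no resolver returned the expected record"
  return 1
}
```

So a single resolver returning the record should be enough to pass, but a resolver that answers incorrectly (or not at all) still gets queried on every round.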
@OnFreund, the last thing I can think of to try would be: on the same machine (`myhost.mydomain.com`) where you run `./lego_linux_amd64`, also run:

```shell
export MYHOST=myhost.mydomain.com
watch -n1 "dig @10.10.1.1 _acme-challenge.$MYHOST txt; echo '------'; dig @127.0.0.53 _acme-challenge.$MYHOST txt"
```

...and confirm whether the TXT challenge records appear. (Note: `MYHOST` must be exported, or expanded in your current shell, for the `$MYHOST` inside the double quotes to resolve.)
I figure, since lego is resolving with those IPs, let's limit dig to doing the same thing. If you already did this, then disregard; it wasn't mentioned specifically before.
> @xenolf, OnFreund's logs show [10.10.1.1:53, 127.0.0.53:53] for the propagation check. Do you know whether one or both must succeed to pass the propagation check?
@boxofrox one of them has to return the right record. It will try them in order as can be seen here.
@boxofrox thanks, that actually gave me a lead. I didn't really understand where `127.0.0.53` came from, but removing it from `resolv.conf` fixes the problem!
However, that file seems to be dynamically generated on my system, and has this comment:
```
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
```
Google searches show lots of complaints about this mechanism, but I'm not sure what it is, what it does, and how to permanently disable it.
Sweet! Glad to see progress.
> Google searches show lots of complaints about this mechanism, but I'm not sure what it is, what it does, and how to permanently disable it.
This systemd issue [1] explains why the 127.0.0.53 is used. The `resolved.conf` man page [2] might shed a bit more light also.
It's my understanding that systemd tries to be the guardian of `/etc/resolv.conf` so that multiple tools can update the system's DNS nameserver list without clobbering each other.
I think the intent is for systemd to present 127.0.0.53 as the only nameserver to applications, with systemd-resolved proxying requests sent to 127.0.0.53 through the actual list of nameservers.
The issue indicates that if the stub proxy and the list of actual nameservers are both exposed to applications, problems like the one you're having ensue. The fact that lego is testing propagation against both 10.10.1.1 and 127.0.0.53 indicates this might be the case, though I'd also expect to see 10.10.1.1 in your `resolv.conf`, so I'm not sure.
@poettering recommends linking `/etc/resolv.conf` to `/usr/lib/systemd/resolv.conf`. If this works, I think this would be the easiest fix and the least likely to adversely affect DNS resolution for other applications. Update: this fix was merged into systemd on Oct 5, so your server may not have a `/usr/lib/systemd/resolv.conf` yet.
Before implementing that link: is your `/etc/resolv.conf` a symbolic link, and if so, to which file?
Is your post of `/etc/resolv.conf` the entire content? If not, can you confirm whether 127.0.0.53 is the only entry?
[1]: https://github.com/systemd/systemd/issues/7009
[2]: https://www.freedesktop.org/software/systemd/man/resolved.conf.html
The situation before my fiddling was:
- `/etc/resolv.conf` is symlinked to `/run/resolvconf/resolv.conf` (that seems to be the correct way)
- the file lists `10.10.1.1` and `127.0.0.53`. While the former is working, the latter doesn't even seem to be running (hence why I don't think symlinking to `/usr/lib/systemd/resolv.conf` would work here - that will leave `127.0.0.53` as the only DNS server)

The only thing I did was to comment out `127.0.0.53`, and it worked. However, since that file is auto-generated, I'm pretty sure it'll be overwritten at some point. After some digging, I tried adding `DNSStubListener=no` to `/etc/systemd/resolved.conf`.
I believe that should do the trick, but I can't, for the life of me, understand how to regenerate resolv.conf to test it.
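For readers hitting the same wall, a hedged sketch of the regeneration step being asked about (paths assume a systemd-resolved setup like Ubuntu 17.x; verify against your own system before running):

```shell
# After adding DNSStubListener=no under [Resolve] in
# /etc/systemd/resolved.conf, restart the service so the change
# takes effect and /run/systemd/resolve/resolv.conf is rewritten:
sudo systemctl restart systemd-resolved

# Optionally point /etc/resolv.conf at the file listing the real
# upstream servers rather than the 127.0.0.53 stub:
sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
```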
Short of rebooting the system, I'm not sure how to regenerate the file. Which distro and release are you on?
Ubuntu 17.04
Restarting did regenerate the file, but that pesky `127.0.0.53` is back. Arrrggghhh!!!
In the mean time I'm trying a dist upgrade.
Yeah, the file was already regenerated, but still with `127.0.0.53`.
Hmmm... I was going to recommend the `DNSStubListener` setting in this fashion:

```
[Resolve]
DNSStubListener="no"
```

I don't know if that's the correct syntax, but if it is, there might be something there as to why `127.0.0.53` is back.
Also, I found an Ubuntu bug report [1] with this fix that you might look at, if you're interested.
```
# ls -la /etc/resolv.conf
lrwxrwxrwx 1 root root 29 mar  7 20:20 /etc/resolv.conf -> ../run/resolvconf/resolv.conf
# rm -f /etc/resolv.conf
# ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
# ls -la /etc/resolv.conf
lrwxrwxrwx 1 root root 32 mar  8 07:30 /etc/resolv.conf -> /run/systemd/resolve/resolv.conf
```
[1]: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1624320/comments/8
The syntax without the quotes is the correct one, according to the default content of the file.
As for the bug report, my system is already in this state (`/etc/resolv.conf` is a symlink to `/run/systemd/resolve/resolv.conf`).
I started a dist upgrade to 17.10, and there seem to be some changes in those areas. I'll report post upgrade.
Ok, here's what I did:
- removed the `resolvconf` package
- changed the `/etc/resolv.conf` symlink from `/run/systemd/resolve/stub-resolv.conf` (which contains only the evil `127.0.0.53`) to `/run/systemd/resolve/resolv.conf` (which only contains the correct DNS server, without the evil local one)

Great success!!!
I'm having the same problem with Cloudflare requests timing out after 2 minutes on Caddy 0.11.1.
Not sure if that's a Caddy or Lego problem though. I seem to recall there being an option for Lego, or maybe it was another client, that allowed you to change the timeout, but unfortunately I can't find it now.
I don't know how to manage that in Caddy, but in Lego you can define some env vars:
- `CLOUDFLARE_POLLING_INTERVAL`
- `CLOUDFLARE_PROPAGATION_TIMEOUT`

See `lego dnshelp` for the full list.
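A sketch of how those variables might be passed on the command line (the domain, email, API key, and timeout values here are placeholders and guesses; check `lego dnshelp` for the exact semantics on your lego version):

```shell
# Raise the propagation timeout to 5 minutes and poll every 10 seconds
# (values are assumed to be in seconds):
CLOUDFLARE_EMAIL="you@example.com" \
CLOUDFLARE_API_KEY="your-api-key" \
CLOUDFLARE_PROPAGATION_TIMEOUT=300 \
CLOUDFLARE_POLLING_INTERVAL=10 \
lego --dns cloudflare -d myhost.mydomain.com -m you@example.com run
```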
Thanks, that's the list I was looking for!
I'll make a Caddy issue to see if it can use those variables.
Hi,
I'm trying to use a DNS challenge with CloudFlare, but am getting:

```
Time limit exceeded. Last error: NS laura.ns.cloudflare.com. did not return the expected TXT record
```

However, if I use dig to get the relevant TXT entry, it works (in real life I'm using the correct domain, not `myhost.mydomain.com`):
My Caddy version:
What am I missing? Thanks!