debops / ansible-pki

Bootstrap and manage internal PKI, Certificate Authorities and OpenSSL/GnuTLS certificates
GNU General Public License v3.0

tiny-acme suddenly causes errors #107

Closed lchski closed 7 years ago

lchski commented 7 years ago

With no configuration changes, acme-tiny now fails to request the certificate. I’ve run debops --tags role::nginx:servers and debops --tags role::pki to make sure my configurations were okay; both run with no errors. I’ve also deleted the error.log files and rerun the commands, but no luck.

Here’s a copy of one of the error.log files:

Parsing account key...
Parsing CSR...
Registering account...
Already registered!
Verifying lucascherkewski.com...
lucascherkewski.com verified!
Verifying www.lucascherkewski.com...
Traceback (most recent call last):
  File "/usr/local/lib/pki/acme-tiny", line 198, in <module>
    main(sys.argv[1:])
  File "/usr/local/lib/pki/acme-tiny", line 194, in main
    signed_crt = get_crt(args.account_key, args.csr, args.acme_dir, log=LOGGER, CA=args.ca)
  File "/usr/local/lib/pki/acme-tiny", line 123, in get_crt
    wellknown_path, wellknown_url))
ValueError: Wrote file to /srv/www/sites/acme/public/.well-known/acme-challenge/Pf-KGKW79jBEdqdUS3UC15JwoL9qGRX9fIb1jFRL6no, but couldn't download http://www.lucascherkewski.com/.well-known/acme-challenge/Pf-KGKW79jBEdqdUS3UC15JwoL9qGRX9fIb1jFRL6no

Happy to share any details that would help.

drybjed commented 7 years ago

The acme-tiny project should be fine, I don't think that anything changed in the script. Still, this code path isn't really tested that often...

You could try creating a file in the /srv/www/sites/acme/public/.well-known/acme-challenge/ directory and checking whether you can access it over HTTP on www.lucascherkewski.com. Perhaps there is (or was) a DNS or nginx configuration issue? That's the first thing that comes to mind right now. I think the script uses the DNS nameservers from /etc/resolv.conf to get the address, so check whether you can access the file from the host itself, for example via curl.
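The manual check suggested here can be sketched in Python. This is only a rough approximation of the kind of self-check visible in the traceback, not the acme-tiny code itself. The helper names (`probe_url`, `write_probe`, `fetch`, `check_probe`) and the default token `debug-probe` are made up for illustration.

```python
import os
import urllib.request


def probe_url(domain, token):
    """Build the URL under which a challenge file should be served."""
    return "http://{0}/.well-known/acme-challenge/{1}".format(domain, token)


def write_probe(acme_dir, token, content):
    """Write a probe file into the challenge directory and return its path."""
    path = os.path.join(acme_dir, token)
    with open(path, "w") as handle:
        handle.write(content)
    return path


def fetch(url, timeout=10):
    """Download a URL and return the body as text (raises on failure or timeout)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf8")


def check_probe(acme_dir, domain, token="debug-probe", content="ok", fetcher=fetch):
    """Write a probe file, fetch it back over HTTP, and always clean up."""
    # "debug-probe" is a hypothetical token name, not something acme-tiny uses.
    path = write_probe(acme_dir, token, content)
    try:
        return fetcher(probe_url(domain, token)).strip() == content
    finally:
        os.remove(path)
```

Run on the host itself, something like `check_probe("/srv/www/sites/acme/public/.well-known/acme-challenge", "www.lucascherkewski.com")` should return True when the site is reachable locally, and raise an error when the connection fails or times out. The directory is the one shown in the error log.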

lchski commented 7 years ago

curl -I http://www.lucascherkewski.com returns nothing from the host (curl hangs). On my computer, the query returns a 307 Redirect to the non-www version of the URL. Querying a file in the acme-challenge/ directory hangs on the host, but returns 200 OK on my computer. (The host can query other sites—like http://google.ca —no problem.) Looks like we may be closer to identifying the issue.
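To tell a hang at the TCP level apart from an HTTP-level problem, a small connect test with a timeout can help. A minimal sketch; `can_connect` is a hypothetical helper, not part of acme-tiny or DebOps.

```python
import socket


def can_connect(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False
```

Comparing `can_connect("www.lucascherkewski.com")` with `can_connect("localhost")` on the host would show whether only the public name is unreachable. Note that a failed name lookup also returns False here, so this does not distinguish DNS problems from connection problems.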

lchski commented 7 years ago

Running curl with -vvv returns the following:

* Hostname was NOT found in DNS cache
*   Trying 159.203.50.102...
* connect to 159.203.50.102 port 80 failed: Connection timed out
* Failed to connect to www.lucascherkewski.com port 80: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to www.lucascherkewski.com port 80: Connection timed out

drybjed commented 7 years ago

Hmmm, perhaps nginx doesn't listen over IPv4 for some reason? Check if you have default_server ipv6only=off somewhere in one of the configuration files in /etc/nginx/sites-enabled/. You could remove the /etc/ansible/facts.d/nginx.fact file which should trigger regeneration of the nginx configuration files and selection of a new default host.

Anything in nginx logs?

lchski commented 7 years ago

welcome.conf has it (let me know if you need entire config):

server {
...
    listen [::]:80 default_server ipv6only=off;
...
}

server {
...
    listen [::]:443 ssl spdy default_server ipv6only=off;
...
}

None of the other conf files (including lucascherkewski.com.conf) have it.

Checking the logs now.

drybjed commented 7 years ago

Having the default_server line only once per port should be fine. I'm using IPv4 and I can look at the site, so that definitely works.

Trying this on my host works:

% curl -vvv http://www.lucascherkewski.com/
* Hostname was NOT found in DNS cache
*   Trying 159.203.50.102...
* Connected to www.lucascherkewski.com (159.203.50.102) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: www.lucascherkewski.com
> Accept: */*
> 
< HTTP/1.1 307 Temporary Redirect

No idea why it works remotely but not locally. Can you try curl http://localhost/ and see if that works?

lchski commented 7 years ago

Looks like that works fine:

root@lucascherkewski:/var/log/nginx# curl -vvv http://localhost/
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
* Server nginx is not blacklisted
< Server: nginx
< Date: Mon, 08 May 2017 20:05:13 GMT
< Content-Type: text/html
< Content-Length: 676
< Last-Modified: Mon, 08 Aug 2016 18:01:36 GMT
< Connection: keep-alive
< Vary: Accept-Encoding
< ETag: "57a8c900-2a4"
< X-Clacks-Overhead: GNU Terry Pratchett
< Accept-Ranges: bytes
<
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta http-equiv="Content-Language" content="en">
    <meta http-equiv="Content-Style-Type" content="text/css">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>localhost</title>
  </head>

  <body>
    <div id="content">

      <h2><a href="http://localhost/">localhost</a></h2>

      <p id="http-status"><strong>418 I'm a teapot</strong></p>

    </div>
  </body>
</html>
* Connection #0 to host localhost left intact

lchski commented 7 years ago

Deleted nginx.fact and am rerunning debops.

lchski commented 7 years ago

After that, same results. The host can’t query itself. Interestingly, the nginx configs have shifted a bit.

The main server block (after redirects) for lucascherkewski.com.conf:

...
server {
...
    listen [::]:443 ssl spdy default_server ipv6only=off;
...
}

There’s nothing disabling ipv6only for :80, nor a default_server for :80 in that file; instead, it’s in the config file for a different host, one without SSL. welcome.conf no longer has either default_server.

drybjed commented 7 years ago

That's fine, default_server for different ports can be present in different configuration files.

What are the host's DNS nameservers? Check /etc/resolv.conf, try dig www.lucascherkewski.com and see if you get the correct address.

lchski commented 7 years ago

Looks like the Google nameservers. /etc/resolv.conf:

nameserver 8.8.8.8
nameserver 8.8.4.4

dig results:

root@lucascherkewski:/etc/nginx/sites-enabled# dig www.lucascherkewski.com

; <<>> DiG 9.9.5-9+deb8u10-Debian <<>> www.lucascherkewski.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35132
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.lucascherkewski.com.   IN  A

;; ANSWER SECTION:
www.lucascherkewski.com. 1799   IN  A   159.203.50.102

;; Query time: 65 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon May 08 16:37:49 EDT 2017
;; MSG SIZE  rcvd: 68

drybjed commented 7 years ago

Well, then I don't have any more ideas at the moment.

The issue basically looks like the acme-tiny script cannot access your website from the same host. It does that to confirm that the generated file is accessible before making a request to the Let's Encrypt CA. And this only occurs on the www. subdomain, not the main domain. You could try enabling request debugging in nginx; maybe that could help.

Alternatively, do you have any other hosts with Let's Encrypt certificates to confirm that the issue is not repeatable elsewhere?

lchski commented 7 years ago

No problem, I appreciate you giving your time to help.

I don’t have any other hosts running Let’s Encrypt (I just run small static sites, they all use this one host), but the two sites using Let’s Encrypt (lucascherkewski.com and ecustom.ca) on this host both fail.

None of the sites on this host respond to a curl request run from the host, so it must be something about the host’s ability to access itself. I’ll look into that.
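One way to look into this is to compare what the host's own resolver returns with what dig reports. `socket.getaddrinfo` goes through the system resolver, which on a typical Linux setup consults /etc/hosts before DNS, while dig queries the nameserver directly. A sketch, with `resolved_addresses` as an illustrative name:

```python
import socket


def resolved_addresses(hostname, port=80):
    """Return the sorted, de-duplicated addresses the system resolver gives."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

If `resolved_addresses("lucascherkewski.com")` and `resolved_addresses("www.lucascherkewski.com")` return different addresses on the host, the two names are being resolved through different paths.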

lchski commented 7 years ago

Working with @carlalexander, we figured this out tonight. Only lucascherkewski.com was in the /etc/hosts file under the local IP entry. We had to add www.lucascherkewski.com there for the requests to resolve properly. Not 100% sure why this fell apart all of a sudden (it wasn’t an issue for months), but it’s working now.
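A check for this kind of gap can be sketched as a small parser for hosts-file text. The function names are hypothetical, and the thread does not say which local address the entry uses, so any address passed in is an assumption.

```python
def names_by_address(hosts_text):
    """Map each address in hosts-file text to the list of names on its lines."""
    mapping = {}
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            mapping.setdefault(fields[0], []).extend(fields[1:])
    return mapping


def missing_names(hosts_text, address, wanted):
    """Return the names from wanted that are not mapped to address."""
    present = names_by_address(hosts_text).get(address, [])
    return [name for name in wanted if name not in present]
```

For example, with the contents of /etc/hosts read into `text`, `missing_names(text, "127.0.1.1", ["lucascherkewski.com", "www.lucascherkewski.com"])` would list the names still missing from that entry. The address 127.0.1.1 is only a placeholder here.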

Thank you @drybjed for your help diagnosing this issue!

ypid commented 7 years ago

Just for reference, there are sometimes valid reasons why the host that runs acme-tiny cannot connect to its own vhost, as in my case, where haproxy is involved and the host has no direct Internet access. Ref: https://github.com/diafygi/acme-tiny/pull/116

Validation could be disabled by this inventory configuration:

pki_acme_tiny_repo: 'https://github.com/ypid/acme-tiny'
# Unconditional --no-verify
pki_acme_tiny_version: 'eb76e696fdd2800035e99b9485514e066e562ebe'

But please only use it if you know what you are doing. In your case, proper DNS sounds like the better fix.

ypid commented 7 years ago

Not 100% sure why this fell apart all of a sudden (it wasn’t an issue for months), but it’s working now.

debops.bootstrap updates your /etc/hosts. Might that be the cause?

lchski commented 7 years ago

Thanks @ypid. My situation doesn’t mirror yours (host has direct internet access), but it’s a good tip. I don’t plan to disable validation; I’ve got it working with the updated hosts file. It might’ve been debops.bootstrap, but I haven’t run it since I created the machine (unless it runs on every call, I can’t remember).

ypid commented 7 years ago

You are welcome.

unless it runs on every call, I can’t remember

No, it does not.

thiras commented 6 years ago

Adding the IP manually to the hosts file doesn't solve my problem. Also, there is no

pki_acme_tiny_version: 'eb76e696fdd2800035e99b9485514e066e562ebe'

version in the repo right now.

My machine is behind NAT, so it cannot find its WAN IP. There should be another mechanism to provide the WAN IP.

ypid commented 6 years ago

Proper firewalls support NAT reflection. (And real networks don’t use NAT (→ IPv6).)

eb76e696fdd2800035e99b9485514e066e562ebe should be present. https://github.com/ypid/acme-tiny/commit/eb76e696fdd2800035e99b9485514e066e562ebe What issue do you get from Ansible?

Does this work for you?

git clone https://github.com/ypid/acme-tiny.git
cd acme-tiny
git checkout eb76e696fdd2800035e99b9485514e066e562ebe

There should be another mechanism to provide WAN IP.

There is, it is called name resolution :)