debops / ansible-nginx

Install and manage nginx webserver
GNU General Public License v3.0

Http and Https default site detection #71

Open patrickheeney opened 9 years ago

patrickheeney commented 9 years ago

We started talking about this on IRC. The problem seems to stem from the fact that debops picks an HTTP or HTTPS site and stores it in facts. On subsequent runs, whether they use tags or come from roles that have debops.nginx as a dependency, the fact is no longer accurate. Nginx can only set ipv6only=off once per listen address and port, so debops has attempted to use the fact to detect the default site for HTTP and HTTPS and set the option there. I am creating this issue as a placeholder to discuss further.
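To make the constraint concrete, here is a small Python sketch that checks the arguments of listen directives against the rule described above: socket options such as ipv6only= may be set only once per address:port, and nginx rejects the configuration with a "duplicate listen options" error otherwise. The helper names socket_of and duplicate_ipv6only are hypothetical, and the parsing is deliberately naive (it assumes the address:port is the first token).

```python
import re
from collections import Counter

IPV6ONLY = re.compile(r"\bipv6only=(on|off)\b")

def socket_of(listen: str) -> str:
    """Return the address:port part of a listen directive's arguments (first token)."""
    return listen.split()[0]

def duplicate_ipv6only(listens: list[str]) -> list[str]:
    """Return the sockets on which ipv6only= is set by more than one server."""
    counts = Counter(socket_of(l) for l in listens if IPV6ONLY.search(l))
    return sorted(sock for sock, n in counts.items() if n > 1)

# Arguments of the listen directives found across all enabled sites.
listens = [
    "[::]:80 default_server ipv6only=off",  # default HTTP site sets the option
    "[::]:80",                              # other HTTP sites just share the socket
    "[::]:443 ssl ipv6only=off",            # one HTTPS site sets it...
    "[::]:443 ssl ipv6only=off",            # ...and a second one sets it again
]
print(duplicate_ipv6only(listens))  # ['[::]:443']
```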

I created a test environment https://github.com/patrickheeney/ansible-nginx-test which uses my bug fix in #70, because debops.nginx currently does not detect the right site without it. In this version you can run different tags and see which site debops.nginx picks as the default, as well as what is stored in the facts. The facts are also cleared on each run. (You can uncomment debops.nginx in the requirements.yml file and comment out my version to test with stock debops and the bug in #70.)

Some workarounds:

  1. Don't use debops.nginx in the main playbook; instead, include all of your sites in a role with debops.nginx as a dependency. This way debops.nginx executes only once and, with the fix in #70, will determine the right defaults. The facts will need to be cleared if any sites change.
  2. Include a default site. The default site should support http and https and act as the conf that sets up nginx. I set it up to return 404, but I'd like to improve this by denying the connection.
# group_vars/all

nginx_servers:
  - enabled: True
    name: ['default']
    default_index: ''
    location:
      '/': 'return 404;'
  - enabled: True
    name: ['default_ssl']
    default_index: ''
    listen: False
    ssl: True
    location:
      '/': 'return 404;'

Ideas:

This seems like a difficult issue to solve. The role is not aware of the global state and does not know the master list of all sites. We can't assume the server will have a default site, any HTTPS sites, or any sites at all.

One idea is to set the facts on the first run as normal, but then validate those facts on subsequent runs. For example, if a second run of debops.nginx adds an HTTPS site, it will need to save that fact. If a third run adds another HTTPS site, a default is already recorded, so the new site will not be marked as the default. However, what happens when sites are removed or deactivated in any of these scenarios? How will the role know which site to pick next and reconfigure? So it seems like this will not work in all scenarios.
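A minimal Python model of this first idea, assuming servers are plain dicts with name, ssl and enabled keys (a simplification, not the role's actual data model). It shows the gap: the role only sees the servers passed to the current run, so a stored default that belongs to another role's sites looks stale and gets replaced, even though that site's config file still carries ipv6only=off.

```python
def validate_https_default(fact, servers_in_run):
    """Hypothetical check of a cached HTTPS default against the current run.

    servers_in_run holds only the servers passed to this run of the role,
    not every site that already exists in /etc/nginx/sites-enabled/.
    """
    https = [s["name"] for s in servers_in_run
             if s.get("ssl") and s.get("enabled", True)]
    if fact in https:
        return fact      # cached default is still visible and enabled
    if fact is not None and not https:
        return fact      # no HTTPS sites in this run: keep the fact, cannot verify it
    return https[0] if https else None

run1 = [{"name": "app.example.com", "ssl": True}]
run2 = [{"name": "shop.example.com", "ssl": True}]  # another role, other sites

fact = validate_https_default(None, run1)
print(fact)  # app.example.com
fact = validate_https_default(fact, run2)
print(fact)  # shop.example.com
# app.example.com's file from run 1 still has ipv6only=off, and now
# shop.example.com gets it too: nginx sees a duplicate on [::]:443.
```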

A second idea is to run a shell or Python script at the end of debops.nginx that is responsible for setting the default sites and saving the facts. The script would check whether there is an HTTP site and set the first one it finds as the default. It would do the same for HTTPS, perhaps using find and a regex. It would then have to do a shell equivalent of lineinfile to add ipv6only=off, which is far from ideal. Finally, it would save the sites in the facts; on subsequent runs, if those sites still exist, it would exit immediately.
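A rough sketch of the second idea in Python rather than shell, operating on a hypothetical {filename: contents} mapping instead of the real /etc/nginx/sites-enabled/ directory so it can be tested in isolation. The regex only looks at listen [::]:443 lines, and the "first file in sorted order" rule is an assumption. drybjed's reply below explains why any out-of-band edit like this conflicts with a stateless role.

```python
import re

# A "listen [::]:443 ...;" line; group 1 is everything before the semicolon.
LISTEN_443 = re.compile(r"^([ \t]*listen[ \t]+\[::\]:443\b[^;]*)(;)", re.MULTILINE)

def ensure_https_ipv6only(configs: dict[str, str]) -> dict[str, str]:
    """Hypothetical post-run fix-up over a {filename: contents} mapping.

    If no HTTPS file's first [::]:443 listen line carries ipv6only=off,
    append it to that line in the first HTTPS file (sorted by name).
    """
    https = [name for name in sorted(configs) if LISTEN_443.search(configs[name])]
    if not https:
        return configs  # no HTTPS site at all
    if any("ipv6only=off" in LISTEN_443.search(configs[n]).group(1) for n in https):
        return configs  # already set somewhere: exit immediately
    first = https[0]
    updated = dict(configs)
    updated[first] = LISTEN_443.sub(r"\1 ipv6only=off\2", configs[first], count=1)
    return updated

configs = {
    "b.example.com": "server {\n    listen [::]:443 ssl;\n}\n",
    "a.example.com": "server {\n    listen [::]:80;\n}\n",
}
fixed = ensure_https_ipv6only(configs)
print(fixed["b.example.com"], end="")
```

Run against unchanged files the script is stable, since a second pass finds the option and exits; the trouble only starts when something else rewrites those files.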

A third idea is to just be explicit: add some documentation explaining that the default SSL site has to be specified in the configuration, for example default_site_ssl: 'test.com', or perhaps on the nginx server item itself, like item.default_https: True. It would be up to the user to specify this only once, or maybe the first one the role comes across gets saved as a fact. This is almost how it works now, but perhaps we can write some troubleshooting information for detecting the issue that comes up when this is not set (connection refused).
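The third idea boils down to a precedence order. This hypothetical Python sketch uses the names proposed above (default_site_ssl, default_https), which are suggestions in this issue, not existing debops.nginx variables: an explicit value wins, then a single flagged server, then the first HTTPS server as a fallback.

```python
def pick_https_default(servers, default_site_ssl=None):
    """Hypothetical precedence for choosing the HTTPS default_server."""
    if default_site_ssl is not None:
        # Explicit setting wins, even if that site is managed by another run.
        return default_site_ssl
    flagged = [s["name"] for s in servers if s.get("ssl") and s.get("default_https")]
    if len(flagged) > 1:
        raise ValueError(f"default_https is set on more than one server: {flagged}")
    if flagged:
        return flagged[0]
    # Fallback: first HTTPS server in this run (the run-dependent choice this issue is about).
    https = [s["name"] for s in servers if s.get("ssl")]
    return https[0] if https else None

servers = [
    {"name": "a.example.com", "ssl": True},
    {"name": "b.example.com", "ssl": True, "default_https": True},
]
print(pick_https_default(servers))                               # b.example.com
print(pick_https_default(servers, default_site_ssl="test.com"))  # test.com
```

Raising on more than one flagged server is a design choice: it turns a silent duplicate-listen-options failure into an early, readable error.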

drybjed commented 9 years ago

I'm afraid the above solutions aren't the correct ones for this problem, because the problem is somewhere else entirely. But if I tell you that it's just the lack of default certificates that debops.nginx can use, because debops.pki wasn't configured, because you didn't set a domain for that host, we will still be arguing about this a week later.

So instead let me tell you how these different pieces fit together. It will be a long read (you got the tl;dr above), but I hope it will clear up some things for you and others about how I imagine and design stuff in DebOps.

Let's start at the top, in this case, nginx. The debops.nginx role was designed with some ideas in mind, one of them being that we use HTTPS by default. HTTP should be an exception, used only when the client host cannot be presumed to have working encryption (lack of OpenSSL during installation, for example). Otherwise, HTTP is there to redirect the client to HTTPS, period.

But why? It's simple: we are living in 2015, not 1990. Everybody is doing HTTPS these days, and there's a huge movement all over the Internet to switch everything to HTTPS: Let's Encrypt, HTTPS Everywhere, Mozilla wanting to deprecate HTTP entirely, others doing similar things. So it's basically not a choice whether a site should or shouldn't run HTTPS. You do it, end of story.

But there's a problem - if you want to offer HTTPS, you need to have SSL certificates. Unfortunately, webservers like nginx don't come bundled with a set of certificates and keys by default, so we need to go elsewhere for them. In fact, the operating system we use should give us a set of certificates to use, because why not? It's just a couple of openssl commands that can be run from a script. OpenSSH does this very thing, right?

In fact, Debian and Ubuntu provide the ssl-cert package, which handles this: it creates a set of "SnakeOil" certificates and keys for other applications to use. Unfortunately, nobody trusts them, because they are self-signed. In other words, they are useless for production purposes.

OK, so let's create our own pair of self-signed certificates instead, exactly what you just did in your own playbook. Actually, let's go back even further: the first public commit of the "ansible-aiua" project, which over time became DebOps, used the very same concept of generating self-signed certificates on each host so that they could be used by other services. Going further, lots of examples across the Web on how to set up an HTTPS site describe in detail how to create your own private key and self-signed certificate. This seems to be the go-to way of providing HTTPS encryption on the Web these days.

Unfortunately, our own self-signed certificate is still not trusted by anyone or anything else. This isn't an issue we can easily fix; this is mathematics. People much smarter than me found a workaround for this problem a long time ago: create a third party that is trusted by both the client and the server. If the certificate a client receives from a server is signed by a third party (a Certificate Authority) that the client trusts, the connection is accepted without issues. Otherwise, the resolution depends on the application. Web browsers display a warning and let you skip it, which you can decide yourself as a human. But other clients, like various services, don't have that option. They either drop the connection entirely, or can be instructed with an option to implicitly trust untrusted certificates, which is unacceptable in a production environment.

But why should we care? Because DebOps is meant to be used in a distributed environment, on multiple hosts at once. Sure, different services could be used over plaintext communication channel, but again - it's 2015, not 1990. We want everything that is coming over the wire to be encrypted, thus providing secure channel for the server and client to exchange data. Without encryption, you risk getting hacked when you try to set up LDAP for all of your servers, especially in the "cloud" environment.

But again, all of this requires certificates and we are back to square one. Self-signed certificates won't work, because they would need to be distributed p2p style to all of the concerned hosts. The solution to this problem is to go to a Certificate Authority and get yourself a certificate that is trusted by all involved parties. But alas, that is both expensive and not automated - and remember, we want nginx to offer HTTPS by default, not after some time when you get the certificates and provide them to the server.

So, what to do? Here's an idea - create our own Certificate Authority which all hosts trust implicitly (for the moment forget about external clients, we will come back to them later). We could get a Certificate Signing Request from each of the hosts managed by us, sign them, and send that certificate back. And after installing our own Root CA certificate on all of the hosts, they trust us as the third party and communication between various services can be done without issues, and over a secure channel.

I've looked at various solutions to this available at the time, but none were fit for the function I wanted them to perform. Most of the available solutions used either a graphical interface, or were not automated enough and required human interaction which wasn't satisfactory. More automated solutions like FreeIPA / DogTag weren't available in the Debian repositories at the time.

So I decided to create my own PKI. It would be automated (after all, we trust ourselves implicitly, and hopefully the Ansible Controller that the CA runs on is trusted as well), and it would allow for management of both internal, free and strong certificates, and custom certificates signed elsewhere and provided by the user later. It would have to be centralized, both in the sense that it should run on the Ansible Controller, and in that other roles and applications would be designed to use certificates managed by that PKI from one central place in the filesystem, instead of a custom solution for each application. A set of defaults would be provided so that users could set up their own PKI with minimal changes to the role itself.

But I guess I failed by choosing to base the whole PKI concept around a DNS domain that would be used by all of the involved hosts. The concept of a domain which binds all of the hosts in a shared DNS space is useful: you can put a host in a domain and it can use services easily via subdomains. For example, LDAP queries can be automatically directed to ldap.<your-domain.com>, and all the hosts that are set up correctly will know where that host lives by asking DNS. Using the domain as the binding element for all of our certificates signed by the custom CA was a simple decision to make.

But it seems that this concept fails miserably when confronted with reality. Because of that, the PKI used by DebOps will have to be redesigned a third time. But that's fine, I planned for this from the beginning. When I wrote this iteration of the debops.pki role, I wasn't entirely sure some of the things it does were correct, and after some time I know what's bad and what needs to be redesigned. So it will be rewritten when I get some time to do this.

But let's go back to the topic of our discussion. When the debops.pki role works (with a configured domain), it creates a set of private keys and certificates signed by a centralized CA which all of the hosts trust. These certificates are stored in a known location and all other roles and services included in DebOps use them via a set of variables. That way, certificates can be created once and reused automatically as necessary.

Because of that, the debops.nginx role has a set of valid SSL certificates available from the start, and HTTPS can be configured as the default. In fact, to make things more configurable, it tracks the default server separately for HTTP and HTTPS, so that it can use different default nginx servers if necessary. If the nginx server is configured in the normal fashion from the default DebOps playbook, all of that works. Even when users configure their own set of nginx servers using their own playbook, as long as none of the configuration already present in /etc/nginx/sites-enabled/ is overwritten, it all works. nginx is satisfied that default_server ipv6only=off is present on some site, enables both IPv4 and IPv6 listening on ports 80 and 443, and everything is good in the world. That's what the debops.nginx role is designed to do: provide HTTPS by default.

But then, the debops.pki stuff is optional since people don't necessarily have to use default DebOps playbook to setup all of the needed things for them. Because of that, debops.nginx checks if the default PKI has been configured and if not, gracefully degrades to HTTP-only mode of operation. Unfortunately the above code that selects default server for HTTP and HTTPS separately isn't designed for HTTP-only operation. Default HTTP server is selected automatically and correctly, but HTTPS, if no valid HTTPS site is present at the first configuration time, falls back to the default, which, due to the fact PKI was not configured, is in this case incorrect.
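A hypothetical Python model of the behaviour just described, assuming the role caches its choices in local facts on the first run and reuses them afterwards; "default" stands in for the role's built-in fallback server. It is meant to illustrate the failure mode, not to reproduce the role's actual templates.

```python
def select_defaults(servers, facts):
    """Hypothetical: choose defaults on the first run, reuse cached facts later."""
    if "http_default" not in facts:
        http = [s["name"] for s in servers if s.get("listen", True)]
        facts["http_default"] = http[0] if http else "default"
    if "https_default" not in facts:
        https = [s["name"] for s in servers if s.get("ssl")]
        facts["https_default"] = https[0] if https else "default"
    return facts["http_default"], facts["https_default"]

def serves_https(name, servers, pki_configured):
    """Would the chosen server actually listen on 443?"""
    if name == "default":
        # The fallback server only gets HTTPS when PKI certificates exist.
        return pki_configured
    return any(s["name"] == name and s.get("ssl") for s in servers)

facts = {}
run1 = [{"name": "site.example.com"}]                                  # HTTP only
run2 = [{"name": "secure.example.com", "ssl": True, "listen": False}]  # HTTPS added later

print(select_defaults(run1, facts))  # ('site.example.com', 'default')
http_d, https_d = select_defaults(run2, facts)
print(https_d, serves_https(https_d, run2, pki_configured=False))  # default False
```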

This is exactly what happens in the example playbook you provided above. The first debops.nginx run configures only an HTTP site and no HTTPS site, so debops.nginx falls back on the default configuration. Then you configure an HTTPS site, which is not the designated default_server, and thus none of the configured servers that listen on port 443 has the ipv6only=off option enabled. nginx listens only on IPv6 on port 443, and any connection over IPv4 to that port gets a "connection refused" error.
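The resulting symptom can be checked with a rough Python model. It assumes servers listen on [::]:<port> addresses, as in this thread, where nginx's default is ipv6only=on (since nginx 1.3.4). IPv4 clients get through only if some server on that port sets ipv6only=off, or if a separate IPv4 listen such as 443 or 0.0.0.0:443 exists. The function name is hypothetical.

```python
def ipv4_reachable(port: int, listens: list[str]) -> bool:
    """Rough model: can IPv4 clients connect to this port?

    Assumes nginx's default of ipv6only=on for [::] addresses.
    """
    for listen in listens:
        tokens = listen.split()
        sock = tokens[0]
        if sock == f"[::]:{port}" and "ipv6only=off" in tokens:
            return True   # dual-stack socket accepts IPv4 too
        if sock in (str(port), f"*:{port}", f"0.0.0.0:{port}"):
            return True   # explicit IPv4 listen
    return False

listens = [
    "[::]:80 default_server ipv6only=off",
    "[::]:80",
    "[::]:443 ssl",  # the HTTPS site that is not the designated default_server
]
print(ipv4_reachable(80, listens))   # True
print(ipv4_reachable(443, listens))  # False -> "connection refused" over IPv4
```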

Now, the fix you are proposing is essentially to check "out of band" whether any of the nginx configuration files in /etc/nginx/sites-enabled/ have HTTPS enabled and, if not, pick one of them at random (or point the role at it manually) and modify that configuration to add ipv6only=off to it. But the problem is, as you correctly noticed, that the debops.nginx role is stateless, which means that whenever debops.nginx is used to configure an nginx server, a previous configuration could already be present. And that previous configuration could, in theory, already enable the ipv6only=off option. So then what: remove that option from the "old" configuration and move it to the "new" one? What should then happen when debops.nginx is run again with the old configuration, which has obviously been modified by some external program? Recheck everything again, find the offending ipv6only=off option in the "new-old" configuration, remove it, and put it in the "old-new" configuration. And then do the same thing again when your playbook is re-run. And again, and again, and again. We get a loop, and we lose idempotency.
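The loop can be reduced to a toy Python model, assuming each playbook run strips ipv6only=off from every site and claims it for the site that run manages (the run_playbook function and site names are hypothetical). Running the same playbook twice is stable, but alternating two playbooks reports a change on every single run, which is the loss of idempotency described above.

```python
def run_playbook(files, my_site):
    """Toy model: strip ipv6only=off everywhere, then claim it for my_site.

    files maps site name -> whether its config carries ipv6only=off.
    Returns (new_files, changed), like an Ansible task result.
    """
    new = {name: False for name in files}
    new[my_site] = True
    return new, new != files

files, changes = {}, []
for playbook in ["old", "new", "old", "new"]:
    files, changed = run_playbook(files, f"{playbook}.example.com")
    changes.append(changed)
print(changes)  # [True, True, True, True]

# Re-running the same playbook on its own, by contrast, settles immediately.
_, changed_again = run_playbook(files, "new.example.com")
print(changed_again)  # False
```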

But hold on, debops.nginx was designed specifically to solve that issue! So what happened? The problem isn't with the implemented default server mechanism, but with the lack of certificates the role could use by default to configure an HTTP-only site as both an HTTP and an HTTPS site. After all, HTTPS is the default, not HTTP.

Could this be solved without using the debops.pki role that requires a domain? Of course! debops.nginx uses a set of default variables that point to the global certificate the role can use to set up a default site. If you just pointed the role to the existing certificates using these variables, everything would work out fine. Obviously they need to be generated or copied from somewhere else, but that can be handled by the user, or, in worst case scenario, by debops.pki.

patrickheeney commented 9 years ago

Thanks for the background and explanation. I just want to clarify a few things from my previous post. I was not proposing any sort of solution, but rather discussing the issue as I understood it, in hopes of coming to a resolution.

I understand the debops use case and your reasoning for creating a central authority. I guess that puts me in a bind, because I don't share the same use case as you, or need the overhead of a central authority right now. I am currently using debops in a few different projects, and this particular one needs an HTTPS site on Ubuntu. I don't want a distributed environment, lots of dependencies or overhead, etc. I just want to generate a CSR, purchase a cert, install the cert on nginx, and keep it as automated as possible. I came to realize that debops.pki was not for me, which is OK, because my needs are currently much simpler than what it solves. This eventually led me to this situation, where just using debops with a domain, PKI, and the default nginx config was not the solution.

I actually did use debops.pki exactly as you suggested. I have been at this for the better part of a week, so I don't recall the order things happened in. There were always problems getting this to work, probably a combination of invalid config and a blank ansible_domain (I submitted debops/ansible-pki#25 so I could supply it manually). This then led me to #70, as my sites are declared in a role on the second run of nginx and it was not detecting them. After attempting to fix that, I still encountered problems.

I completely believe what you are saying and have no intention to argue a week later. I was more interested in discussing the problem to better understand it, to ultimately come to a resolution. So we have outlined the problem as the lack of global certificates and debops.nginx not knowing the location of it. So I am back to the original problem above. I need the CSR to obtain and then copy the real certificate, but debops needs a global certificate first? So I need to self-sign a global certificate on first run, but it would not be trusted as you pointed out, which means I need to navigate and involve debops.pki, which does not work for me?

drybjed commented 9 years ago

I assume that you have each environment in a separate DebOps project directory. Ansible Controller is your laptop, and a webserver is somewhere in the cloud. Here's how you could handle the case of example.com website with current DebOps roles:

Create a new DebOps project directory and cd into it. In ansible/inventory/group_vars/all/bootstrap.yml, set:

---
bootstrap_domain: 'example.com'

This tells the debops.bootstrap role which domain any host managed by DebOps in this project should be in.

Create a new webserver host at your hosting provider; let's say it's a clean Debian netinstall with only SSH configured. You have access to the host through the root account with a password. Let's call this host marble, according to good naming scheme practices. When you add it to your domain, you get marble.example.com as your webserver. Put the A record for marble in the example.com zone in your DNS. If your client controls the DNS, ask them to add that record and point it to your webserver.

Add a marble.example.com entry to the Ansible inventory. For example, in ansible/inventory/hosts put:

marble ansible_ssh_host=marble.example.com

Once you have made sure that you can connect to root@marble using a password, run:

debops bootstrap -l marble -k -u root

This will generate ansible.cfg with the configuration used by Ansible, log in to marble, set up a user account for you with sudo access and your SSH keys, and configure the example.com domain on marble. To verify this, log in to your new user account on marble and run:

hostname --fqdn

When you get marble.example.com, you're good.
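If you want to script this check, a minimal Python sketch follows. socket.getfqdn() resolves the local hostname in a way that usually matches hostname --fqdn, although the two can disagree on hosts with unusual resolver configuration; the fqdn_ok helper is hypothetical.

```python
import socket

def fqdn_ok(fqdn: str, host: str, domain: str) -> bool:
    """True if fqdn equals host.domain, ignoring case and a trailing dot."""
    return fqdn.rstrip(".").lower() == f"{host}.{domain}".lower()

if __name__ == "__main__":
    # Run on marble itself; getfqdn() usually agrees with `hostname --fqdn`.
    print(fqdn_ok(socket.getfqdn(), "marble", "example.com"))
```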

Run the command:

debops -l marble

This will configure the host to make it suitable for DebOps roles. Among other things, a PKI will be created in the ansible/secret/pki/ directory on the Ansible Controller and set up to manage the example.com domain. After everything is configured, you can log in to the marble server and check whether the /etc/pki/ directory has been set up with default certificates and keys.

Now you can configure the host to display your webpage, and nginx should configure HTTPS correctly. If you want to avoid the SSL authentication error, you can import the Root CA certificate of the example.com domain to your web browser.

When you have bought the "signed" certificate and want to start using it instead of the ones provided by DebOps, go to the ansible/secret/pki/ directory and put it in the correct realm; usually that will be the domain realm if you have a wildcard certificate. You can read some more information about how the PKI is designed here.

When you switch the default certificates from the internal ones to the ones you provide, debops.pki will update the symlinks. After that, you will need to restart the nginx daemon, and your new certificates should show up correctly.
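The symlink switch can be pictured with a small Python sketch that repoints a link atomically by creating a temporary symlink and renaming it over the old one. The directory names internal/ and external/ and the file default.crt are hypothetical placeholders, not the actual debops.pki layout, and the sketch assumes a POSIX system.

```python
import os
import tempfile

def repoint_symlink(link: str, target: str) -> None:
    """Atomically make `link` point at `target` (POSIX rename semantics)."""
    tmp = f"{link}.tmp-{os.getpid()}"
    if os.path.lexists(tmp):
        os.remove(tmp)          # leftover from an interrupted earlier attempt
    os.symlink(target, tmp)
    os.replace(tmp, link)       # rename over the old link in a single step

# Demo with hypothetical internal/ and external/ certificate directories.
with tempfile.TemporaryDirectory() as d:
    for realm in ("internal", "external"):
        os.makedirs(os.path.join(d, realm))
        open(os.path.join(d, realm, "default.crt"), "w").close()
    link = os.path.join(d, "default.crt")
    repoint_symlink(link, os.path.join(d, "internal", "default.crt"))
    repoint_symlink(link, os.path.join(d, "external", "default.crt"))
    print(os.readlink(link) == os.path.join(d, "external", "default.crt"))  # True
```

Renaming over the existing link means the path never goes missing mid-switch; nginx still needs the restart mentioned above to load the new certificate.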