coreos / fleet

fleet ties together systemd and etcd into a distributed init system
Apache License 2.0

Implement support for self-signed CA / Certificates #837

Open noahlh opened 10 years ago

noahlh commented 10 years ago

Took me a bit to track this down, but here's what I've discovered.

Goal: Have etcd running securely using TLS certificates for transport & authentication, and allow fleet & fleetctl to connect accordingly.

Problem: While fleet supports the definition of etcd_cafile, etcd_keyfile, and etcd_certfile, when using a self-signed CA file (or an otherwise non-authoritative CA, e.g. CAcert), fleet fails to connect and throws an x509: certificate signed by unknown authority error.

Cause: According to this:

http://golang.org/src/pkg/crypto/x509/root_unix.go

It looks like Go is using /etc/ssl/certs/ca-certificates.crt as the authoritative source of CA files. So even if you add your CA file hash symlink to /etc/ssl/certs (see @brianredbeard's post here: https://groups.google.com/d/msg/coreos-user/5LlB9d_cFLA/ANtQtOXkdlwJ), which satisfies openssl (and curl), fleet still complains.

Temporary solution: My extremely hacky temporary solution for securing etcd & fleet --

  1. Generate certificates using etcd-ca.
  2. Install certificates as per Customizing the etcd Unit
  3. Add the fleet environment variables to cloud-config:

coreos:
  fleet:
    ...snip...
    etcd_ca_file: "/path/to/ca.crt"
    etcd_certfile: "/path/to/server.crt"
    etcd_keyfile: "/path/to/server.key"
    etcd_servers: "https://127.0.0.1:4001"

  4. Somehow get your self-signed ca.crt into /etc/ssl/certs/ca-certificates.crt. Not sure what the best way to do this is, since it's just a symlink to /usr/share, which is read-only. For now, I'm just clobbering the file with my own version with my CA appended.
  5. Fleet loads up nicely. Then, set the environment variables to point to the proper endpoint (https) and the same ca, cert and key for fleetctl and voila, secure everything.

jonboulle commented 10 years ago

> Somehow get your self-signed ca.crt into /etc/ssl/certs/ca-certificates.crt. Not sure what the best way to do this, since it's just a symlink to /usr/share, which is read-only. For now, I'm just clobbering the file with my own version with my CA appended.

@noahlh if you check out the latest post on that email thread [1], Mike provided some more detailed instructions for how to update the CA globally - does that work for you?

[1] https://groups.google.com/d/msg/coreos-user/5LlB9d_cFLA/eyyRlQ6PZ9AJ

noahlh commented 10 years ago

Ahhh what perfect timing. Yes, Mike's solution of using update-ca-certificates is great for that last bit. So I cleaned up the process slightly and now am using cloud-config to define the CA file directly in /etc/ssl/certs/my_ca.pem and then calling update-ca-certificates.

noahlh commented 10 years ago

This isn't directly related to fleet (but related to running a cluster of machines to be controlled by fleet) -- see above reference for another issue I've run into re: getting a secure etcd/fleet cluster up and running.

jonboulle commented 10 years ago

@noahlh is there anything else we need to look at on the fleet side, or can we close this ticket for now?

noahlh commented 10 years ago

All good here for now (though I don't have the cluster running yet due to the referenced issue, but that's with etcd not fleet).

My only other suggestion on this point -- perhaps it makes sense at some point to take a more "security by default" approach with fleet & etcd -- it's still a bit of a process to figure everything out, get all environment variables set properly in each system (etcd, fleet and fleetctl) and work through the details. That's bigger than this issue though and not up to me :)

jonboulle commented 10 years ago

> Cause: According to this:
>
> http://golang.org/src/pkg/crypto/x509/root_unix.go
>
> It looks like Go is using /etc/ssl/certs/ca-certificates.crt as the authoritative source of CA files

@noahlh actually, on looking at it a bit more closely - this should never get hit when specifying etcd_ca_file, because we specify the RootCAs in the TLSClientConfig here. So provided etcd_ca_file is pointing to the appropriate CA file, you should not actually need to run update-ca-certificates for fleet specifically to work.
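
To illustrate the RootCAs point, here is a minimal Go sketch. It is not fleet's actual code: the function names clientTLSConfig, selfSignedCA and verifyAgainst are hypothetical, and the CA is a throwaway one generated in memory. It shows that once a tls.Config carries its own RootCAs pool, verification runs against that pool, and a certificate from any other CA fails with an unknown-authority error.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// clientTLSConfig returns a TLS config that trusts only the CAs in caPEM.
// With RootCAs set, the system bundle is not consulted for verification.
func clientTLSConfig(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no PEM certificates found in CA data")
	}
	return &tls.Config{RootCAs: pool}, nil
}

// selfSignedCA creates a throwaway CA certificate in PEM form (demo only).
func selfSignedCA() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-ca"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// verifyAgainst checks that certPEM chains to a root in cfg.RootCAs.
func verifyAgainst(cfg *tls.Config, certPEM []byte) error {
	block, _ := pem.Decode(certPEM)
	if block == nil {
		return errors.New("no PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return err
	}
	_, err = cert.Verify(x509.VerifyOptions{
		Roots:     cfg.RootCAs,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return err
}

func main() {
	caPEM, err := selfSignedCA()
	if err != nil {
		fmt.Println("generating demo CA failed:", err)
		return
	}
	otherPEM, err := selfSignedCA()
	if err != nil {
		fmt.Println("generating second demo CA failed:", err)
		return
	}
	cfg, err := clientTLSConfig(caPEM)
	if err != nil {
		fmt.Println("building TLS config failed:", err)
		return
	}
	fmt.Println("configured CA:", verifyAgainst(cfg, caPEM))
	fmt.Println("unrelated CA: ", verifyAgainst(cfg, otherPEM))
}
```

In a real client the PEM bytes would come from the file named by etcd_ca_file rather than from a generated CA.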

noahlh commented 10 years ago

I'll test my setup again and report back. Fairly certain that etcd_ca_file was specified properly and that I was still getting an x509: certificate signed by unknown authority error until I ran update-ca-certificates. Update to come...

jonboulle commented 10 years ago

@noahlh thanks for the feedback. Maybe we should put together a higher-level document outlining how to secure all the components with TLS.

Please let us know what you find with etcd_ca_file!

steveej commented 10 years ago

A minimal working example of generating certificates for a fresh cluster with two hosts would be perfect. I'm currently struggling to get this working. Here's an outline of my attempt:

export ETCD_DEPOT_PATH=/etc/etcd-ca/
mkdir -p "$ETCD_DEPOT_PATH"
etcd-ca --depot-path=$ETCD_DEPOT_PATH init

etcd-ca --depot-path=$ETCD_DEPOT_PATH new-cert --ip 192.168.0.2 host2
etcd-ca --depot-path=$ETCD_DEPOT_PATH sign host2
etcd-ca --depot-path=$ETCD_DEPOT_PATH new-cert --ip 192.168.0.1 host1
etcd-ca --depot-path=$ETCD_DEPOT_PATH sign host1

etcd-ca --depot-path=$ETCD_DEPOT_PATH export --insecure host2 | tar x
etcd-ca --depot-path=$ETCD_DEPOT_PATH export --insecure host1 | tar x

scp ca.crt host2.{host.crt,key.insecure} host2.lan:/etc/etcd/
scp ca.crt host1.{host.crt,key.insecure} host1.lan:/etc/etcd/

etcd.conf@host1

ca_file = "/etc/etcd/ca.crt"
cert_file = "/etc/etcd/host1.host.crt"
key_file = "/etc/etcd/host1.key.insecure"
[peer]
...
ca_file = "/etc/etcd/ca.crt"
cert_file = "/etc/etcd/host1.host.crt"
key_file = "/etc/etcd/host1.key.insecure"

etcd.conf@host2

ca_file = "/etc/etcd/ca.crt"
cert_file = "/etc/etcd/host2.host.crt"
key_file = "/etc/etcd/host2.key.insecure"
peers = ["192.168.0.1:7001"]
[peer]
ca_file = "/etc/etcd/ca.crt"
cert_file = "/etc/etcd-ca/host2.host.crt"
key_file = "/etc/etcd/host2.key.insecure"

After starting etcd first on host1, and then on host2, I'm getting the following logs.

host1:

Sep 07 10:49:47 host1 etcd[16213]: [etcd] Sep  7 10:49:47.033 INFO      | The path /var/lib/etcd/log is in btrfs
Sep 07 10:49:47 host1 etcd[16213]: [etcd] Sep  7 10:49:47.034 INFO      | Set NOCOW to path /var/lib/etcd/log succeeded
Sep 07 10:49:50 host1 etcd[16213]: [etcd] Sep  7 10:49:50.607 INFO      | host1 is starting a new cluster
Sep 07 10:49:50 host1 etcd[16213]: [etcd] Sep  7 10:49:50.611 INFO      | etcd server [name host1, listen on 0.0.0.0:4001, advertised url https://host1.lan:4001]
Sep 07 10:49:50 host1 etcd[16213]: [etcd] Sep  7 10:49:50.612 INFO      | peer server [name host1, listen on 0.0.0.0:7001, advertised url https://host1.lan:7001]
Sep 07 10:49:50 host1 etcd[16213]: [etcd] Sep  7 10:49:50.612 INFO      | host1 starting in peer mode
Sep 07 10:49:50 host1 etcd[16213]: [etcd] Sep  7 10:49:50.612 INFO      | host1: state changed from 'initialized' to 'follower'.
Sep 07 10:49:50 host1 etcd[16213]: [etcd] Sep  7 10:49:50.612 INFO      | host1: state changed from 'follower' to 'leader'.
Sep 07 10:49:50 host1 etcd[16213]: [etcd] Sep  7 10:49:50.612 INFO      | host1: leader changed from '' to 'host1'.
Sep 07 10:50:05 host1 etcd[16213]: [etcd] Sep  7 10:50:05.340 INFO      | host1: peer added: 'host2'
Sep 07 10:50:05 host1 etcd[16213]: [etcd] Sep  7 10:50:05.396 INFO      | host1: warning: heartbeat time out peer="host2" missed=1 backoff="2s"
Sep 07 10:50:07 host1 etcd[16213]: [etcd] Sep  7 10:50:07.442 INFO      | host1: warning: heartbeat time out peer="host2" missed=40 backoff="4s"
Sep 07 10:50:11 host1 etcd[16213]: [etcd] Sep  7 10:50:11.443 INFO      | host1: warning: heartbeat time out peer="host2" missed=119 backoff="8s"

host2:

Sep 07 12:50:02 host2 etcd[10864]: [etcd] Sep  7 12:50:02.255 INFO      | The path /var/lib/etcd/log is in btrfs
Sep 07 12:50:02 host2 etcd[10864]: [etcd] Sep  7 12:50:02.255 INFO      | Set NOCOW to path /var/lib/etcd/log succeeded
Sep 07 12:50:05 host2 etcd[10864]: [etcd] Sep  7 12:50:05.125 INFO      | Send Join Request to https://192.168.0.1:7001/join
Sep 07 12:50:05 host2 etcd[10864]: [etcd] Sep  7 12:50:05.368 INFO      | host2 joined the cluster via peer 192.168.0.1:7001
Sep 07 12:50:05 host2 etcd[10864]: [etcd] Sep  7 12:50:05.372 INFO      | etcd server [name host2, listen on 0.0.0.0:4001, advertised url https://host2.lan:4001]
Sep 07 12:50:05 host2 etcd[10864]: [etcd] Sep  7 12:50:05.373 INFO      | peer server [name host2, listen on 0.0.0.0:7001, advertised url https://host2.lan:7001]
Sep 07 12:50:05 host2 etcd[10864]: [etcd] Sep  7 12:50:05.373 INFO      | host2 starting in peer mode
Sep 07 12:50:05 host2 etcd[10864]: [etcd] Sep  7 12:50:05.373 INFO      | host2: state changed from 'initialized' to 'follower'.
Sep 07 12:50:05 host2 etcd[10864]: 2014/09/07 12:50:05 http: TLS handshake error from 192.168.0.1:35763: remote error: bad certificate
Sep 07 12:50:05 host2 etcd[10864]: 2014/09/07 12:50:05 http: TLS handshake error from 192.168.0.1:35764: remote error: bad certificate
Sep 07 12:50:05 host2 etcd[10864]: 2014/09/07 12:50:05 http: TLS handshake error from 192.168.0.1:35765: remote error: bad certificate

What am I doing wrong?
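
One thing that may be worth checking, offered as an assumption and not a confirmed diagnosis from this thread: the certificates above were created with only --ip, while the logs show etcd advertising https://host2.lan:7001. A Go client rejects a certificate whose subject alternative names do not cover the name it dialed. The sketch below uses hypothetical helpers (certWithIPSAN, checkName) and a throwaway in-memory certificate to show how a certificate with only an IP SAN behaves when checked against an IP and against a DNS name.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"net"
	"time"
)

// certWithIPSAN creates a throwaway self-signed certificate whose only
// subject alternative name is the given IP address (demo helper).
func certWithIPSAN(commonName, ip string) ([]byte, error) {
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return nil, fmt.Errorf("invalid IP address %q", ip)
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		IPAddresses:  []net.IP{parsed},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// checkName returns nil if the certificate is valid for host, which may be
// a DNS name or an IP literal.
func checkName(certPEM []byte, host string) error {
	block, _ := pem.Decode(certPEM)
	if block == nil {
		return errors.New("no PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return err
	}
	return cert.VerifyHostname(host)
}

func main() {
	certPEM, err := certWithIPSAN("host2", "192.168.0.2")
	if err != nil {
		fmt.Println("generating demo certificate failed:", err)
		return
	}
	for _, host := range []string{"192.168.0.2", "host2.lan"} {
		fmt.Printf("%-12s -> %v\n", host, checkName(certPEM, host))
	}
}
```

To run the same check on a real certificate, pass the contents of the exported host2.host.crt to checkName along with the name or address the peers actually dial.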

psi-4ward commented 9 years ago

Seems cloudinit doesn't support the etcd_ca_file key:

node01 coreos-cloudinit[576]: line 28: warning: unrecognized key "etcd_ca_file"
  fleet:
    etcd_servers: "https://123.123.123.123:4001"
    endpoint: "https://123.123.123.123:4001"
    etcd_ca_file: "/etcd-certs/ca.crt"
    etcd_certfile: "/etcd-certs/key.crt"
    etcd_keyfile: "/etcd-certs/key.key"

psi-4ward commented 9 years ago

Looks like I got it working without any update-ca-certificates hacks:

coreos:
  fleet:
    etcd_servers: https://1.1.1.1:4001
    endpoint: https://1.1.1.1:4001
    etcd_cafile: /etcd-certs/ca.crt
    etcd_certfile: /etcd-certs/key.crt
    etcd_keyfile: /etcd-certs/key.key

write_files:
  - path: /etc/profile.d/fleet-config.sh
    permissions: 0755
    content: |
      export FLEETCTL_ENDPOINT=https://1.1.1.1:4001
      export FLEETCTL_CA_FILE=/etcd-certs/ca.crt
      export FLEETCTL_CERT_FILE=/etcd-certs/key.crt
      export FLEETCTL_KEY_FILE=/etcd-certs/key.key
  - path: /run/systemd/system/etcd.service.d/30-certificates.conf
    permissions: 0644
    content: |
      [Service]
      # Client Env Vars
      Environment=ETCD_CA_FILE=/etcd-certs/ca.crt
      Environment=ETCD_CERT_FILE=/etcd-certs/key.crt
      Environment=ETCD_KEY_FILE=/etcd-certs/key.key
      # Peer Env Vars
      Environment=ETCD_PEER_CA_FILE=/etcd-certs/ca.crt
      Environment=ETCD_PEER_CERT_FILE=/etcd-certs/key.crt
      Environment=ETCD_PEER_KEY_FILE=/etcd-certs/key.key
  - path: /etcd-certs/ca.crt
    permissions: 0644
    owner: root
    content: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
  - path: /etcd-certs/key.key
    permissions: 0644
    owner: root
    content: |
      -----BEGIN RSA PRIVATE KEY-----
      ...
      -----END RSA PRIVATE KEY-----
  - path: /etcd-certs/key.crt
    permissions: 0644
    owner: root
    content: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----

The /etc/profile.d/fleet-config.sh file sets the environment variables that fleetctl reads.
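
For anyone writing their own client against the same etcd endpoint, here is a rough Go sketch of how those three file variables map onto a mutual-TLS configuration. It is an illustration only, not fleetctl's implementation; tlsConfigFromEnv is a hypothetical name, and the getenv parameter exists only so the function can be exercised without touching the real environment.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// tlsConfigFromEnv builds a client TLS config from FLEETCTL_CA_FILE,
// FLEETCTL_CERT_FILE and FLEETCTL_KEY_FILE. Pass os.Getenv in normal use.
func tlsConfigFromEnv(getenv func(string) string) (*tls.Config, error) {
	caFile := getenv("FLEETCTL_CA_FILE")
	certFile := getenv("FLEETCTL_CERT_FILE")
	keyFile := getenv("FLEETCTL_KEY_FILE")
	if caFile == "" || certFile == "" || keyFile == "" {
		return nil, errors.New("FLEETCTL_CA_FILE, FLEETCTL_CERT_FILE and FLEETCTL_KEY_FILE must all be set")
	}

	// Trust only the CA(s) in the configured file.
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, fmt.Errorf("reading CA file: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("no certificates found in %s", caFile)
	}

	// Present the client certificate so the server can authenticate us.
	pair, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("loading client key pair: %w", err)
	}

	return &tls.Config{
		RootCAs:      pool,
		Certificates: []tls.Certificate{pair},
	}, nil
}

func main() {
	cfg, err := tlsConfigFromEnv(os.Getenv)
	if err != nil {
		fmt.Println("TLS setup failed:", err)
		return
	}
	fmt.Println("client certificates loaded:", len(cfg.Certificates))
}
```

The resulting config can be placed in an http.Transport's TLSClientConfig field before issuing requests to the https endpoint.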