bevry-labs / terraform-scaleway-hashistack

Terraform module to deploy Consul, Nomad, Vault onto Scaleway
https://registry.terraform.io/modules/bevry/hashistack/scaleway

tls certificate issuer and updater #12

Open balupton opened 6 years ago

balupton commented 6 years ago

Summary:

Generating the nomad certs on origin did not work: the nomad machines would then have certs whose ip_sans did not include their private_ips, which would cause the cert to be rejected by the local instance.

I then tried to generate the nomad certs on the nomad machines. That fixes the ip_sans issue, but then prevents nomad-to-nomad communication, as each nomad instance then has a different cert.

Solving this seems to require a certificate issuer and update service.
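
To make the failure mode concrete, here is a minimal sketch that compares a cert's ip_sans against the private_ips of the machines expected to present it. The function name and the example addresses are hypothetical, and the sketch only models the coverage check; it does not parse real certificates.

```python
import ipaddress
from typing import Iterable, List


def uncovered_hosts(cert_ip_sans: Iterable[str], host_private_ips: Iterable[str]) -> List[str]:
    """Return the private_ips that are missing from the cert's ip_sans."""
    # Normalise through ipaddress so "10.0.0.1" and equivalent spellings compare equal.
    covered = {ipaddress.ip_address(ip) for ip in cert_ip_sans}
    return [ip for ip in host_private_ips if ipaddress.ip_address(ip) not in covered]


# A cert generated on origin that only knows about origin's own address (example values).
missing = uncovered_hosts(["10.0.0.1"], ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(missing)
```

Any address in the returned list belongs to a machine that would reject the shared cert, which is the situation described above.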

Possible Solutions:

Local polling + local issuance:

  1. Create a poll service on each machine that polls a vault secret (containing the issued pki combo json) every 30 seconds; if there is a change, reconfigure the local nomad service.
  2. When a new nomad service is required, append another vault secret with the new private_ip, then generate a new pki combo with all the private_ips from the earlier secret, and put that combo json into the secret from step 1.
  3. To set up the vault secrets, vault policies and tokens would need to be created for the polling and writing requirements. Or just use the cluster_token in memory.
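
The polling half of this option could look roughly like the sketch below. It is only an outline under assumptions: `fetch` stands in for whatever reads the vault secret and `reload` for whatever rewrites the cert files and reconfigures nomad; both are hypothetical callables supplied by the caller, and no Vault or nomad API is invoked here.

```python
import hashlib
import json
import time
from typing import Callable, Optional


def fingerprint(bundle: dict) -> str:
    """Stable hash of the pki combo json, independent of key order."""
    encoded = json.dumps(bundle, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def poll_once(fetch: Callable[[], dict],
              reload: Callable[[dict], None],
              last_seen: Optional[str]) -> str:
    """Fetch the bundle, call reload if it changed, and return the new fingerprint."""
    bundle = fetch()
    current = fingerprint(bundle)
    if current != last_seen:
        reload(bundle)
    return current


def run_poller(fetch: Callable[[], dict],
               reload: Callable[[dict], None],
               interval_seconds: float = 30.0) -> None:
    """Poll forever; the 30 second default mirrors step 1."""
    last_seen: Optional[str] = None
    while True:
        last_seen = poll_once(fetch, reload, last_seen)
        time.sleep(interval_seconds)
```

Because `last_seen` starts as `None`, the first poll always triggers a reload, which also covers the initial install of the cert on a fresh machine.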

Developer issuance in pre:

  1. For each new server that was just issued but not yet configured, the terraform script then remotes into existing services and updates their TLS cert to include the new server's private_ip.
  2. Generation of the PKI bundle could occur locally or on origin, then propagated.

Developer issuance in post:

  1. All services have TLS off at the start
  2. Then once all servers are deployed and running, remote into origin, generate the certs containing all their private_ips, then remote into each server and inject the cert, and reconfigure their services.
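
As a rough illustration of step 2, the sketch below gathers every server's private_ip into one request body for a Vault PKI issue call. The field names `common_name`, `ip_sans` (comma-delimited) and `ttl` are the ones the PKI issue endpoint accepts; the function name, the example common name, the example addresses, and the default TTL are assumptions, and the HTTP call itself is left out.

```python
import ipaddress
from typing import Dict, Iterable


def build_issue_payload(common_name: str,
                        private_ips: Iterable[str],
                        ttl: str = "720h") -> Dict[str, str]:
    """Build the body for a PKI issue request covering every private_ip."""
    # Validate and de-duplicate; ip_address raises ValueError on malformed input.
    addresses = {ipaddress.ip_address(ip) for ip in private_ips}
    # Sort by (version, numeric value) so the output is deterministic.
    ordered = sorted(addresses, key=lambda a: (a.version, int(a)))
    return {
        "common_name": common_name,
        "ip_sans": ",".join(str(a) for a in ordered),
        "ttl": ttl,  # assumed default, pick whatever suits the cluster
    }


# Example values only.
payload = build_issue_payload("server.global.nomad", ["10.0.0.12", "10.0.0.3", "10.0.0.12"])
print(payload["ip_sans"])
```

The resulting cert can then be injected into each server in turn, since every private_ip is already present in its ip_sans.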

Abandon local TLS entirely for Cloudflare Argo Tunnel:

  1. Cloudflare Argo Tunnel only allows connections from Cloudflare servers and users you give access to via Cloudflare Access. Argo Tunnel also encrypts all traffic by generating a local certificate on the machine that then interfaces with the Cloudflare endpoint. Accomplished by #8

Assessment:

Local polling allows short TTL on local TLS. Accomplishes #4

Local polling and dev issuance in pre would both involve reloading all existing servers each time a new server is added.

Dev issuance in post would involve reloading all servers, but only once, in post.

Reloading may induce downtime if not timed to be simultaneous.

Conclusion:

Argo Tunnel should be explored. It could turn out to be the easiest and most secure option, and it may also turn out to be usable alongside service TLS.

At a later point, implement service TLS. By my estimate, it would take 1-3 weeks to get the options for it going.

balupton commented 6 years ago

If I install a nomad agent on origin and the masters, then I could use nomad jobs to:

  1. run the poller
  2. generate the certs on the appropriate hosts (perhaps consul will give the ips needed then)

balupton commented 6 years ago

Seems https://github.com/hashicorp/consul-template/blob/master/README.md is the official answer, even includes a vault cert gen example

balupton commented 6 years ago

Two recent progressions to make this easier.

Progression One

Generating the nomad certs on origin did not work: the nomad machines would then have certs whose ip_sans did not include their private_ips, which would cause the cert to be rejected by the local instance.

Vault 0.10.3 now supports:

URI SANs in PKI: You can now configure URI Subject Alternate Names in the pki backend. Roles can limit which SANs are allowed via globbing.

Found via https://www.vaultproject.io/api/secret/pki/index.html#uri_sans-1 and https://www.vaultproject.io/api/auth/cert/index.html#allowed_uri_sans

Then perhaps this issue can now be worked around, rather than implementing Consul Template.

Consul Template does offer the advantage of short lived certificates that can update on the fly, but at the expense of a lot more complexity.
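
For a feel of how a role's glob restriction on URI SANs might behave, here is a small sketch. It treats `*` as the only wildcard, which is an assumption about the matching rules; Vault performs the real check server-side against the role's `allowed_uri_sans`. The function names, patterns and URIs below are hypothetical examples.

```python
import re
from typing import Iterable, List


def glob_match(pattern: str, value: str) -> bool:
    """Match value against pattern, where '*' matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def rejected_uri_sans(requested: Iterable[str], allowed_patterns: Iterable[str]) -> List[str]:
    """Return the requested URI SANs that no allowed pattern covers."""
    patterns = list(allowed_patterns)
    return [uri for uri in requested
            if not any(glob_match(p, uri) for p in patterns)]


# Example values only.
allowed = ["spiffe://cluster.example/nomad/*"]
requested = [
    "spiffe://cluster.example/nomad/server-1",
    "spiffe://cluster.example/vault/server-1",
]
print(rejected_uri_sans(requested, allowed))
```

If this approach holds up, one long-lived cert per service could carry a URI SAN that stays valid as machines are added, instead of enumerating every private_ip.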

Progression Two

Consul 1.2 introduces a new feature called Consul Connect, which automatically provides TLS for Consul Services (not consul, vault, and nomad themselves).

However, in the docs for its various features, it includes these hints:

https://www.consul.io/docs/guides/connect-production.html

Configure Agent Transport Encryption

Consul's gossip (UDP) and RPC (TCP) communications need to be encrypted otherwise attackers may be able to see ACL tokens while in flight between the server and client agents (RPC) or between client agent and application (HTTP). Certificate private keys never leave the host they are used on but are delivered to the application or proxy over local HTTP so local agent traffic should be encrypted where potentially untrusted parties might be able to observe localhost agent API traffic.

Follow the encryption documentation to ensure both gossip encryption and RPC/HTTP TLS are configured securely.

For now client and server TLS certificates are still managed by manual configuration. In the future we plan to automate more of that with the same mechanisms Connect offers to user applications.

https://www.consul.io/docs/connect/platform/nomad.html

Connect on Nomad

Connect can be used with Nomad to provide secure service-to-service communication between Nomad jobs and task groups. The ability to use the dynamic port feature of Nomad makes Connect particularly easy to use.

Using Connect with Nomad today requires manually specifying the Connect sidecar proxy and managing intentions directly via Consul (outside of Nomad). The Consul and Nomad teams are working together towards a more automatic and unified solution in an upcoming Nomad release.

Which hopefully means that HashiCorp are working on a way to make TLS automatic, not just for Consul services which Consul Connect already supports, but also for the HashiSuite itself.

Relevant links:

Conclusion

With these developments, it seems that a combination of:

  1. Consul Connect (for TLS on services/apps)
  2. Long-lived certificates that have URI SANs (for TLS on HashiSuite)

should be the missing pieces for a TLS-enabled cluster with minimum complexity for the current day.

If option (2) proves not to work, then Consul Template will be required for this use case. However, Consul Template for that use case has a limited life expectancy, as it seems HashiCorp are working to provide an automated, built-in alternative. As such, if (2) fails, then the options are:

  1. Implement Consul Template for HashiSuite encryption
  2. Wait for HashiCorp to provide their updates to their suite

If we do Consul Template, then a few months (or years) later we would end up having to upgrade to the automated updates anyway, moving away from Consul Template. As such, my thinking is that if the URI SANs option fails, we just proceed without HashiSuite TLS encryption in the meantime, until the updates occur.

ghost commented 6 years ago

Is the intent to be able to run the equivalent of Argo Tunnels over TLS because IP addresses are not static?

Your solution is something I have also been thinking about, because I am in a non-static IP address environment. The new Consul Connect looks interesting. I think a hacky setup should be tried to see if it works.

balupton commented 5 years ago

Seems HashiCorp is finally working to make this easier.

These would be essential reading for anyone who wants to continue this work.

balupton commented 5 years ago

There is now Consul Connect support in Nomad 0.10, which also seems to assist with this:

https://www.hashicorp.com/blog/consul-connect-integration-in-hashicorp-nomad/

https://www.consul.io/docs/connect/index.html

https://www.consul.io/docs/connect/ca/vault.html