MatrixAI / Emergence

Distributed Infrastructure Orchestration
Apache License 2.0

TLS Certificate Centralisation #42

Open CMCDragonkai opened 5 years ago

CMCDragonkai commented 5 years ago

It can be very annoying that each Automaton's image may have its own TLS certificates, and that a subset of all Automatons may have different cacert versions, resulting in weird TLS certificate verification failures. This looks like a problem that should be solved at the build phase. The problem is that this forces each Automaton to be rebuilt with a common cacert version every time it updates. It makes more sense for CA certs to be centralised and updated in one go, separately from everything else. But how do we do such a thing? This seems like a problem that our volume system should help with, because applications tend to expect CA certs as physical files on a filesystem.

Note that this is about the certificate authorities that clients use to verify servers. It is not about server certificates, nor about "client certificates", which are a separate problem.

CMCDragonkai commented 5 years ago

See the curl program and how it relies on the CA location being given as a command line parameter or an environment variable pointing to a local filesystem path. If we abstract this, we also have to be careful about how processes re-read these files when performing external HTTPS calls.
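As a rough Python sketch of the same pattern, and of the re-reading pitfall (the environment variable name here is hypothetical, not an existing convention):

import os
import ssl

# Mimic curl: honour an env variable pointing at a CA bundle,
# otherwise fall back to the system default locations.
# "AUTOMATON_CA_BUNDLE" is a hypothetical variable name.
cafile = os.environ.get("AUTOMATON_CA_BUNDLE")
ctx = ssl.create_default_context(cafile=cafile)
# The bundle is read from disk *now*; if the file changes later, this
# context (and connections made from it) will not see the change.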

CMCDragonkai commented 5 years ago

Fragmented behaviour is avoided by relying on the system-provided trust database. In the context of containers, however, this is again fragmented, because each container is an independent system. This means that to unify this behaviour again, we need a shared certificate database. However, be aware of certificate database caching, which can occur (for example in Python's SSL context). This means that even if you change the certificate database, any Automaton relying on this shared database must also be restarted. Of course, there are more lightweight interrupts that could be used to tell an application to reload its own cache, but this requires a signalling interface for the orchestrator to pass out-of-band instructions to each Automaton.
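As a minimal sketch of such a signalling interface, assuming a plain Unix signal as the out-of-band channel (just one possible choice), an Automaton could rebuild its SSL context on SIGHUP after the orchestrator updates the shared database:

import signal
import ssl

# The Automaton keeps one long-lived client context around.
ctx = ssl.create_default_context()

def reload_certs(signum, frame):
    # Rebuild the context so the shared CA database is re-read from
    # disk; the stale cached copy in the old context is discarded.
    global ctx
    ctx = ssl.create_default_context()

# The orchestrator sends SIGHUP after updating the certificate database.
signal.signal(signal.SIGHUP, reload_certs)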

Note the usage of SSL_CERT_FILE and SSL_CERT_DIR as the by-convention universal variables for setting the location of the certificates. The ultimate fallback is whatever paths the TLS libraries were compiled with (see the configure flags in their build instructions).
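Python exposes exactly these conventions: the following prints both the env variable names it honours and the compile-time fallback paths of the linked OpenSSL (the sample output is illustrative, it varies per system).

import ssl

# Reports the env variable names (SSL_CERT_FILE / SSL_CERT_DIR) and
# the compiled-in fallback paths of the underlying OpenSSL, e.g.
# DefaultVerifyPaths(cafile=None, capath='/etc/ssl/certs',
#                    openssl_cafile_env='SSL_CERT_FILE', ...)
print(ssl.get_default_verify_paths())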

Here's a demonstration of application-level caching of TLS certificates.

import ssl

# Create a context; no CA files are read yet.
s = ssl.SSLContext()
# Reads the CA files from disk and caches them inside the context.
s.load_default_certs()
# Returns the cached, in-memory copy of the CA certs.
s.get_ca_certs()

Test it with: strace python ./test.py |& grep open | less

You'll see that it tries to open the path to the certificates at the very end.

In NixOS, Python is compiled with SSL support using OpenSSL. Without any further directives, it will fall back on whatever OpenSSL was compiled to look at. On NixOS, OpenSSL looks for the system trust database at /etc/ssl/certs (which is set at compile time). Most libraries rely on this, and most then also allow the SSL_CERT_FILE or SSL_CERT_DIR environment variables to be used.

CMCDragonkai commented 5 years ago

Note that TLS certificates would only really be used for communication outside a Matrix network. Within a Matrix network, using HTTPS connections between Automatons adds an extra, unnecessary layer of end-to-end encryption point-to-point between the caller Automaton and the called Automaton. That being said, many applications may be built that way. Doing so is inadvisable; instead, TLS certificates for internal services should be self-managed. But this may be left to the operator to decide.
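For illustration, a client context pinned to an operator-managed internal CA could look like this in Python (the bundle path is hypothetical):

import ssl

# Hypothetical path to an operator-managed internal CA bundle.
INTERNAL_CA = "/etc/matrix/internal-ca.pem"

# Trusts only the internal CA, not the public cacert bundle, so
# internal services are decoupled from the shared trust database.
ctx = ssl.create_default_context(cafile=INTERNAL_CA)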

CMCDragonkai commented 5 years ago

Using Nix, for anything where you need external HTTPS certificates, you need the cacert package. It has a setup hook that brings in the SSL_CERT_FILE variable, which needs to be set for any exported programs. So should the cacert package be a propagatedBuildInput?


No, you just need it as a buildInput whose effect you then capture in a wrapped output, like so:

https://github.com/NixOS/nixpkgs/blob/4d1abc44199c8957105f538119c2d19d67aee26f/pkgs/development/compilers/rust/cargo.nix#L41-L50

Doing so allows you to convert what is notated as a compile-time dependency into something that must also exist at runtime: in the linked example, wrapProgram bakes SSL_CERT_FILE, pointing at the cacert store path, into the installed executable. Propagated build inputs just mean that your compile-time dependency must also exist when a dependent package depends on you.

CMCDragonkai commented 5 years ago

https://github.com/NixOS/nixpkgs/issues/8247