dapr / cli

Command-line tools for Dapr.
Apache License 2.0

mTLS cert expiry notification and upgrade needs to be improved for production clusters to prevent outages #807

Open msfussell opened 3 years ago

msfussell commented 3 years ago

Issue

Currently Dapr makes it very easy to create root and issuer certs when the control plane is first installed into a hosting environment, such as a cluster (for example Kubernetes). The Sentry service generates and distributes certs to the sidecars and provides mTLS capabilities, which are used for service invocation. This is an important Dapr capability for operators. See https://docs.dapr.io/operations/security/mtls/

However, the root cert generated by the Sentry service expires 1 year after generation, which means that once a cluster has been running in production for a year the certs need to be updated. As described in the above article, this is currently left as a manual process, with the operator needing to generate new certs and manually copy them into a K8s secret. This is hard, awkward and error prone. To prevent issues in production clusters, this cert upgrade process needs to be improved. Given that the first anniversary of Dapr v1.0 is Feb 2022, we will start to see many production clusters that have been running for 1 year hit this issue. In distributed applications generally, certificate expiry is one of the most common issues to affect long-running clusters, causing reported incidents and application downtime. This needs to be prevented.

Describe the proposal

Improving the process for certificate upgrade can be achieved with the following:

1) Have the Dapr Sentry service raise a notification/metric event to the operator every hour once the root certificate is less than 30 days away from expiry. Currently the expiry date for the cert can be found with

dapr mtls expiry -k

but this is a manual process and automated warnings should also be surfaced in logs.
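
As a rough illustration of the check item 1 describes, here is a minimal sketch, assuming the root cert's expiry timestamp has already been read by some other means. The function name expiry_warning and the message wording are hypothetical and not part of Sentry or the CLI; only the 30 day threshold comes from the proposal above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold from the proposal: start warning 30 days before expiry.
WARN_WINDOW = timedelta(days=30)


def expiry_warning(not_after: datetime, now: datetime) -> Optional[str]:
    """Return a warning if the cert expires within WARN_WINDOW, else None.

    Hypothetical helper: both arguments must be timezone-aware datetimes.
    """
    remaining = not_after - now
    if remaining <= timedelta(0):
        return f"root certificate expired on {not_after:%Y-%m-%d}"
    if remaining < WARN_WINDOW:
        return (
            f"root certificate expires in {remaining.days} days "
            f"({not_after:%Y-%m-%d})"
        )
    return None


if __name__ == "__main__":
    now = datetime(2022, 1, 20, tzinfo=timezone.utc)
    print(expiry_warning(datetime(2022, 2, 17, tzinfo=timezone.utc), now))
```

A caller could run this on an hourly timer and write any non-None result to the log or emit it as a metric.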

2) The mtls command needs upgrade and import options to ensure successful cert rollovers. Ideally the operator should be able to run the following CLI command to upgrade the root and issuer certs in the cluster with a new expiry date (say another 1 or 2 years from the current date)

dapr mtls upgrade --kubernetes --expirydate <date>

The goal of this command is to simplify the upgrade process for soon-to-expire CA and issuer certs for an operator and prevent application downtime.

This command generates a new root and issuer certificate with the specified future date alongside the existing root and issuer certificate. Ideally a push-based approach would push the new cert to each of the Dapr sidecars; failing that, the operator may need to perform a rolling upgrade of the Dapr deployments to pick up the new cert. The old cert can be left in place and cleaned up the next time the upgrade command is run, so that there are only ever two certs (previous and current) installed in the environment. Alternatively, if the updated root certificate uses the same private key, the Dapr instance would not need to be restarted. This may be a configuration option to consider.
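
The "previous and current" rule can be sketched as follows. This is only an illustration of the bookkeeping, assuming the installed roots are held as an ordered list of PEM strings, oldest first; rotate_trust_bundle is a hypothetical name, and whether the sidecars accept a bundle with more than one root is not confirmed here.

```python
from typing import List


def rotate_trust_bundle(bundle: List[str], new_root_pem: str) -> List[str]:
    """Add a new root and keep only the two most recent roots.

    bundle is ordered oldest first. The input list is not modified.
    """
    if new_root_pem in bundle:
        # Already installed; nothing to rotate.
        return list(bundle)
    # Append the new root, then drop everything but the last two entries.
    return (bundle + [new_root_pem])[-2:]
```

Running the upgrade a second time therefore drops the oldest root, which is the cleanup behaviour described above.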

3) For those wanting to provide their own certs, both initially and during upgrade, the current manual process of editing the K8s secrets (or those of any other future hosting environment) is open to mistakes, for example by requiring Helm as yet another tool. A specific CLI import command that installs the certificates correctly hides this complexity and improves the certificate update. Ideally the operator should be able to run the following CLI command

dapr mtls import --kubernetes --certificateAuthorityRootCertificate <ca.crt> --issuerPrivateKey <issuer.key> --issuerPublicCertificate <issuer.crt>

(Note: this should be used instead of the K8s Helm command, in the same way that dapr init, the recommended way to install Dapr, hides the use of Helm.)

4) Make it easy to generate new certs with the correct format that can be imported, by wrapping the openssl command with a CLI command. This is particularly useful for certs being used for local self-host testing.

dapr mtls createcertificates --path <folder to write to, otherwise use current folder>

The output of this command can be used by the dapr mtls import -k command

Optional. Improve the local mTLS usage experience.

5) Using the Sentry service on a local installation could also be improved with the above CLI commands. Ideally there could also be a CLI command to start the Sentry service locally using generated certificates placed into a known folder, in other words simplifying the multiple steps described at https://docs.dapr.io/operations/security/mtls/#self-hosted, although this is not as important as the upgrade command above. Something like this:

dapr init --enable-mtls

This also starts and configures the Sentry service locally (today this does nothing) and generates certs into a known folder using the dapr mtls createcertificates command. dapr mtls can then be run without the -k option to see that the Sentry service is running locally and that mTLS is enabled in self-hosted mode.

Also see this related issue on the local Sentry service https://github.com/dapr/cli/issues/550

dapr-bot commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

rabollin commented 2 years ago

@pravinpushkar - As discussed please assign this task to yourself.

@msfussell, @artursouza - In the case of self-signed certificates, is the recommendation still to go with 1 year validity? Azure guidance was to have 90 day validity. Given this is open source, and to avoid creating much pain for customers, we can generate certificates with 180 day validity from now onwards.

Internal services like Container Apps/AKS should abide by Azure standards, and the recommendation is to use valid CA-authorized certificates. So while we consider self-signed certs for OSS, let us add CA-signed certificates for the internal version.

berndverst commented 2 years ago

In the case where Dapr CLI (or Sentry) generates the certificates we must find a way to also allow exporting of the private key used for generating the certificates.

This is because it is not possible to replace certificates without downtime when mTLS is used (and specifically in the case of service invocation) unless the new certificates are signed with the same private key.

Therefore we need the following:

Additionally, to successfully roll certificates the following must be done:

  1. Update the Dapr Trust Bundle Secret
  2. Restart the Dapr Sentry Service
  3. Restart the rest of the Dapr Control Plane
  4. Restart all application deployments

I recommend that we make it easy to do 1-3 but we do not perform (4) on behalf of the user, instead providing the relevant information how this can be accomplished.

Note that all services must be restarted before the existing certificates expire.
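
The ordering above, and the split between what tooling does (2-3) and what is left to the user (4), can be sketched like this. It is a sketch under assumptions: the dapr-system namespace and the control plane resource names are taken to match a default Kubernetes install and may differ in a given cluster, roll_plan is a hypothetical helper, and step 1 (updating the trust bundle secret) is assumed to have been done already.

```python
from typing import List, Tuple

# Assumed names for a default Dapr install on Kubernetes; adjust as needed.
DAPR_NAMESPACE = "dapr-system"
SENTRY = "deployment/dapr-sentry"
OTHER_CONTROL_PLANE = [
    "deployment/dapr-operator",
    "deployment/dapr-sidecar-injector",
    "statefulset/dapr-placement-server",
]


def restart(resource: str, namespace: str) -> str:
    """Build a kubectl rollout restart command line for one resource."""
    return f"kubectl rollout restart {resource} -n {namespace}"


def roll_plan(apps: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (tool_commands, user_commands) in the order they must run.

    apps is a list of (namespace, deployment_name) pairs for step 4.
    """
    # Step 2: Sentry first, so it serves the new certificates.
    tool = [restart(SENTRY, DAPR_NAMESPACE)]
    # Step 3: the rest of the control plane.
    tool += [restart(r, DAPR_NAMESPACE) for r in OTHER_CONTROL_PLANE]
    # Step 4: printed for the user rather than executed on their behalf.
    user = [restart(f"deployment/{name}", ns) for ns, name in apps]
    return tool, user
```

For example, roll_plan([("default", "orders")]) returns the four control plane restarts to run, plus a single command for the user to run against their own orders deployment.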

berndverst commented 2 years ago

Here are the items we need:

Warn users of mTLS certificate expiration:

Sidecar / Runtime / Daprd:

CLI

Improved certificate creation and update

Downtime impact mitigation

berndverst commented 2 years ago

@msfussell FYI I find it counterintuitive for the command to be dapr mtls upgrade because the certificates need to be upgraded even when mtls is set to false! This is because the Dapr control plane always uses the certificates.

I'm not sure whether this is the right approach, but I would suggest dapr upgrade --renew-certificates or something like that. See the separate issues I opened for that.

pravinpushkar commented 2 years ago

/assign

pravinpushkar commented 2 years ago

I am assigning this to myself. Please feel free to reassign if anybody has already started working.

mukundansundar commented 2 years ago

moving the epic to next milestone ...

lukeschlather commented 2 years ago

I'm confused by the assertion that Dapr needs to have a single key to work without downtime. Why can't Dapr simply support multiple root certificates?

pravinpushkar commented 2 years ago

Since this is marked as an Epic, I am untagging it from a release. We can target specific issues for releases.