bitnami-labs / sealed-secrets

A Kubernetes controller and tool for one-way encrypted Secrets
Apache License 2.0
7.65k stars 684 forks source link

Decouple "current key" from "latest key" #359

Open mkmik opened 4 years ago

mkmik commented 4 years ago

The suggested backup workflow works a-ok in practice but suffers from a race condition. It also doesn't resonate with a lot of people (possibly because they intuit such a race condition or possibly just because or doc is bad).

The race condition is the following:

  1. An admin (or automation) backs up the current keys
  2. Key renewal period (30d) expires and a new key is created
  3. Another user seals a secret with kubeseal (or fetches a cert with kubeseal --fetch-cert)
  4. The cluster explodes.

If there is not enough time to run a backup between step 2 and 4, there might be one or more sealed secrets encrypted with a key that hasn't been backed up yet (and thus cannot be recovered after the disaster).

That's not usually a big problem if you run frequent backups (daily) or if you synchronize your backup schedule with the key renewal schedule, but the race condition is still there, albeit for a short amount of time.


Sealed secrets works with many keys, and while the youngest key is a bit special because it's the one used by default when sealing a new secret with kubeseal, you can seal new secrets by using any of the existing keys as well; that can happen since people can use the "offline" method and pass the certificate explicitly with --cert.

The latest (most recent) key is special because it's the most secure (the less likely to have been leaked in the past; simply because it's younger).

Our current default workflow does indeed prioritise security over disaster recoverability.

The subset of users who care about disaster recoverability are quite vocal, so I think we should do something to help them.

A common ask is control over sealing key renewal, but what I really think should happen is control over advertising the public key of the newly created key as "current" (and thus causing the possibility of encrypting an unrecoverable a secret until a backup is done).

A possible "almost zero knob" UX for this would be to add a delay before a new public key is advertised as current.

For example, imagine adding a default delay of renewal period / 2.

Let's draw a period of 10 days and a delay of 5 days:

key renew: 1---------2---------3---------4---------5---------
current:   -----1---------2---------3---------4---------5----

now, this means that as long as you run a periodic backup every 5 days, no matter what's the phase of your backup w.r.t the key renewal, every time a key becomes "current", you're guaranteed to have backed it u:

key renew: 1---------2---------3---------4---------5---------
current:   -----1---------2---------3---------4---------5----
b phase 1: 1----1----2----2----3----3----4----4----5----5----
b phase 2: -1----1----2----2----3----3----4----4----5----5---
b phase 3: --1----1----2----2----3----3----4----4----5----5--
b phase 4: ---1----1----2----2----3----3----4----4----5----5-
b phase 5: ----1----1----2----2----3----3----4----4----5----5

i.e. at any vertical slice of that diagram, any of the possible backup schedule "phases" already contains the current key number:

key renew: 1---------2---------3---------4---------5---------
current:   -----1---------2---------3<<v------4---------5----
b phase 1: 1----1----2----2----3----3<<v-4----4----5----5----
b phase 2: -1----1----2----2----3----3<v--4----4----5----5---
b phase 3: --1----1----2----2----3----3v---4----4----5----5--
b phase 4: ---1----1----2----2----3----x----4----4----5----5-
b phase 5: ----1----1----2----2----3<<</3----4----4----5----5
mkmik commented 4 years ago

CC: @jjo

jjo commented 4 years ago

@mkmik lifecycle LGTM, it would be also great to have the keys annotated with timestamps for its relevant events: