Open nicktrav opened 2 years ago
Is your feature request related to a problem? Please describe.
Currently, when using encryption-at-rest (EAR), the store key must be accessible to Cockroach at process start time. This is typically achieved by having the key reside on a filesystem that is "local" to the node - the key is typically persistent, and remains accessible, even after the process has booted.
A downside to this approach is that a compromise of the host can result in a compromise of the store key, if the attacker has access to the filesystem on which the key resides.
Describe the solution you'd like
When deploying EAR, a Key Management System (KMS) is typically leveraged as the source for the store key. The intention with using the KMS is that key material never has to reside on the filesystem. Instead, it is pulled from the KMS over an encrypted and authenticated network connection.
Consider allowing Cockroach to pull its store key from the KMS at process boot time.
This could look something like passing a "URL" for the key to pull from the provider (HTTP endpoint, cloud resource name, etc.) as a configuration parameter to Cockroach. On process start, Cockroach would make the request to the KMS, authenticating with the requisite service-account credentials (typically loaded by the SDK corresponding to the platform on which Cockroach is deployed). The key material would only ever reside in process memory. It could also be "shielded" using something like memguard to prevent it from inadvertently making it to disk in heap dumps, swapping, etc., and to ensure that rotated key material is properly garbage collected during the process lifetime.
One downside is that this introduces an additional failure mode at process start time - if the KMS is unavailable, the process will not start. The tradeoff for removing the key material from disk seems worth it. It is also likely that this coupling already exists in deployments in which a KMS is employed, as the store key must first be fetched and persisted to disk before starting Cockroach. That step is currently handled externally to Cockroach - this feature would move it into Cockroach itself.
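To make the proposal concrete, here is a rough Go sketch of what the boot-time fetch could look like. Everything in it is hypothetical and not existing Cockroach code: the function names `fetchStoreKey`, `validateStoreKey`, and `wipe`, the plain HTTP GET, the 10-second timeout, and the assumption that the KMS returns a raw AES key of 16, 24, or 32 bytes. Authentication is assumed to be handled by the supplied `http.Client` (for example via a platform SDK's transport), and the simple zeroing in `wipe` is only a stand-in for the stronger protection a library like memguard would provide.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
)

// maxKeyBytes caps how much of the KMS response body is read (assumed limit).
const maxKeyBytes = 64

// validateStoreKey checks that key has a plausible AES key length.
// Assumption: the KMS returns raw key bytes (AES-128, AES-192, or AES-256).
func validateStoreKey(key []byte) error {
	switch len(key) {
	case 16, 24, 32:
		return nil
	default:
		return fmt.Errorf("unexpected store key length: %d bytes", len(key))
	}
}

// wipe overwrites key material in place once it is no longer needed.
// This is a best-effort stand-in, not a replacement for memguard.
func wipe(key []byte) {
	for i := range key {
		key[i] = 0
	}
}

// fetchStoreKey pulls the store key from keyURL and returns it in memory only.
// An error here means the process should refuse to start.
func fetchStoreKey(ctx context.Context, client *http.Client, keyURL string) ([]byte, error) {
	// Hypothetical timeout so an unreachable KMS fails startup promptly.
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, keyURL, nil)
	if err != nil {
		return nil, fmt.Errorf("building KMS request: %w", err)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("KMS unavailable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("KMS returned status %d", resp.StatusCode)
	}

	// Read one byte past the cap so oversized responses fail validation.
	key, err := io.ReadAll(io.LimitReader(resp.Body, maxKeyBytes+1))
	if err != nil {
		wipe(key)
		return nil, fmt.Errorf("reading key material: %w", err)
	}
	if err := validateStoreKey(key); err != nil {
		wipe(key)
		return nil, err
	}
	return key, nil
}
```

At startup the caller would invoke something like `fetchStoreKey(ctx, client, keyURL)` with the URL taken from the new configuration parameter, abort on error, and call `wipe` on any key that has been rotated out. The key is never written to disk by this path, which is the point of the request.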
Jira issue: CRDB-15146
Epic CRDB-16419