elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Detect and prevent the use of mismatched encryption keys #92654

Open legrego opened 3 years ago

legrego commented 3 years ago

Kibana relies on a number of encryption keys. Arguably the most important key is xpack.encryptedSavedObjects.encryptionKey, as this controls the encryption/decryption of actions, alerts, and other sensitive user data.

Kibana requires that this key is set to the same value across all instances. If two Kibana instances have different encryption keys, then they will be encrypting saved objects that cannot be decrypted by the other instance.

We should attempt to detect if there is a potential encryption key mismatch, and alert consumers of the ESO plugin so that they can take appropriate action.

One potential solution is to save a "canary" saved object, whose sole purpose is to test that it can be successfully decrypted by the current instance. If we cannot decrypt this object, then it stands to reason that this instance is not properly configured. I expect there are scenarios where we'd need to allow the canary to be forcefully replaced, however.
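A minimal sketch of the canary idea, assuming AES-256-GCM from Node's `crypto` module as a stand-in for whatever the ESO plugin does internally. The `Canary` shape, the `createCanary` and `canDecryptCanary` helpers, and the fixed salt are all hypothetical and exist only for illustration:

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Hypothetical canary shape; not the ESO plugin's real storage format.
interface Canary {
  iv: string;
  tag: string;
  ciphertext: string;
}

// Known value that every correctly configured instance should recover.
const CANARY_PLAINTEXT = "kibana-encryption-canary";

// Derive a 32-byte AES key from the configured key string.
// The fixed salt is an illustration-only assumption.
function deriveKey(encryptionKey: string): Buffer {
  return scryptSync(encryptionKey, "canary-demo-salt", 32);
}

// Encrypt the known value with this instance's key.
function createCanary(encryptionKey: string): Canary {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", deriveKey(encryptionKey), iv);
  const ciphertext = Buffer.concat([cipher.update(CANARY_PLAINTEXT, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

// Returns false when the canary cannot be decrypted with the given key.
function canDecryptCanary(canary: Canary, encryptionKey: string): boolean {
  try {
    const decipher = createDecipheriv(
      "aes-256-gcm",
      deriveKey(encryptionKey),
      Buffer.from(canary.iv, "base64")
    );
    decipher.setAuthTag(Buffer.from(canary.tag, "base64"));
    const plaintext = Buffer.concat([
      decipher.update(Buffer.from(canary.ciphertext, "base64")),
      decipher.final(),
    ]).toString("utf8");
    return plaintext === CANARY_PLAINTEXT;
  } catch {
    // GCM authentication failed: most likely a different key.
    return false;
  }
}
```

An instance would call `canDecryptCanary` at startup against the stored canary and warn if it returns `false`. Forcefully replacing the canary would amount to calling `createCanary` again with the new key and overwriting the stored object.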

elasticmachine commented 3 years ago

Pinging @elastic/kibana-security (Team:Security)

azasypkin commented 3 years ago

> I expect there are scenarios where we'd need to allow the canary to be forcefully replaced, however.

The most obvious scenario is when you lose your encryption key, or you know it has been compromised and you can no longer trust your already encrypted objects. In this case it'd be reasonable to assume that you'd just change your key and not add the old one to the decryptionOnlyKeys collection.

Introducing an additional configuration key or a CLI command to support this specific use case could make our entire encryption story even more convoluted.

Sorry, mostly thinking aloud; I don't have a good proposal yet. I believe consumers already have the tools to detect this scenario and act on it (by analyzing the errors ESO returns), so we have a bit of time to think here.

pmuellr commented 3 years ago

The only detection we have in alerting is that we get an error decrypting the alerting ESOs. Which could be some other problem other than mismatched encryption keys - but it does certainly occur when encountering mismatched keys. So we're left wondering - is it mismatched keys, or alerting? The customer never believes it's mismatched keys :-)

I do think we need some kind of canary. Kibana creates a unique server UUID every time it starts, so in theory we could have each Kibana write an ESO when it starts, with its server UUID and an ESO with a fixed field. We could then lazily check at startup if we can read all the "recent" ESOs that have been written, or something. And we'd need to "garbage collect" old ones. Probably each Kibana would need to update its ESO every hour or something.

Ideally I'd like to see some kind of message logged when a mismatched key is detected, but it should probably be even more obvious, in the UI.
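The per-instance canary idea above could be sketched roughly as follows. This is an in-memory illustration only: the `CanaryRecord` shape, the `partitionCanaries` helper, and the two-hour staleness window are assumptions, and the `decryptable` flag stands in for the result of actually attempting to decrypt each ESO:

```typescript
// Hypothetical record describing one instance's canary, as seen by the checking instance.
interface CanaryRecord {
  serverUuid: string;   // UUID of the Kibana instance that wrote the canary
  updatedAt: number;    // last refresh time, in epoch milliseconds
  decryptable: boolean; // whether the checking instance could decrypt it
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Split canaries into stale ones (candidates for garbage collection) and
// recent ones that could not be decrypted (likely written with another key).
function partitionCanaries(
  records: CanaryRecord[],
  now: number,
  maxAgeMs: number = 2 * ONE_HOUR_MS
): { stale: CanaryRecord[]; mismatched: string[] } {
  const stale = records.filter((r) => now - r.updatedAt > maxAgeMs);
  const mismatched = records
    .filter((r) => now - r.updatedAt <= maxAgeMs && !r.decryptable)
    .map((r) => r.serverUuid);
  return { stale, mismatched };
}
```

Stale records are ignored for mismatch detection, since an instance that stopped refreshing its canary may no longer exist. A non-empty `mismatched` list is what would trigger the log message.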

azasypkin commented 3 years ago

> Which could be some other problem other than mismatched encryption keys - but it does certainly occur when encountering mismatched keys.

If you get a decryption error (it has a specific EncryptionError type), that means that either the encryption key is different or the AAD doesn't match; nothing else should cause an EncryptionError. Do you need to distinguish and handle these two cases differently?

I assumed that mismatched AAD is something that should never happen if objects are manipulated with the alerting APIs/UI, and if it happens somehow then it should be treated as a critical issue: the object may have been tampered with, or something else is badly broken and leads to data loss. Either of these requires immediate attention. Or is there a legitimate use case in Alerting that can lead to a mismatched AAD?
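To illustrate why a decryption failure alone cannot tell a wrong key from a mismatched AAD, here is a small sketch using AES-256-GCM from Node's `crypto` module. The cipher choice is an assumption made for the example, and `DemoEncryptionError`, `seal`, and `unseal` are hypothetical names, not the ESO plugin's API:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical stand-in for the error type consumers receive.
class DemoEncryptionError extends Error {}

interface Sealed {
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

// Encrypt plaintext, binding it to additional authenticated data (AAD).
function seal(key: Buffer, aad: string, plaintext: string): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  cipher.setAAD(Buffer.from(aad, "utf8"));
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

// Decrypt; a wrong key and a wrong AAD both fail the same authentication check.
function unseal(key: Buffer, aad: string, sealed: Sealed): string {
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
    decipher.setAAD(Buffer.from(aad, "utf8"));
    decipher.setAuthTag(sealed.tag);
    return Buffer.concat([decipher.update(sealed.data), decipher.final()]).toString("utf8");
  } catch {
    throw new DemoEncryptionError("decryption failed: wrong key or mismatched AAD");
  }
}
```

In this sketch both failure modes surface as the same error, so a consumer would need an extra signal, such as a canary with known AAD, to attribute a failure to the key specifically.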

jportner commented 2 years ago

Related: #95339, #113928

We have other problems with multiple Kibana instances that do not have other options in sync (security encryption key, reporting encryption key, auth providers) and/or different versions of Kibana.

jportner commented 2 years ago

I met today with @legrego and @azasypkin to discuss this topic.

Our takeaway is that this problem is bigger than just detecting incorrect ESO encryption keys, but "canary objects" are still likely our best way to solve this problem. We'd like to build something with these characteristics:

Since we want to take multiple Kibanas into account, we think that using multiple canary objects is our best bet.

Each Kibana does have a server.uuid but it might change on startup, and users can configure this value (so it's not reliably unique), so identifying a single Kibana instance is not trivial. When Kibana starts we could generate a uuid to keep in memory and compare it to canary objects -- we can't definitively say if an object originated from this Kibana, but we can definitively say if it didn't. That should be enough for our purposes.

If the canary object informs us that something is wrong, we can surface this to operators in the server logs. We could also expose this information in a status endpoint. Eventually we could build a UI around it, too.

First, though: this design could get complicated and it is definitely breaking new ground. Because of that, and because we think this is a Core concern, we should make an RFC before moving forward with any implementation.
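As a rough illustration of the in-memory identifier comparison described above (not a design proposal; the RFC would decide the real shape), the sketch below assumes each canary stores the identifier of the process that wrote it. `processInstanceId`, `CanaryResult`, and `summarizeCanaries` are hypothetical names:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical: generated once per process and kept only in memory,
// so it is neither persisted nor user-configurable.
const processInstanceId: string = randomUUID();

interface CanaryResult {
  instanceId: string;   // identifier stored in the canary by the process that wrote it
  decryptable: boolean; // whether this process managed to decrypt the canary
}

interface CanarySummary {
  level: "available" | "degraded";
  foreignFailures: string[]; // identifiers of other processes whose canaries we cannot read
}

// A canary whose identifier differs from ours definitely did not come from
// this process; if we also cannot decrypt it, the configurations likely differ.
function summarizeCanaries(
  results: CanaryResult[],
  ownId: string = processInstanceId
): CanarySummary {
  const foreignFailures = results
    .filter((r) => r.instanceId !== ownId && !r.decryptable)
    .map((r) => r.instanceId);
  return {
    level: foreignFailures.length > 0 ? "degraded" : "available",
    foreignFailures,
  };
}
```

The resulting summary is the kind of value that could be written to the server logs or returned from a status endpoint.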

lukeelmers commented 2 years ago

> this design could get complicated and it is definitely breaking new ground. Because of that, and because we think this is a Core concern, we should make an RFC before moving forward with any implementation.

100% agree with this.

Also, here's a related reporting issue having to do with mismatched configs: https://github.com/elastic/kibana/issues/120995


cc @elastic/kibana-core @stacey-gammon

pgayvallet commented 2 months ago

@legrego / @jeramysoucy can this issue be considered part of the "Core encryption" initiative, and if so, can I close it as superseded?

jeramysoucy commented 2 months ago

@pgayvallet Yes, or we could add this issue as a sub-issue. I'll bring it up at our sync this week.