Open legrego opened 3 years ago
Pinging @elastic/kibana-security (Team:Security)
> I expect there are scenarios where we'd need to allow the canary to be forcefully replaced, however.
The most obvious scenario is when you lose your encryption key, or you know it has been compromised and you cannot trust your already encrypted objects anymore. In this case it'd be reasonable to assume that you'd just change your key and not add the old one to the `decryptionOnlyKeys` collection.
Introducing an additional configuration key or a CLI command to support this specific use case could make our entire encryption story even more convoluted.
Sorry, mostly thinking aloud; I don't have a good proposal yet. I believe consumers already have a tool to detect this scenario and act on it (namely, analyzing the error(s) ESO returns), so we have a bit of time to think here.
The only detection we have in alerting is that we get an error decrypting the alerting ESOs. Which could be some other problem other than mismatched encryption keys - but it does certainly occur when encountering mismatched keys. So we're left wondering - is it mismatched keys, or alerting? The customer never believes it's mismatched keys :-)
I do think we need some kind of canary. Kibana creates a unique server UUID every time it starts, so in theory we could have each Kibana write an ESO when it starts, with its server UUID and an ESO with a fixed field. We could then lazily check at startup if we can read all the "recent" ESOs that have been written, or something. And we'd need to "garbage collect" old ones. Probably each Kibana would need to update its ESO every hour or something.
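The bookkeeping side of this idea (one entry per server UUID, a periodic refresh, and garbage collection of stale entries) can be sketched as below. This is only an in-memory illustration: `CanaryRegistry`, `CanaryEntry`, and the two-hour `MAX_AGE_MS` cutoff are hypothetical names and values, and a real version would persist entries as ESOs rather than in a `Map`.

```typescript
// Hypothetical canary bookkeeping entry; not an actual Kibana saved object type.
interface CanaryEntry {
  serverUuid: string;
  updatedAt: number; // epoch milliseconds of the last refresh
}

// Assumption: entries not refreshed within two hours count as stale.
const MAX_AGE_MS = 2 * 60 * 60 * 1000;

class CanaryRegistry {
  private readonly entries = new Map<string, CanaryEntry>();

  // Called at startup and then periodically (e.g. hourly) by each instance.
  heartbeat(serverUuid: string, now: number): void {
    this.entries.set(serverUuid, { serverUuid, updatedAt: now });
  }

  // The "recent" canaries an instance would try to decrypt.
  recent(now: number): CanaryEntry[] {
    const result: CanaryEntry[] = [];
    this.entries.forEach((entry) => {
      if (now - entry.updatedAt <= MAX_AGE_MS) result.push(entry);
    });
    return result;
  }

  // Remove stale entries and report how many were dropped.
  collectGarbage(now: number): number {
    let removed = 0;
    this.entries.forEach((entry, key) => {
      if (now - entry.updatedAt > MAX_AGE_MS) {
        this.entries.delete(key);
        removed += 1;
      }
    });
    return removed;
  }
}

const HOUR_MS = 60 * 60 * 1000;
const registry = new CanaryRegistry();
registry.heartbeat('kibana-a', 0);
registry.heartbeat('kibana-b', 3 * HOUR_MS);
console.log(registry.recent(3 * HOUR_MS).map((e) => e.serverUuid)); // [ 'kibana-b' ]
console.log(registry.collectGarbage(3 * HOUR_MS)); // 1
```

Passing `now` in explicitly keeps the sketch deterministic; the cutoff would need to be comfortably larger than the refresh interval so a live instance is never collected.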
Ideally I'd like to see some kind of message logged when a mismatched key is detected, but it should probably be even more obvious, in the UI.
> Which could be some other problem other than mismatched encryption keys - but it does certainly occur when encountering mismatched keys.
If you get a decryption error (it has a specific `EncryptionError` type), that means that either the encryption key is different or the AAD doesn't match; no other reasons should cause `EncryptionError`. Do you need to distinguish and handle these two cases differently?
I assumed that mismatched AAD is something that should never happen if objects are manipulated with the alerting APIs/UI, and if it happens somehow then it should be treated as a critical issue: the object may have been tampered with, or something else is broken badly and leads to data loss. Either of these requires immediate attention. Or is there a legitimate use case in Alerting that can lead to a mismatched AAD?
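To illustrate why the two causes look the same to a consumer, here is a minimal sketch using Node's built-in `crypto` module with AES-256-GCM. It assumes an AEAD cipher of this kind; it is not ESO's actual code, and `seal`, `unseal`, and the AAD strings are hypothetical. A wrong key and a wrong AAD both fail authentication at the same point.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';

// Hypothetical container for an encrypted value.
interface Sealed {
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

// Encrypt plaintext and bind it to additional authenticated data (AAD).
function seal(key: Buffer, aad: string, plaintext: string): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  cipher.setAAD(Buffer.from(aad, 'utf8'));
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

// Returns the plaintext, or null when authentication fails for any reason.
function unseal(key: Buffer, aad: string, sealed: Sealed): string | null {
  try {
    const decipher = createDecipheriv('aes-256-gcm', key, sealed.iv);
    decipher.setAAD(Buffer.from(aad, 'utf8'));
    decipher.setAuthTag(sealed.tag);
    return Buffer.concat([decipher.update(sealed.data), decipher.final()]).toString('utf8');
  } catch {
    return null; // wrong key and wrong AAD are indistinguishable here
  }
}

const goodKey = randomBytes(32);
const otherKey = randomBytes(32);
const sealed = seal(goodKey, 'alert:123', 'api-key-value');

console.log(unseal(goodKey, 'alert:123', sealed)); // api-key-value
console.log(unseal(otherKey, 'alert:123', sealed)); // null (key mismatch)
console.log(unseal(goodKey, 'alert:456', sealed)); // null (AAD mismatch)
```

Under this assumption, telling the two cases apart needs extra information, for example a separate object with a fixed, known AAD that is expected to fail only when the key differs.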
Related: #95339, #113928
We have other problems with multiple Kibana instances that do not have other options in sync (security encryption key, reporting encryption key, auth providers) and/or that run different versions of Kibana.
I met today with @legrego and @azasypkin to discuss this topic.
Our takeaway is that this problem is bigger than just detecting incorrect ESO encryption keys, but "canary objects" are still likely our best way to solve this problem. We'd like to build something with these characteristics:
Since we want to take multiple Kibanas into account, we think that using multiple canary objects is our best bet.
Each Kibana does have a `server.uuid` but it might change on startup, and users can configure this value (so it's not reliably unique), so identifying a single Kibana instance is not trivial. When Kibana starts we could generate a uuid to keep in memory and compare it to canary objects -- we can't definitively say if an object originated from this Kibana, but we can definitively say if it didn't. That should be enough for our purposes.
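A rough sketch of that comparison, under the assumption that each canary records the in-memory id of the process that wrote it. `CanaryRecord`, `originInstanceId`, and `findSuspectCanaries` are hypothetical names, and the `canDecrypt` callback stands in for whatever decryption attempt the real implementation would make.

```typescript
// Hypothetical canary record; field names are illustrative only.
interface CanaryRecord {
  id: string;
  originInstanceId: string; // in-memory id of the process that wrote the canary
}

// Canaries that definitely came from another process and that the local
// configuration cannot decrypt are the ones worth reporting to operators.
function findSuspectCanaries(
  localInstanceId: string,
  canaries: CanaryRecord[],
  canDecrypt: (canary: CanaryRecord) => boolean
): CanaryRecord[] {
  return canaries.filter(
    (canary) => canary.originInstanceId !== localInstanceId && !canDecrypt(canary)
  );
}

const localInstanceId = 'instance-1';
const knownCanaries: CanaryRecord[] = [
  { id: 'c1', originInstanceId: 'instance-1' },
  { id: 'c2', originInstanceId: 'instance-2' },
  { id: 'c3', originInstanceId: 'instance-3' },
];

// Stand-in decryption check: pretend only c3 was written with a different key.
const canDecryptStub = (canary: CanaryRecord): boolean => canary.id !== 'c3';

console.log(findSuspectCanaries(localInstanceId, knownCanaries, canDecryptStub).map((c) => c.id)); // [ 'c3' ]
```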
If the canary object informs us that something is wrong, we can surface this to operators in the server logs. We could also expose this information in a status endpoint. Eventually we could build a UI around it, too.
First, though: this design could get complicated and it is definitely breaking new ground. Because of that, and because we think this is a Core concern, we should make an RFC before moving forward with any implementation.
> this design could get complicated and it is definitely breaking new ground. Because of that, and because we think this is a Core concern, we should make an RFC before moving forward with any implementation.
100% agree with this.
Also, here's a related reporting issue having to do with mismatched configs: https://github.com/elastic/kibana/issues/120995
cc @elastic/kibana-core @stacey-gammon
@legrego / @jeramysoucy can this issue be considered part of the "Core encryption" initiative, and if so, can I close it as superseded?
@pgayvallet Yes, or we could add this issue as a sub-issue. I'll bring it up at our sync this week.
Kibana relies on a number of encryption keys. Arguably the most important key is `xpack.encryptedSavedObjects.encryptionKey`, as this controls the encryption/decryption of actions, alerts, and other sensitive user data.
Kibana requires that this key is set to the same value across all instances. If two Kibana instances have different encryption keys, then they will be encrypting saved objects that cannot be decrypted by the other instance.
We should attempt to detect if there is a potential encryption key mismatch, and alert consumers of the ESO plugin so that they can take appropriate action.
One potential solution is to save a "canary" saved object, whose sole purpose is to test that it can be successfully decrypted by the current instance. If we cannot decrypt this object, then it stands to reason that this instance is not properly configured. I expect there are scenarios where we'd need to allow the canary to be forcefully replaced, however.
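As a rough illustration of the canary idea, the sketch below encrypts a fixed, known value with the instance's key and later checks whether a given key can decrypt it. It uses Node's built-in `crypto` module with AES-256-GCM as a stand-in for the ESO encryption layer; `Canary`, `createCanary`, and `canDecryptCanary` are hypothetical names, not part of the ESO plugin.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';

// Hypothetical shape of a stored canary; not an actual Kibana saved object type.
interface Canary {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

const CANARY_PLAINTEXT = 'eso-canary';

// Encrypt a fixed, known plaintext with the instance's 32-byte key.
function createCanary(key: Buffer): Canary {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(CANARY_PLAINTEXT, 'utf8'), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Returns true only if this key can authenticate and decrypt the canary.
function canDecryptCanary(key: Buffer, canary: Canary): boolean {
  try {
    const decipher = createDecipheriv('aes-256-gcm', key, canary.iv);
    decipher.setAuthTag(canary.tag);
    const plaintext = Buffer.concat([decipher.update(canary.ciphertext), decipher.final()]);
    return plaintext.toString('utf8') === CANARY_PLAINTEXT;
  } catch {
    return false; // authentication failed, most likely because the key differs
  }
}

const keyA = randomBytes(32);
const keyB = randomBytes(32);
const canary = createCanary(keyA);

console.log(canDecryptCanary(keyA, canary)); // true
console.log(canDecryptCanary(keyB, canary)); // false
```

Forcefully replacing the canary would then amount to calling `createCanary` with the new key and overwriting the stored object, which is exactly the step that needs a deliberate operator action.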