elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Request body auditing for selected request types #64234

Open albertzaharovits opened 4 years ago

albertzaharovits commented 4 years ago

xpack.security.audit.logfile.events.emit_request_body is used to toggle REST request body auditing. This is a coarse control. I think it makes sense to be able to specify auditing the body for certain requests but not others, e.g. only for searches but not for indexing.
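To make the idea concrete, here is a minimal sketch of what a per-request-type decision could look like. It is not how Elasticsearch is implemented and does not correspond to any existing setting: the pattern list, `should_emit_body`, and `audit_entry` are hypothetical names chosen for illustration, and the only assumption taken from the audit trail itself is that REST events carry a URL path.

```python
import re

# Hypothetical allow-list of REST paths whose bodies would be audited.
# These patterns are illustrative assumptions, not an Elasticsearch setting.
BODY_AUDIT_PATTERNS = [
    re.compile(r"(^|/)_search$"),
    re.compile(r"(^|/)_msearch$"),
    re.compile(r"(^|/)_count$"),
]


def should_emit_body(url_path: str) -> bool:
    """Return True if the body of a request to this REST path should be audited."""
    # Ignore any query string and trailing slash before matching.
    path = url_path.split("?", 1)[0].rstrip("/")
    return any(pattern.search(path) for pattern in BODY_AUDIT_PATTERNS)


def audit_entry(url_path: str, body: str) -> dict:
    """Build a simplified audit record, attaching the body only for selected paths."""
    entry = {"url.path": url_path}
    if should_emit_body(url_path):
        entry["request.body"] = body
    return entry
```

With this sketch, a request to `/index1,index2/_search` would keep its body while a request to `/_bulk` would be recorded without one.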

elasticmachine commented 4 years ago

Pinging @elastic/es-security (:Security/Audit)

bytebilly commented 4 years ago

This is absolutely something that would help customers tune their audit trails and avoid logging too much data.

I think the challenging part is defining the "types" that we should use to classify our events. Most of the time, customers have to turn the general setting on because they are looking for detailed information that they cannot get without the full request, for example the parameters of a search, or the new values when settings are changed.

If we were able to provide a better experience for each individual action, they probably wouldn't need the entire body anymore, except in the very few cases where their compliance policy requires it. That would also be preferable because the body is currently exposed as a JSON-encoded string, which is not easy to manipulate or consume.

Do we already know how much effort it would take to introduce a filter on which events should emit the body? Would it make sense to include more details in the "default" information for those events instead?

jmac-met commented 3 years ago

As a user of this feature, what we are after is understanding who has searched what indexes and what they were searching for. So for example we can trace that Jane User successfully searched for Guybrush Threepwood across index1, index2 and index3. We would also like to see that Jane User tried to search for poor Guybrush across index4 and index5, but was denied access. At the moment we have configured emit body to capture what Jane User was searching for (only really works on one cluster), but as a side effect we also get the results of Jane's search which can span multiple megabytes in a single document, causing Kibana to error and not show anything with the default settings. This isn't desirable behaviour for 3 reasons:

  1. We'd like to keep audit logs for a long period of time, but with some of the data being duplicated due to user activity, the costs of storage are unpredictable and rapidly skyrocketing which is an Achilles heel for the audit function (especially in cloud when trying to predict costs).
  2. Although Audit data is sensitive by nature, capturing the search results in another cluster means we now have an unnecessary secondary storage and processing headache to work through with the data owners.
  3. Users only see the errors, giving a false impression of stability and usability.

Hope this helps a little.
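The "who searched what" trace described above can be approximated by post-processing the audit log. The sketch below is one possible approach under stated assumptions, not an official tool: it assumes one JSON object per log line, that the request body appears on REST-level events while the index list and the `access_granted` / `access_denied` outcome appear on transport-level events, and that the two can be joined on `request.id`. The field names (`request.id`, `user.name`, `request.body`, `event.action`, `action`, `indices`) and the `indices:data/read/search` action prefix should be checked against the audit output of your own cluster version; `who_searched_what` is a hypothetical helper name.

```python
import json
from collections import defaultdict

SEARCH_ACTION_PREFIX = "indices:data/read/search"  # assumed transport action name


def who_searched_what(lines):
    """Join audit entries on request.id and summarize search requests."""
    by_request = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON audit entries
        request_id = entry.get("request.id")
        if request_id is None:
            continue
        record = by_request[request_id]
        if "user.name" in entry:
            record["user"] = entry["user.name"]
        if "request.body" in entry:
            record["query"] = entry["request.body"]
        is_search = entry.get("action", "").startswith(SEARCH_ACTION_PREFIX)
        if is_search and entry.get("event.action") in ("access_granted", "access_denied"):
            # Once a request has been denied, keep reporting it as denied.
            if record.get("outcome") != "access_denied":
                record["outcome"] = entry["event.action"]
            record.setdefault("indices", set()).update(entry.get("indices", []))
    return [
        {
            "user": record.get("user"),
            "query": record["query"],
            "indices": sorted(record["indices"]),
            "outcome": record["outcome"],
        }
        for record in by_request.values()
        if "query" in record and "outcome" in record
    ]
```

Requests that never produce a search authorization event, such as bulk indexing, are dropped from the summary, so their large bodies do not need to be retained once the trace has been extracted.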

ywangd commented 3 years ago

"but as a side effect we also get the results of Jane's search which can span multiple megabytes in a single document"

"capturing the search results in another cluster means we now have an unnecessary secondary storage ..."

Could you please clarify what you mean by "results" and "search results"? The xpack.security.audit.logfile.events.emit_request_body setting enables auditing of the request body only, not the response. How did the results get captured? Thanks!

jmac-met commented 3 years ago

sample_redacted.txt

Sorry for the slow reply. I've attached a heavily redacted version of one of the monster entries from the monitoring cluster. This is capturing a copy of the data from the main cluster as part of the entry. Twice. I've obviously pruned it right back as the thought of going through a 2.5MB document to check I have redacted all the returned data was more than I could take.

jmac-met commented 3 years ago

Actually looking over it again, it's at least partly tracking what the data_writer is writing to the indexes. Hopefully not everything, but I haven't the will to plough through all the data being written to the primary cluster.

ywangd commented 3 years ago

Thanks @jmac-met

The large chunks in the log file are request bodies, specifically the bodies of bulk indexing requests. That is why they are so big. They are not search results, so there is no concern on that front. But I see your point, and we are aware that these large bulk indexing request bodies can become unwieldy, which is what this issue is about: emitting the request body "only for searches but not for indexing".