elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Fleet] Restrictive agent index policies break Reroute pipeline processor #181656

Open swg0101 opened 4 months ago

swg0101 commented 4 months ago

Kibana version: 8.13.2

Elasticsearch version: 8.13.2

Describe the bug: Currently, it appears that the Fleet API key permissions are derived directly from the integrations that are added to the policy. However, since the integration only allows a single "default" namespace to be specified, the resulting index permissions include only the -default suffix (or whatever name that was defined as the default) for the resulting namespace that would be created.

While this works without any modifications to the resulting pipeline, trying to partition your data into multiple namespaces would cause all of the logs to be silently dropped because the API key can only index to the default namespace, and not to any other namespaces within the same dataset.

Steps to reproduce: Use case: Delete noisy firewall deny events (e.g. Perimeter Internet noise) after 24 hours while keeping normal logs under the default ILM.

  1. Create agent policy, enroll agents, and add FTD integration (with the default namespace being used).
  2. Create ILM policy fw-denies with delete phase set to 24h.
  3. Clone the resulting logs-cisco_ftd.log index template into logs-cisco_ftd.log-denies, with the new data stream name as the index pattern.
  4. Add index settings to the new index template: { "index": { "lifecycle": { "name": "fw-denies" } } }
  5. Create a new data stream under DevTools: `PUT _data_stream/logs-cisco_ftd.log-denies`
  6. Create a custom ingest pipeline `logs-cisco_ftd.log@custom` that is referenced by the managed integration: `[ { "reroute": { "dataset": [ "cisco_ftd.log" ], "namespace": [ "denies" ], "if": "ctx?.event?.type?.contains('denied')" } } ]`
  7. While the namespace is successfully updated, all matching logs are now dropped since the agent has no rights to the resulting namespace with the API key pushed to it, despite it being the same dataset. Omitting the dataset variable or using {{data_stream.dataset}} has no effect since it results in the same value.
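Steps 2 and 6 above can be sketched in DevTools as follows (a minimal sketch: the delete-only ILM policy and the Painless condition mirror the steps, but adjust the names and phases to your environment):

```json
PUT _ilm/policy/fw-denies
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "24h",
        "actions": { "delete": {} }
      }
    }
  }
}

PUT _ingest/pipeline/logs-cisco_ftd.log@custom
{
  "processors": [
    {
      "reroute": {
        "dataset": ["cisco_ftd.log"],
        "namespace": ["denies"],
        "if": "ctx?.event?.type?.contains('denied')"
      }
    }
  ]
}
```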

Expected behavior: Since the system does not allow you to create integrations with the same name, it seems strange to me that the resulting API keys would only have access to a single namespace, when the namespace functionality is designed to partition your data so that different settings or policies (e.g. ILM policies) can be applied to it. I would expect all namespaces under the same integration/dataset to be accessible to the API key, or at the very least, an additional list of namespaces that can be defined to grant extra access.

swg0101 commented 4 months ago

Not too sure if this issue is better suited to a different repo. That said, Kibana has no UI options to modify the pushed agent policy to add additional indices, to specify multiple namespaces for an integration (e.g. using spaces, commas, or arrays), or to edit the API keys, since they are owned by the Fleet Server. I am guessing the only hackaround is to create a dummy integration with a different dataset/integration name altogether, but that seems like a very dirty workaround that would bring its own set of issues, with conflicting template policies that can mess up how data is indexed.

Feel free to transfer this issue if it belongs in a different repo. Thanks.

elasticmachine commented 3 months ago

Pinging @elastic/fleet (Team:Fleet)

swg0101 commented 3 months ago

So I played with this a bit more. Here's what I found:

Because of this constraint, the only hack around this issue that I could find is as follows (taking the Cisco FTD integration as an example):

On the logs-cisco_ftd.log-denies index template, use the following index settings:

{
  "index": {
    "lifecycle": {
      "name": "fw-denies"
    },
    "default_pipeline": "_none"
  }
}

This applies the proper ILM policy to the newly created backing indices and ensures the default pipeline isn't re-run once the reroute completes. If default_pipeline isn't set to "_none", every rerouted event gets populated with an error, since the original message field has already been renamed away.
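For completeness, a minimal sketch of where those settings live in the cloned index template (the priority and the composed_of entries are assumptions; composed_of should mirror the component templates of the original logs-cisco_ftd.log template in your cluster):

```json
PUT _index_template/logs-cisco_ftd.log-denies
{
  "index_patterns": ["logs-cisco_ftd.log-denies"],
  "data_stream": {},
  "priority": 250,
  "composed_of": ["logs-cisco_ftd.log@package", "logs-cisco_ftd.log@custom", "ecs@mappings"],
  "template": {
    "settings": {
      "index": {
        "lifecycle": { "name": "fw-denies" },
        "default_pipeline": "_none"
      }
    }
  }
}
```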

This seemed like a very hackish and labor-intensive way to make something simple work. I am hoping perhaps the integrations/Fleet team could figure out something to make this work a bit better.

kpollich commented 3 months ago

Thanks for the issue, @swg0101 - this is something that's come up a few times elsewhere in internal enhancement requests and we're tracking it for technical definition now.

The approach we're thinking about right now is to add a way for users to broaden their agent API key permissions in Fleet UI (or via the API). e.g. for agents on a given agent policy, agent API keys could be granted additional permissions to a set of index patterns. This would probably just be something like a free text input setting. This, along with public docs, feels like the quickest path to just unblocking folks who want to consume the reroute processor for their integration data.
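For illustration, the broadened grant would effectively add an index privilege entry like the one below to the agent API key's role descriptors (the index pattern is an example; auto_configure and create_doc are the privileges Fleet already grants for integration data streams):

```json
{
  "indices": [
    {
      "names": ["logs-cisco_ftd.log-*"],
      "privileges": ["auto_configure", "create_doc"]
    }
  ]
}
```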

The more "root cause" fix we're thinking of proposing to the ES team is to rethink permissions around the reroute processor entirely, e.g. adding a reroute processor to a given pipeline would implicitly grant write permissions to the destination index to any request that triggers the processor. This is a bit less fleshed out, but it's something we think is worth bringing to the ES team.

Curious for your thoughts either way!

swg0101 commented 3 months ago

@kpollich - Thanks for the reply!

My thoughts on this are that there does not appear to be a consensus on what the reroute processor ought to be for. Considering:

  1. That the documentation calls out explicitly: "Note that the client needs to have permissions to the final target. Otherwise, the document will be rejected with a security exception", and:
  2. The original pipeline is skipped once a reroute action is performed, and the destination pipeline runs.

It seems that the original intent of the processor was to reroute incoming documents to different pipelines, which is useful when you share one UDP syslog collector but have multiple log types coming in that require different processing pipelines. In that case, the permission model seems appropriate, because you want to ensure the writer actually has sufficient permissions on the destination index. However, this model only works when you are creating the keys yourself and planning out the permissions manually.
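A sketch of that collector use case, with one shared pipeline fanning documents out by log type (the dataset names and match conditions are illustrative):

```json
PUT _ingest/pipeline/shared-syslog-router
{
  "processors": [
    {
      "reroute": {
        "dataset": "cisco_ftd.log",
        "if": "ctx.message != null && ctx.message.contains('%FTD-')"
      }
    },
    {
      "reroute": {
        "dataset": "panw.panos",
        "if": "ctx.message != null && ctx.message.contains('PAN-OS')"
      }
    }
  ]
}
```

Each writing client then needs write permission on whichever destination data stream it matches, which is exactly where the Fleet-managed keys fall short.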

The other use case involves partitioning, which is the use case I am going after (e.g. https://www.elastic.co/blog/simplifying-log-data-management-flexible-routing-elastic).

For both these use cases, delegating the proper permissions can be a problem, especially under the current Fleet setup where it isn't very flexible regarding what's being granted. To get around that, I see two possibilities:

  1. As you suggested, allow a freeform text box that allows the user to grant an arbitrary index pattern under a particular agent policy. This is probably needed under the original reroute intent where the data is likely to be routed to a completely different data stream or index pattern altogether.
  2. Add a freeform box under the integrations themselves to allow more namespaces to be specified. This would be a simpler method for the partition use case, where the reroute destination is likely to remain within the same dataset but rerouted to a different namespace.

In the latter use case, however, things get challenging in Enterprise Cloud environments, where integrations are usually managed, with pipelines predefined by the integration itself. Adding processors in this circumstance usually means editing the @custom pipeline, which is appended at the tail end of the managed pipeline.

Since the reroute is added at the tail end, the destination pipeline runs once more after the reroute processor fires. In the FTD example I gave earlier, the document runs through the default FTD pipeline, finishes, runs the custom pipeline, gets rerouted to the destination namespace, gets matched against the FTD pipeline again, and re-runs the default FTD pipeline once more with an error (because the original message fields have already been manipulated out of the document).

It would be nice if there were a processor or option that fulfilled this use case (e.g. an in-place rename), where the document is shuffled from one index/data stream to another with no additional processing, with:

  1. No additional pipelines run, either at the source or the destination (since the reroutes are added at the bottom of the @custom pipeline, the document at that point is fully processed). Right now I am hacking around this by setting the new index template's default_pipeline to _none.
  2. Mapping information copied from the current document to the new index if the index does not have the fields mapped, so one does not need to pull various component templates in from the existing integration. This reduces user error and management overhead, since ECS templates and other components don't need to be explicitly included in the new index template.
  3. Implicit write permission to the new destination.

I have tried simply setting _index to the generic data stream name, but it does not seem to work properly compared to using the reroute processor with the hacks mentioned above.

As for the implicit grant, I do like the idea, since it would make this point moot, though I don't know whether it raises other security considerations.