elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.76k stars 8.16k forks source link

[Fleet] Support for document-based routing via ingest pipelines #151898

Closed joshdover closed 1 year ago

joshdover commented 1 year ago

We are moving forward with a solution for "document-based routing" based a new ingest pipeline processor, called the reroute processor. Fleet will be responsible for managing routing rules defined by packages and the end user, and updating ingest pipelines to include the new data_stream_router processor to apply the rules.

Links

Overview diagram

image

Integration-defined routing rules

Integrations will be able to define routing rules about how data from other integrations or data streams should be routed to their own data stream. For example, the nginx package may define a routing rule for the logs-kubernetes.container_logs data stream to route logs to logs-nginx.access whenever container.image.name == "nginx". Similarly, when the kubernetes package is installed and the nginx was also previously installed, we'll need to ensure the logs-kubernetes.router-{version} ingest pipeline includes a reroute processor for each routing rule defined on the nginx integration.

To support this, we'll need to add a concept of routing rules to the package spec and add support for them in Fleet.

# Integration-defined routing rules
- [ ] https://github.com/elastic/package-spec/issues/514
- [ ] https://github.com/elastic/kibana/issues/155910
- [ ] https://github.com/elastic/kibana/issues/157422
- [ ] https://github.com/elastic/ingest-docs/issues/276

Supporting tasks

We'll need to do a few things in support of these changes as well, namely around API key permissions.

# Supporting tasks
- [ ] https://github.com/elastic/kibana/issues/134971
- [ ] https://github.com/elastic/package-spec/issues/315

cc @ruflin @felixbarny

ruflin commented 1 year ago

I was suggesting that we might get by with just the "injected" rules if we define them in the manifest of the "leaf" data streams instead of the package.

The idea is that all the definitions are in the dataset / data stream. My understanding is we need both as otherwise we have no way for community packages to extend the routing rules.

kpollich commented 1 year ago

I've taken the example manifest provided above and converted it to use inject-only routing rules, e.g.

# nginx/data_stream/nginx/manifest.yml
title: Nginx logs
type: logs

# This is a catch-all "sink" data stream that routes documents to 
# other datasets based on conditions or variables
dataset: nginx

# Ensures agents have permissions to write data to `logs-nginx.*-*`
elasticsearch.dynamic_dataset: true
elasticsearch.dynamic_namespace: true

injected_routing_rules:
  # Route K8's container logs to this catch-all dataset for further routing
  k8s.router: 
    - dataset: nginx # Note: this _always_ has to be the current dataset - maybe we can infer this?
      if: "ctx?.container?.image?.name == 'nginx'"
      namespace:
        - {{labels.data_stream.namespace}}
        - default

  # Route syslog entries tagged with nginx to this catch-all dataset
  syslog:
    - dataset: nginx
      if: "ctx?.tags?.contains('nginx')"
      namespace:
        - {{labels.data_stream.namespace}}
        - default

---

# nginx/data_stream/access.yml
title: Nginx access logs
type: logs

injected_routing_rules:
  # Inject a routing rule into the nginx dataset's pipeline that routes access
  # logs to this data stream instead based on a document's file path
  nginx:
    - dataset: nginx.access
      if: "ctx?.file?.path?.contains('/var/log/nginx/access')"
      namepsace:
        - {{labels.data_stream.namespace}}
        - default

---

# nginx/data_stream/error.yml
title: Nginx error logs
type: logs

injected_routing_rules:
  # Inject a routing rule into the nginx dataset's pipeline that routes error
  # logs to this data stream instead based on a document's file path
  nginx:
    - dataset: nginx.error
      if: "ctx?.file?.path?.contains('/var/log/nginx/error')"
      namepsace:
        - {{labels.data_stream.namespace}}
        - default

I think the idea is intriguing, but to me, this seems harder to reason about. Having only a single means of defining routing rules is arguably less complex, but rules now have to be spread across multiple files because their defined only at the "leaf" level as @weltenwort described above.

Also if we coalesced on this "inject only" idea, we'd probably just call this field "routing rules" and keep the key/value structure here. This would allow us to "inject" routing rules into the current dataset as well as external ones, e.g.

# nginx/data_stream/nginx/manifest.yml

dataset: nginx

routing_rules:
  # Add routing rules to _this_ dataset
  nginx:
    ...

  # Add routing rules to _other_ datasets
  k8s.router:
    ...
  syslog:
    ...

Curious if others think the same or if this seems reasonable - you folks contribute more directly to integrations than I do as I'm more on the spec/tooling side of things generally.

kpollich commented 1 year ago

I've moved the "User defined custom datasets" part of the description here entirely to https://github.com/elastic/kibana/issues/155911 where custom integrations at large will be tracked by the ingest team.

ruflin commented 1 year ago

I like the idea of having only 1 block for routing rules as you describe above. The part I keep stumbling over is the _this_ definition:

routing_rules:
  # Add routing rules to _this_ dataset
  nginx:
    ...

Is this the most common case or the less common case. We don't know yet. The main thing I'm worried about is errors. The user renames the dataset value further above but forgets to change the name here or has a typo inside. Based on your comment, what if we change this to the following:

routing_rules:
  # Add routing rules to _this_ dataset
  _this_:
    ...

It would require a bit more handling on the Fleet side but would make it explicit that these are the "local" rules. I would assume Fleet needs to handle these rules anyways a bit different as these have lower priority then the injected rules.

kpollich commented 1 year ago

The user renames the dataset value further above but forgets to change the name here or has a typo inside. Based on your comment, what if we change this to the following:

I think this could be an elegant solution, but I don't know about optimizing for this specific change. Is implementing this magic _this_ keyword in Fleet worth it compared to how frequently someone changes the dataset in one of these files?

There's also a possibility that changing this dataset value breaks injected routing rules from other integrations the maintainer doesn't even know about. For example:

# nginx.yml
dataset: nginx-2

---

# some-other-package.yml
routing_rules:
  nginx: # This is now broken
    - dataset: some-other-package
      if: "..."
      namespace:
        - default

I'd rather have no magic keyword than a magic keyword that only fixes some human errors. Changing the dataset for a given data stream should require thought and care.


I've taken a pass at rewriting the example manifest above to use a single routing_rules for clarity:

# nginx/data_stream/nginx/manifest.yml
title: Nginx logs
type: logs

# This is a catch-all "sink" data stream that routes documents to 
# other datasets based on conditions or variables
dataset: nginx

# Ensures agents have permissions to write data to `logs-nginx.*-*`
elasticsearch.dynamic_dataset: true
elasticsearch.dynamic_namespace: true

routing_rules:
  # "Local" routing rules are included under this current dataset, not a special case
  nginx:
    # Route error logs to `nginx.error` when they're sourced from an error logfile
    - dataset: nginx.error
      if: "ctx?.file?.path?.contains('/var/log/nginx/error')"
      namespace:
        - {{labels.data_stream.namespace}}
        - default

    # Route access logs to `nginx.access` when they're sourced from an access logfile
    - dataset: nginx.access
      if: "ctx?.file?.path?.contains('/var/log/nginx/access')"
      namespace:
        - {{labels.data_stream.namespace}}
        - default

  # Route K8's container logs to this catch-all dataset for further routing
  k8s.router: 
    - dataset: nginx
      if: "ctx?.container?.image?.name == 'nginx'"
      namespace:
        - {{labels.data_stream.namespace}}
        - default

  # Route syslog entries tagged with nginx to this catch-all dataset
  syslog:
    - dataset: nginx
      if: "ctx?.tags?.contains('nginx')"
      namespace:
        - {{labels.data_stream.namespace}}
        - default
ruflin commented 1 year ago

I like what you have above. With this we can also improve it in iteration and add these kind of features if needed. Same for the permissions. If you do routing, very likely you need more permissions. But for now we can leave this to the dev.

felixbarny commented 1 year ago

LGTM. I like the simplicity. But we shouldn't forget about the rule priority:

I would assume Fleet needs to handle these rules anyways a bit different as these have lower priority then the injected rules.

We could either introduce a parameter that determines the order or implicitly assign the lowest priority to local rules.

felixbarny commented 1 year ago

There's also a possibility that changing this dataset value breaks injected routing rules from other integrations the maintainer doesn't even know about. For example:

I think we can probably avoid these errors by validating that the dataset exists in any integration. Maybe we can't if we want to support external integrations that aren't inside the elastic/integrations repo.

ruflin commented 1 year ago

I think we can probably avoid these errors by validating that the dataset exists in any integration. Maybe we can't if we want to support external integrations that aren't inside the elastic/integrations repo.

Are you referring to "exists" in an installed integration or in general? For example: The nginx package contains a routing rule for k8s but k8s integration is not installed, k8s routing dataset does not exist. In this scenario, user should still be able to install the nginx integration but the k8s routing rule should not be added. Only when the k8s integration is installed, it should be added. This goes back to https://github.com/elastic/kibana/issues/151898#issuecomment-1532065020

felixbarny commented 1 year ago

I meant we could do that validation at development time (not at package installation time) with a tool like elastic-package.

ruflin commented 1 year ago

I meant we could do that validation at development time (not at package installation time) with a tool like elastic-package.

I think we talk about 2 different things here: I want to validate, that the dataset is setup as part of an integration. I think you talk about if it exists in general in any integration. This is not possible as elastic-package can't be aware of all integrations.

ruflin commented 1 year ago

We had some more discussions on the topic around routing rules. I'm proposing for now to not focus on the injected rules but only the "local" rules. To be specific, a dataset always loads its own rules and users can eventually manually add rules to it (not through packages). For now, nginx can't add a rule to k8s. The goal is that this simplifies the implementation, we can always add the feature later. On the yaml side, I would keep the proposed structure to make sure the door stays open but it means we should do some validation that the names are equal.

There is a second topic that go raised: Lets assume a user has multiple K8s clusters. Each cluster is shipping data to their own namespace. Different routing rules might be specified per cluster / namespace. Same can apply for the syslog dataset. There might be multiple syslog endpoints receiving very different data. Being able to define rules per namespace could be very useful in this context. The assumption is that these would be all rules specified by the user, not loaded by a package. Long term, it would be nice if the user could also export these manual rules in a package.

joshdover commented 1 year ago

Thanks for the summary of the discussion. I 100% agree with the focus on local rules first, followed by user-defined rules, then we'll figure out how we want the cross-package rules to work. The first two are much simpler cases and unlock a lot of functionality.

For the namespace-specific rules, it seems we'll finally need to go implement https://github.com/elastic/kibana/issues/121118

kpollich commented 1 year ago

Thanks for the summary 🙏 - I think we've landed on the right path forward here.

On the yaml side, I would keep the proposed structure to make sure the door stays open but it means we should do some validation that the names are equal.

+1 on maintaining the structure defined above for future-proofing here. I'll make this clear in the implementation issues.

kpollich commented 1 year ago

I've updated the following issues based on our conversation above

felixbarny commented 1 year ago

Even though injected rules are not supported, yet, I think we can close this issue as completed. I think we'll want to wait with the implementation for injected rules until we get more feedback and collect more real-world use cases, or have a concrete plan for how we exactly want to use injected rules in our integrations.

Thanks everyone for your work on this 🙏

jlind23 commented 1 year ago

Thanks @felixbarny I will for now postponed the work on injected rules then.