elastic / elastic-serverless-forwarder

Enable shared outputs and multiple outputs per input #721

Closed constanca-m closed 3 months ago

constanca-m commented 4 months ago

Enhancement

Currently, each input must define its own output (see an example of the configuration file here).

However, this is not convenient. If a user wants to use the same output for 4 inputs, they have to define it 4 times. If they want to change it later, they have to change it in all 4 places, when it could instead be defined once as a shared output.

There are also two issues (1 and 2) requesting the ability to define more than one output per input. At the moment, this is not possible unless the outputs have different types. Current options are:

https://github.com/elastic/elastic-serverless-forwarder/blob/acbe70242afad1d5061d64fd4d12b7e647de3768/share/config.py#L15

The outputs for each input are saved in a dictionary:

https://github.com/elastic/elastic-serverless-forwarder/blob/acbe70242afad1d5061d64fd4d12b7e647de3768/share/config.py#L294

However, this dictionary uses the output type as its key, so trying to add a second output of the same type fails:

https://github.com/elastic/elastic-serverless-forwarder/blob/acbe70242afad1d5061d64fd4d12b7e647de3768/share/config.py#L421-L430
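To illustrate the collision, here is a minimal sketch. It is not the actual ESF code: the `TypeKeyedInput` class, its `add_output` method, and the error message are hypothetical stand-ins that only mimic a container keyed by output type. The real implementation is at the link above.

```python
class TypeKeyedInput:
    """Hypothetical stand-in for an input that stores outputs keyed by type."""

    def __init__(self, input_id):
        self.id = input_id
        self._outputs = {}  # output type -> output arguments

    def add_output(self, output_type, **args):
        # The output type is the dictionary key, so a second output of the
        # same type cannot coexist with the first one.
        if output_type in self._outputs:
            raise ValueError(f"duplicated output type: {output_type}")
        self._outputs[output_type] = args


def try_two_elasticsearch_outputs():
    """Attempt to register two outputs of the same type on one input."""
    event_input = TypeKeyedInput("<ID-1>")
    event_input.add_output("elasticsearch", elasticsearch_url="<URL-1>")
    try:
        event_input.add_output("elasticsearch", elasticsearch_url="<URL-2>")
    except ValueError as error:
        return str(error)
    return "ok"
```

In this sketch the second call is rejected purely because of the key choice, regardless of whether the two outputs point to different clusters.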

Additionally, since each input is uniquely identified by its id in the code (see this function), I do not think it is possible to configure the same input with multiple outputs of the same type at all, even if the input is defined twice.

To further clarify this change, let's consider an example. I want to:

  1. Have two inputs, each corresponding to one CloudWatch Logs group. Both should send data to the same ES.
  2. Have two outputs for one of the two CloudWatch Logs groups.

For this, the best we can do is:

inputs:
  - id: "<ID-1>"
    type: cloudwatch-logs
    outputs:
      - args:
          api_key: "<API-KEY-1>"
          elasticsearch_url: "<URL-1>"
          es_datastream_name: "logs-esf.cloudwatch-default"
        type: elasticsearch
  - id: "<ID-2>"
    type: cloudwatch-logs
    outputs:
      - args:
          api_key: "<API-KEY-1>"
          elasticsearch_url: "<URL-1>"
          es_datastream_name: "logs-esf.cloudwatch-default"
        type: elasticsearch

So I had to define the same output twice, and I still cannot achieve goal 2, since it is not possible.

With this enhancement I could refactor the config.yaml file to look like this:

# Shared outputs
outputs:
  - args:
      api_key: "<API-KEY-1>"
      elasticsearch_url: "<URL-1>"
      es_datastream_name: "logs-esf.cloudwatch-default"
    type: elasticsearch
    id: my-shared-output-1

inputs:
  - id: "<ID-1>"
    type: cloudwatch-logs
    outputs:
      - args:
          api_key: "<API-KEY-2>"
          elasticsearch_url: "<URL-2>"
          es_datastream_name: "logs-esf.cloudwatch-default"
        type: elasticsearch
      - id: my-shared-output-1
  - id: "<ID-2>"
    type: cloudwatch-logs
    outputs:
      - id: my-shared-output-1

Explanation:

  1. I defined a list under outputs that holds the shared outputs. Each shared output should have an id field so that inputs can reference it. This lets the user define multiple shared outputs without applying all of them to every input.
  2. I can still add more outputs to the inputs[*].outputs list.
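The resolution step implied by the proposed config could be sketched as follows. This is only an illustration under stated assumptions, not ESF code: `resolve_outputs` is a hypothetical helper, the config is given as a plain dictionary mirroring the YAML above, and the rule "an entry without a type is a reference to a shared output" is my assumption.

```python
def resolve_outputs(config):
    """Return a mapping of input id -> list of fully expanded outputs."""
    shared = {}
    for output in config.get("outputs", []):
        # Shared outputs carry an id so that inputs can reference them.
        shared[output["id"]] = output

    resolved = {}
    for event_input in config.get("inputs", []):
        expanded = []
        for output in event_input.get("outputs", []):
            if "type" in output:
                # Inline definition: use it as written.
                expanded.append(output)
            elif output.get("id") in shared:
                # Reference: replace it with the shared definition.
                expanded.append(shared[output["id"]])
            else:
                raise ValueError(f"unknown shared output: {output.get('id')!r}")
        resolved[event_input["id"]] = expanded
    return resolved


# Dictionary equivalent of the proposed config.yaml.
config = {
    "outputs": [
        {
            "id": "my-shared-output-1",
            "type": "elasticsearch",
            "args": {"api_key": "<API-KEY-1>", "elasticsearch_url": "<URL-1>"},
        }
    ],
    "inputs": [
        {
            "id": "<ID-1>",
            "type": "cloudwatch-logs",
            "outputs": [
                {
                    "type": "elasticsearch",
                    "args": {"api_key": "<API-KEY-2>", "elasticsearch_url": "<URL-2>"},
                },
                {"id": "my-shared-output-1"},
            ],
        },
        {
            "id": "<ID-2>",
            "type": "cloudwatch-logs",
            "outputs": [{"id": "my-shared-output-1"}],
        },
    ],
}

resolved = resolve_outputs(config)
```

Here the first input ends up with two Elasticsearch outputs (one inline, one shared), and the second input ends up with only the shared one.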

To completion

The code needs to be changed.

The configuration file is parsed here: https://github.com/elastic/elastic-serverless-forwarder/blob/acbe70242afad1d5061d64fd4d12b7e647de3768/handlers/aws/handler.py#L72

Inside this function we iterate over the outputs: https://github.com/elastic/elastic-serverless-forwarder/blob/acbe70242afad1d5061d64fd4d12b7e647de3768/share/config.py#L572

We need to change this function to allow more than one output.
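One possible direction, sketched under assumptions: keep the outputs of an input in a list instead of a dictionary keyed by type. The `MultiOutputInput` class and its methods are hypothetical names for illustration only. The issue does not prescribe this design, and the real change may look different.

```python
class MultiOutputInput:
    """Hypothetical input that keeps its outputs in a list, not a type-keyed dict."""

    def __init__(self, input_id):
        self.id = input_id
        self._outputs = []

    def add_output(self, output_type, **args):
        candidate = {"type": output_type, "args": args}
        # Reject only exact duplicates. Outputs of the same type with
        # different arguments are allowed side by side.
        if candidate in self._outputs:
            raise ValueError(f"duplicated output: {candidate}")
        self._outputs.append(candidate)

    def get_outputs_by_type(self, output_type):
        """Return every configured output of the given type."""
        return [o for o in self._outputs if o["type"] == output_type]


event_input = MultiOutputInput("<ID-1>")
event_input.add_output("elasticsearch", elasticsearch_url="<URL-1>")
event_input.add_output("elasticsearch", elasticsearch_url="<URL-2>")
```

With a list, callers that previously looked up a single output by type would need to handle several results, which is why the lookup in this sketch returns a list.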

These outputs are then referenced here:

https://github.com/elastic/elastic-serverless-forwarder/blob/acbe70242afad1d5061d64fd4d12b7e647de3768/handlers/aws/handler.py#L173

Note: event_input contains our outputs.

This composite_shipper gets our outputs, and later uses them when calling (this is just one reference of usage):

https://github.com/elastic/elastic-serverless-forwarder/blob/acbe70242afad1d5061d64fd4d12b7e647de3768/handlers/aws/handler.py#L195
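As a rough illustration of why multiple outputs matter at this point, the sketch below fans a single event out to every configured output. `FanOutShipper` and `RecordingShipper` are hypothetical classes written for this example. They are not the ESF shipper API, whose actual behavior is at the links above.

```python
class RecordingShipper:
    """Hypothetical shipper that records events instead of sending them."""

    def __init__(self, name):
        self.name = name
        self.sent = []

    def send(self, event):
        self.sent.append(event)


class FanOutShipper:
    """Hypothetical composite that forwards each event to all its shippers."""

    def __init__(self):
        self._shippers = []

    def add_shipper(self, shipper):
        self._shippers.append(shipper)

    def send(self, event):
        # Forward the same event to every registered output.
        for shipper in self._shippers:
            shipper.send(event)
        return len(self._shippers)


first = RecordingShipper("elasticsearch-1")
second = RecordingShipper("elasticsearch-2")
composite = FanOutShipper()
composite.add_shipper(first)
composite.add_shipper(second)
delivered = composite.send({"message": "hello"})
```

If an input can hold two outputs of the same type, a composite along these lines would simply receive one shipper per output.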

After this change, the documentation here needs, of course, to be updated.
