Feature Request: stream_key

zackwine commented 4 years ago

In the Kubernetes daemonset use case we would like to be able to route records from differing components to different kinesis streams. If the kinesis output plugin could use a key within each record to determine the destination kinesis stream this would be useful.

For example:

[OUTPUT]
    Name            kinesis
    Match           *
    region           us-west-2
    stream          my-kinesis-stream-name
    stream_key   component-stream

With the config above the kinesis output plugin would route each record to the value of the component-stream key within the record. If the record does not have a component-stream key, then the record would be routed to the default stream my-kinesis-stream-name.

hossain-rayhan commented 4 years ago

@zackwine It would be good if you can explain your use case in details. However, I am adding some info if it helps.

In Fluent Bit configuration file, we can write different Match rules for routing logs to different destinations. Please look into them if you have not yet.

Nit: Also, kinesis data stream has a concept of using different shards for grouping the logs based upon their category. If needed, customer can write different consumer applications to get data from different shards. In the fluent_bit config file, we can set the partition_key value to route logs to different shards. Let us know if you need more info regarding this.

zackwine commented 4 years ago

Thanks for the quick response.

Currently we leverage fluentd in the following manner. Each pod/container in a Kubernetes cluster is annotated with a "LogLane" which can be thought of as a kinesis stream in this case. Fluentd currently routes the logs using this annotation in conjunction with the rewrite_tag_filter feature. This allows us to separate logs from our 60-70 different components into different "LogLanes" without our Fluentd Daemonset being aware of which component goes to which stream (its driven by a Kubernetes annotation, instead of pod/container name).

We are now wanting to move to fluent-bit for the performance benefits, but it lacks the tag rewrite feature (at least until the next release). There is the stream processor feature as workaround, but this was another option that looked promising. If the kinesis output could dynamically pull the stream name from the records, then we wouldn't have to update our fluent-bit config when we add new streams. Further we wouldn't have a config file with 15+ kinesis outputs for each of kinesis streams we currently use.

Currently we do not have a custom consumer in an attempt to limit the code/config we own. We leverage firehose to write to different destinations. For example one LogLane may only go to S3 for reporting, and another may go to a specific ElasticSearch index. So writing a custom consumer would mean moving away from firehose and writing/managing our own consumer.

Overview of how we are using kinesis:

Fluentd-Daemonset -> Kinesis -> Firehose -> ES index1
   |-> Kinesis -> Firehose -> ES index2
   |-> Kinesis -> Firehose -> S3 bucket1
   |-> Kinesis -> Firehose -> S3 bucket2
   ...

Edit: I left out that those streams may have custom lambda's for decompression/transformation.

hossain-rayhan commented 4 years ago

@zackwine Thanks for your detailed response.

I believe the maintainer of fluent-bit is working to support the tag rewrite feature. Currently, we are inclined to wait for the fluent-bit release instead of updating our plugins. There are couple of reasons behind this. If any of the stream_key values is wrong (invalid stream name), the plugin will fail to send logs and it will hamper the performance. This is our main concern. Also, right now for our plugin, we maintain a buffer of 500 records. If we consider adding this feature, we need to create different buffers for each stream. It requires decent-sized change to our plugin design which makes it a big feature request. Finally, if we do so, same changes need to eventually be applied for firehose which follows the same principle. Rather than making change for every plugin, its good to have the option in the core of fluent-bit.

zackwine commented 4 years ago

@hossain-rayhan Thanks for your response.

After partially adding this feature to the plugin yesterday I understand your concerns. This feature would complicate buffering, error handling (particularly when the stream doesn't exist), and progressive backoff logic.

I'll leave it up to you if you want to close this issue, or leave it open. I will leverage fluent-bit core features to accomplish routing.

PettitWesley commented 4 years ago

@zackwine thanks for understanding!

a config file with 15+ kinesis outputs

That's a lot; I understand your desire for this feature. In the CloudWatch plugin, we do plan to incorporate a similar feature- setting the log group dynamically based on a key. The key difference is that the CloudWatch plugin can create log groups since they are very simple resources. Creating kinesis streams is not possible in this plugin, which leads to the problems Rayhan noted.

Personally, my long term hope is that Fluent Bit becomes such a standard observability tool that a custom integration for it is built into Kubernetes. I think that the ideal way to run and manage Fluent Bit is as a Daemonset. But the ideal way to configure it is as a side-car; custom configuration for each pod that is independent. We've talked to the creator of Fluent Bit about adding support for dynamic configuration re-uploading. Imagine that every time a new pod gets scheduled on a node, new config is automatically added to configure Fluent Bit for that pod. This is where we plan to take FireLens for ECS in the future.

For now, we will keep this issue open and to gauge interest in this feature. If we get a lot of feedback that this is something people want, we will re-consider it. Please plus 1 or comment if you think we are wrong and that this feature is a good idea.

PettitWesley commented 4 years ago

Rewrite tag has been added: https://github.com/fluent/fluent-bit/commit/013a255384ce107023e1150d53dd605e9375b7c5

hossain-rayhan commented 4 years ago

Closing this issue. @zackwine feel free to re-open if tag-rewrite doesn't help. Also, thanks @PettitWesley for the update.

priyavartk commented 2 years ago

Thanks for the quick response.

Currently we leverage fluentd in the following manner. Each pod/container in a Kubernetes cluster is annotated with a "LogLane" which can be thought of as a kinesis stream in this case. Fluentd currently routes the logs using this annotation in conjunction with the rewrite_tag_filter feature. This allows us to separate logs from our 60-70 different components into different "LogLanes" without our Fluentd Daemonset being aware of which component goes to which stream (its driven by a Kubernetes annotation, instead of pod/container name).

We are now wanting to move to fluent-bit for the performance benefits, but it lacks the tag rewrite feature (at least until the next release). There is the stream processor feature as workaround, but this was another option that looked promising. If the kinesis output could dynamically pull the stream name from the records, then we wouldn't have to update our fluent-bit config when we add new streams. Further we wouldn't have a config file with 15+ kinesis outputs for each of kinesis streams we currently use.

Currently we do not have a custom consumer in an attempt to limit the code/config we own. We leverage firehose to write to different destinations. For example one LogLane may only go to S3 for reporting, and another may go to a specific ElasticSearch index. So writing a custom consumer would mean moving away from firehose and writing/managing our own consumer.

Overview of how we are using kinesis:
Fluentd-Daemonset -> Kinesis -> Firehose -> ES index1
   |-> Kinesis -> Firehose -> ES index2
   |-> Kinesis -> Firehose -> S3 bucket1
   |-> Kinesis -> Firehose -> S3 bucket2
   ...
Edit: I left out that those streams may have custom lambda's for decompression/transformation.

Hi , may you please let me know how do you write to ES Index1 and ESIndex 2 from same Firehose. I am trying to set up a logging from fluentbit-kinesis-ES such that each namespace have seperate index or if not possible at least then fluentbit-kinesisstream1(2,3) so on where each stream maps to unique namespace

PettitWesley commented 2 years ago

@priyavartk I think you need one Firehose stream per namespace and per index. And then you want to use this tutorial as a starting point I think: https://github.com/aws/aws-for-fluent-bit/tree/mainline/use_cases/k8s-metadata-customize-tag

Please open a new issue on the AWS for Fluent Bit repo if you want more guidance.

aws / amazon-kinesis-streams-for-fluent-bit

Feature Request: stream_key #15