Hi, Thank you for your inquiry.
A1: Fluentd buffer configuration, including the file buffer, is not configuration of this plugin; it is Fluentd's own configuration. Please see the Config: Buffer Section page in the Fluentd documentation.
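For reference, that page describes a `<buffer>` section of roughly this shape; the path and size values below are placeholders, not recommendations:

```
<buffer>
  @type file
  path /var/log/fluentd/buffer/kinesis   # placeholder; each output needs its own path
  chunk_limit_size 1m
  total_limit_size 2g        # cap on total buffered data; exceeding it triggers the overflow action
  flush_interval 1s
  flush_thread_count 4
  overflow_action block      # default is throw_exception
</buffer>
```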
A2: I think you should try increasing the shard count. 4 is not a large shard count for such a large number of log events.
Thanks for the response. We will increase the shard count.
In this case should we be using kinesis_streams_aggregated instead of kinesis_streams to handle high load?
It depends on the use case. The KPL aggregated format used by kinesis_streams_aggregated may be cost effective, but you have to de-aggregate records in the consumer. Please see the following documents: https://docs.aws.amazon.com/streams/latest/dev/kinesis-producer-adv-aggregation.html https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-consumer-deaggregation.html
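A minimal sketch of what the aggregated output might look like (the tag pattern, stream name, and region are placeholders):

```
<match app.**>
  @type kinesis_streams_aggregated
  stream_name my-stream    # placeholder
  region us-east-1         # placeholder
  # Records are packed into the KPL aggregated format, so consumers
  # must de-aggregate them (see the links above).
</match>
```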
Closing this issue for now. Please reopen if required.
Hello,
We have a Fluentd aggregator running in Amazon ECS inside a Docker container. We also have microservices deployed to Amazon ECS which forward their logs to this Fluentd aggregator using the fluentd agent installed on the EC2 machines (this is basically a configuration in the ECS tasks to send logs to the aggregator). The setup is similar to the one explained here.
The Fluentd docker image used is fluent/fluentd:v1.5-1.
The Fluentd aggregator uses the fluent-plugin-kinesis plugin to send the logs to a Kinesis stream. The Kinesis stream is configured with 4 shards (4 shards were enough when we had a CloudWatch integration instead of Fluentd, so we are currently trying the same number with the Fluentd setup). The following are the configuration parameters for the kinesis plugin:
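For illustration, a kinesis_streams output section of roughly this shape (the tag pattern, stream name, and region are placeholders rather than the exact settings used):

```
<match app.**>
  @type kinesis_streams
  stream_name my-stream          # placeholder; the actual stream has 4 shards
  region us-east-1               # placeholder
  # retries_on_batch_request defaults to 8 (raised to 20 later, see below)
  <buffer>
    flush_interval 1s            # placeholder flush settings; memory buffer by default
    flush_thread_count 4
  </buffer>
</match>
```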
We have also configured an S3 output plugin to back up the data to S3.
Now, the problem is that when a large number of incoming log events per second reaches the Fluentd aggregator, we get a high number of the following error from the kinesis output plugin:
ProvisionedThroughputExceededException/Rate exceeded for shard shardId-00000000000x in stream xxxx
From the Kinesis documentation: a single Kinesis shard can ingest up to 1 MiB of data per second or 1,000 records per second for writes. With 4 shards that is roughly 4 MiB/s or 4,000 records/s in aggregate; when this limit is exceeded, Kinesis throws the ProvisionedThroughputExceededException above.
We tried adding more retries (retries_on_batch_request) in the kinesis output plugin to work around this. By default it retries 8 times; we increased it to 20. That then gave the following error when there was a surge in log data:
error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data"
It seems that because so much data was being retried, the Fluentd worker processes were holding too much data in memory (queued buffer chunks), causing this error: buffer space has too many data.
Then we also tried using a file buffer instead of the memory buffer.
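The change was essentially swapping the buffer type inside the kinesis `<match>` section, along these lines (the path is a placeholder; each output needs its own distinct buffer path):

```
<buffer>
  @type file
  path /var/log/fluentd/buffer/kinesis   # placeholder; must not be shared with other outputs
</buffer>
```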
This one gave the following errors:
Here the errors appeared not just for the kinesis file buffering; we also got errors for the s3 buffers (there were no such errors in s3 before we switched the kinesis output to a file buffer). Given below is the s3 buffer configuration.
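It is roughly of this shape; the bucket name, region, and paths are placeholders, and the file buffer type for the s3 output is an assumption:

```
<match app.**>
  @type s3
  s3_bucket my-log-backup-bucket       # placeholder
  s3_region us-east-1                  # placeholder
  path logs/
  <buffer time>
    @type file
    path /var/log/fluentd/buffer/s3    # placeholder; separate from the kinesis buffer path
    timekey 3600
    timekey_wait 10m
    chunk_limit_size 256m
  </buffer>
</match>
```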
Question 1: It seems like we are doing something wrong with the file buffer configuration here. Please let us know how to sort this out.
Question 2: What is the right configuration for making the kinesis output plugin support such a large number of log events, or do we have to increase the number of shards in the Kinesis stream to support this?
Please let us know.