Problem
Our three K8s clusters call DescribeLogStreams so frequently (~15,000 calls per hour) that the AWS console shows 'rate exceeded' errors when we try to do the same.
Of those calls, 66% are from fluentd and this plugin, accessing just 51 unique log group/stream combinations.
Steps to replicate
We're using the fluent/fluentd-kubernetes-daemonset:v1.10.4-debian-cloudwatch-1.0 Docker image, found here: https://github.com/fluent/fluentd-kubernetes-daemonset
Expected Behavior or What you need to ask
I'm expecting that the code wouldn't call describe_log_streams nearly so often.
The problem seems to be here: https://github.com/fluent-plugins-nursery/fluent-plugin-cloudwatch-logs/blob/b500459107d9fd1507def77614178383b5cc0d58/lib/fluent/plugin/out_cloudwatch_logs.rb#L378-L388
That code in turn calls this line: https://github.com/fluent-plugins-nursery/fluent-plugin-cloudwatch-logs/blob/b500459107d9fd1507def77614178383b5cc0d58/lib/fluent/plugin/out_cloudwatch_logs.rb#L499
AWS Support informed us that put_log_events returns the expectedSequenceToken value in the error message, so describe_log_streams doesn't have to be called: https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/CloudWatchLogs/Client.html#put_log_events-instance_method
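To illustrate what AWS Support described, here is a minimal sketch of recovering the token from the error instead of calling describe_log_streams. It is not the plugin's code: `extract_expected_sequence_token` and `put_with_token_recovery` are hypothetical helpers, and the message wording matched by the regex ("sequenceToken is: <token>", with the literal "null" for a stream that has no token yet) is an assumption about what CloudWatch Logs returns. The block wraps whatever performs the put, so the sketch runs without the AWS SDK.

```ruby
# Assumed error wording: "... The next expected sequenceToken is: <token>"
SEQUENCE_TOKEN_PATTERN = /sequenceToken is: (\S+)/

# Hypothetical helper: returns the token text from an error message,
# or nil when the message does not carry one.
def extract_expected_sequence_token(message)
  match = SEQUENCE_TOKEN_PATTERN.match(message.to_s)
  match && match[1]
end

# Hypothetical helper: yields the current token to the block that performs
# the put. If the block raises an error whose message carries the expected
# token, retries with that token instead of describing the log stream.
def put_with_token_recovery(token, max_retries: 1)
  attempts = 0
  begin
    yield token
  rescue StandardError => e
    # In the plugin this would rescue only the SDK's invalid-sequence-token
    # error; StandardError keeps the sketch free of SDK dependencies.
    recovered = extract_expected_sequence_token(e.message)
    raise if recovered.nil? || attempts >= max_retries
    attempts += 1
    # Assumption: "null" means the stream expects no token at all.
    token = recovered == "null" ? nil : recovered
    retry
  end
end
```

In the plugin, the block would be the existing put_log_events call with `sequence_token: token`. Errors that carry no token (for example throttling) are re-raised unchanged, so existing handling for them would still apply.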
Using Fluentd and CloudWatchLogs plugin versions
fluent-gem list, td-agent-gem list, or your Gemfile.lock: https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/docker-image/v1.10/debian-cloudwatch/Gemfile.lock#L28