Closed vasu-git closed 4 years ago
Does setting put_log_events_retry_limit 0 and using fluent-plugin-cloudwatch-logs v0.10.1 solve this issue?
https://github.com/fluent-plugins-nursery/fluent-plugin-cloudwatch-logs/pull/199 should fix this issue. Closing.
Problem
I am sending logs from a Kubernetes cluster to CloudWatch. When I check the CloudWatch logs, I see a lot of duplicates, and the number of duplicates differs from one log line to another. Fluentd is deployed as a DaemonSet. I see a lot of errors and warnings like the ones below in the Fluentd logs:
[error]: Exception emitting record: queue size exceeds limit
2020-05-05 06:52:25 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2020-05-05 06:52:23 +0000 error_class="Aws::CloudWatchLogs::Errors::ThrottlingException" error="Rate exceeded" plugin_id="object:3fccfe4342b0"
2020-05-05 06:52:27 +0000 [warn]: retry succeeded. plugin_id="object:3fccfe4342b0"
It looks like AWS is throttling some of these requests and Fluentd then tries to send them again, but it also looks like AWS ends up saving the log events from the throttled requests?
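To put numbers on the duplicates described above, one option is to export the events from the log group and count identical entries. The following is only a minimal sketch: the input shape (an array of hashes with :timestamp and :message keys) and the helper name duplicate_counts are assumptions made for illustration, not part of Fluentd or the plugin.

```ruby
# Hypothetical helper: counts how often each (timestamp, message) pair occurs
# and keeps only the pairs that occur more than once.
# Assumed input: an array of hashes with :timestamp and :message keys.
def duplicate_counts(events)
  events
    .group_by { |event| [event[:timestamp], event[:message]] }
    .transform_values(&:size)
    .select { |_key, count| count > 1 }
end

# Made-up sample data, not taken from the cluster in this issue.
sample_events = [
  { timestamp: 1, message: "a" },
  { timestamp: 1, message: "a" },
  { timestamp: 1, message: "a" },
  { timestamp: 2, message: "b" }
]

duplicate_counts(sample_events).each do |(timestamp, message), count|
  puts "#{count}x at #{timestamp}: #{message}"
end
```

Grouping on the timestamp as well as the message avoids counting legitimately repeated messages that were emitted at different times.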
I tried setting
in the match config, but to no avail. I am still seeing multiple duplicates (>2) for many log lines.
Please correct me if I'm wrong here, but if I want no retries at all, it looks like the following code allows at least one retry (line 403):
if !@put_log_events_disable_retry_limit && @put_log_events_retry_limit < retry_count
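To make the suspected off-by-one concrete, here is a minimal simulation. Only the condition itself is taken from the line quoted above; the surrounding loop, the method names, the cap parameter, and the assumption that every call is throttled are hypothetical stand-ins, not the plugin's actual code. The second method shows one possible stricter comparison, which is not necessarily what the linked PR does.

```ruby
# Hypothetical stand-in for a retry loop in which every PutLogEvents call
# is throttled. Returns the total number of calls made before giving up.
# cap is an illustrative safety bound so the loop always terminates.
def attempts_when_always_throttled(retry_limit, disable_retry_limit: false, cap: 100)
  retry_count = 0
  attempts = 0
  while attempts < cap
    attempts += 1 # one call, assumed to be throttled
    # Condition as quoted in the issue, with locals instead of instance variables.
    break if !disable_retry_limit && retry_limit < retry_count
    retry_count += 1 # otherwise schedule another retry
  end
  attempts
end

# Hypothetical variant using <= so that a limit of 0 means "never retry".
def attempts_with_strict_limit(retry_limit, cap: 100)
  retry_count = 0
  attempts = 0
  while attempts < cap
    attempts += 1
    break if retry_limit <= retry_count
    retry_count += 1
  end
  attempts
end

puts attempts_when_always_throttled(0) # 2: the first call plus one retry
puts attempts_with_strict_limit(0)     # 1: the first call only
```

Under these assumptions the quoted condition allows retry_limit + 1 retries, because 0 < 0 is false on the first failure, which matches the observation that a limit of 0 still permits one retry.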
Probably should have another case to check instead of 0 < 0?
Steps to replicate
Config:
Expected Behavior or What you need to ask
No duplicates in cloudwatch logs
Using Fluentd and CloudWatchLogs plugin versions
Kubernetes 1.16 in AWS
activesupport (5.2.3)
addressable (2.6.0)
aws-eventstream (1.0.3)
aws-partitions (1.196.0)
aws-sdk-cloudwatchlogs (1.25.0)
aws-sdk-core (3.62.0, 2.10.50)
aws-sigv4 (1.1.0)
bigdecimal (1.2.8)
concurrent-ruby (1.1.5)
cool.io (1.4.6)
did_you_mean (1.0.0)
domain_name (0.5.20190701)
ffi (1.11.1)
fluent-mixin-config-placeholders (0.4.0)
fluent-plugin-cloudwatch-logs (0.4.5)
fluent-plugin-detect-exceptions (0.0.9)
fluent-plugin-kubernetes_metadata_filter (1.0.1)
fluent-plugin-prometheus (0.4.0)
fluent-plugin-record-reformer (0.9.1)
fluent-plugin-secure-forward (0.4.5)
fluent-plugin-systemd (0.0.8)
fluentd (0.12.33)
http (0.9.8)
http-cookie (1.0.3)
http-form_data (1.0.3)
http_parser.rb (0.6.0)
i18n (1.6.0)
io-console (0.4.5)
jmespath (1.4.0)
json (2.0.3, 1.8.3)
kubeclient (1.1.4)
lru_redux (1.1.0)
mime-types (3.2.2)
mime-types-data (3.2019.0331)
minitest (5.9.0)
msgpack (1.1.0)
net-telnet (0.1.1)
netrc (0.11.0)
oj (2.18.5)
power_assert (0.2.7)
prometheus-client (0.9.0)
proxifier (1.0.3)
psych (2.1.0)
public_suffix (3.1.1)
quantile (0.2.1)
rake (10.5.0)
rdoc (4.2.1)
recursive-open-struct (1.0.0)
resolve-hostname (0.1.0)
rest-client (2.0.2)
sigdump (0.2.4)
string-scrub (0.0.5)
systemd-journal (1.4.1)
test-unit (3.1.7)
thread_safe (0.3.6)
tzinfo (1.2.2)
tzinfo-data (1.2017.2)
unf (0.1.4)
unf_ext (0.0.7.6)
uuidtools (2.1.5)
yajl-ruby (1.3.0)