fluent-plugins-nursery / fluent-plugin-cloudwatch-logs

CloudWatch Logs Plugin for Fluentd

Getting Fluent::Plugin::CloudwatchLogsOutput::TooLargeEventError #253

Open sergeisantoyo opened 1 year ago

sergeisantoyo commented 1 year ago

Problem

I'm using the image fluent/fluentd-kubernetes-daemonset:v1.12.2-debian-cloudwatch-1.3 (I also tried the latest one), and I'm getting this error while trying to log to CloudWatch.

#<Thread:0x00007f7b44f9b2c8 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-cloudwatch-logs-0.14.3/lib/fluent/plugin/out_cloudwatch_logs.rb:323 run> terminated with exception (report_on_exception is true):
/fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-cloudwatch-logs-0.14.3/lib/fluent/plugin/out_cloudwatch_logs.rb:382:in `put_events_by_chunk': Log event in <LOG_GROUP_NAME> is discarded because it is too large: 671770 bytes exceeds limit of 262144 (Fluent::Plugin::CloudwatchLogsOutput::TooLargeEventError)
    from /fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-cloudwatch-logs-0.14.3/lib/fluent/plugin/out_cloudwatch_logs.rb:326:in `block (2 levels) in write'

I know this comes from the CloudWatch Logs API, which limits the size of each event. Is this fixed in other versions of that image? I've seen that for Fluent Bit, the solution was to truncate the log if it's too big.

Any idea on how to fix it for this specific image?
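In the meantime, here is one config-only workaround sketch (my own, not something the plugin ships): truncate the field before it reaches the output using Fluentd's built-in record_transformer filter. This assumes the message lives in the record's log key, as in the config below; 260000 bytes leaves headroom under the 262144-byte limit, since the plugin counts some per-event overhead toward it.

<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    # Cap the "log" value below CloudWatch's 262144-byte event limit.
    # byteslice can split a multibyte character, so scrub the result
    # back into valid UTF-8.
    log ${record["log"].to_s.byteslice(0, 260000).scrub}
  </record>
</filter>

Placed inside the @NORMAL label ahead of the match block, this would shorten oversized events instead of dropping them entirely.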

...

Steps to replicate

This is my current config:

02_output.conf: |-
  <label @NORMAL>
    <match **>
      @type cloudwatch_logs
      @id out_cloudwatch_logs_containers
      region "#{ENV.fetch('REGION')}"
      log_group_name_key group_name
      remove_log_group_name_key true
      log_group_aws_tags "{ \"Name\": \"#{ENV.fetch('CLUSTER_NAME')}\", \"kubernetes.io/cluster/#{ENV.fetch('CLUSTER_NAME')}\": \"owned\" }"
      log_stream_name_key stream_name
      remove_log_stream_name_key true
      auto_create_stream true
      retention_in_days 365
      concurrency 16
      <buffer>
        @type memory
        flush_thread_count 16
        flush_mode interval
        flush_at_shutdown true
        # total_limit_size 1GB
        # flush_interval 1s
        # chunk_limit_size 512k
        queued_chunks_limit_size 32
        retry_forever false
        retry_timeout 10m
        retry_max_times 5
        disable_chunk_backup true
      </buffer>
      <format>
        @type single_value
        message_key log
        add_newline false
      </format>
    </match>
  </label>

Reproducing this is somewhat difficult; essentially, you need to make a container log a very large entry, as sketched below.
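For what it's worth, one way to synthesize an oversized event without a real workload is Fluentd's bundled sample input (v1.12+); the tag and the 300000-character payload here are arbitrary choices of mine:

<source>
  @type sample
  tag repro.oversized
  rate 1
  # A ~300 KB "log" value, comfortably above the 262144-byte event limit.
  # The "#{...}" Ruby interpolation is evaluated when the config is parsed.
  sample "{\"log\":\"#{'x' * 300000}\"}"
</source>

Routing that tag into the cloudwatch_logs match above should trigger the TooLargeEventError deterministically.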

Expected Behavior or What you need to ask

I would expect the plugin to truncate the event or split it into multiple events.

Using Fluentd and CloudWatchLogs plugin versions

fluent-plugin-cloudwatch-logs 0.14.3 (per the stack trace above).

fernandino143 commented 5 months ago

I'm seeing the same issue with the latest version, 0.14.3. Is there even a workaround for this?