fluent-plugins-nursery / fluent-plugin-cloudwatch-logs

CloudWatch Logs Plugin for Fluentd
MIT License
201 stars 141 forks source link

When invoked from the fluent-plugin-cloudwatch-logs, the AWS SDK logs every AWS API call it makes, no matter what global logging level is set for fluentd #237

Open syedriko opened 3 years ago

syedriko commented 3 years ago

Problem

When invoked from the fluent-plugin-cloudwatch-logs plugin, the AWS SDK logs every AWS API call it makes, no matter what global logging level is set for fluentd. The end result is that fluentd log grows proportionately to the "payload" log traffic that is forwarded to CloudWatch.

I believe the root cause is here: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-core/lib/aws-sdk-core/plugins/logging.rb#L53, because this line

config.logger.send(config.log_level, format(config, response))

always logs at the current log level.

However, an issue in this repo provides the necessary application-level context for what is arguably a defect in the AWS SDK.

Steps to replicate

Example fluentd config

fluentd.conf:

<source>
  @type forward
  @id forward_input
</source>

<match test.**>
  @type stdout
  @id stdout_output
</match>

<match cw.**>
  @type cloudwatch_logs
  auto_create_stream true
  region <...>
  log_group_name cw_group_name
  log_stream_name cw_stream_name
  concurrency 2
  aws_key_id <...>
  aws_sec_key <...>
  include_time_key true
  log_rejected_request true
</match>

Fluentd logs

In response to

$ echo '{"foo":"bar"}' | fluent-cat cw

the following is seen in fluentd logs. Note that the log messages from Aws::CloudWatchLogs::Client are issued at every log level:

$ fluentd -c ./fluentd.conf --no-supervisor // info log level

2021-08-04 10:16:55 -0400 [info]: [Aws::CloudWatchLogs::Client 200 0.2607 0 retries] describe_log_groups()  
2021-08-04 10:16:55 -0400 [info]: [Aws::CloudWatchLogs::Client 200 0.068632 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)  
2021-08-04 10:16:55 -0400 [info]: [Aws::CloudWatchLogs::Client 200 0.054434 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)  
2021-08-04 10:16:56 -0400 [info]: [Aws::CloudWatchLogs::Client 200 0.072863 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  
2021-08-04 10:16:56 -0400 [info]: [Aws::CloudWatchLogs::Client 200 0.090928 0 retries] put_log_events(log_events:[{timestamp:1628086554740,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T10:15:54-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  

===============

$ fluentd -c ./fluentd.conf --no-supervisor -v // debug log level

2021-08-04 10:22:02 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.243379 0 retries] describe_log_groups()  
2021-08-04 10:22:02 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.055964 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)  
2021-08-04 10:22:02 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.048201 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)  
2021-08-04 10:22:02 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.055562 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  
2021-08-04 10:22:02 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.083676 0 retries] put_log_events(log_events:[{timestamp:1628086859696,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T10:20:59-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  

==============

$ fluentd -c ./fluentd.conf --no-supervisor -vv // trace log level

021-08-04 10:30:38 -0400 [trace]: fluent/log.rb:281:trace: [Aws::CloudWatchLogs::Client 200 0.285799 0 retries] describe_log_groups()  
2021-08-04 10:30:38 -0400 [trace]: fluent/log.rb:281:trace: [Aws::CloudWatchLogs::Client 200 0.06596 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)  
2021-08-04 10:30:38 -0400 [trace]: fluent/log.rb:281:trace: [Aws::CloudWatchLogs::Client 200 0.056249 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)  
2021-08-04 10:30:39 -0400 [trace]: fluent/log.rb:281:trace: [Aws::CloudWatchLogs::Client 200 0.08038 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  
2021-08-04 10:30:39 -0400 [trace]: fluent/log.rb:281:trace: [Aws::CloudWatchLogs::Client 200 0.097014 0 retries] put_log_events(log_events:[{timestamp:1628087373685,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T10:29:33-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  

==============

$ fluentd -c ./fluentd.conf --no-supervisor -q // warn log level

2021-08-04 10:38:07 -0400 [warn]: [Aws::CloudWatchLogs::Client 200 0.244758 0 retries] describe_log_groups()  
2021-08-04 10:38:07 -0400 [warn]: [Aws::CloudWatchLogs::Client 200 0.058111 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)  
2021-08-04 10:38:07 -0400 [warn]: [Aws::CloudWatchLogs::Client 200 0.047988 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)  
2021-08-04 10:38:07 -0400 [warn]: [Aws::CloudWatchLogs::Client 200 0.060144 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  
2021-08-04 10:38:07 -0400 [warn]: [Aws::CloudWatchLogs::Client 200 0.073899 0 retries] put_log_events(log_events:[{timestamp:1628087823673,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T10:37:03-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  

==============

$ fluentd -c ./fluentd.conf --no-supervisor -qq // error log level

2021-08-04 10:43:16 -0400 [error]: [Aws::CloudWatchLogs::Client 200 0.241862 0 retries] describe_log_groups()  
2021-08-04 10:43:16 -0400 [error]: [Aws::CloudWatchLogs::Client 200 0.064994 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)  
2021-08-04 10:43:16 -0400 [error]: [Aws::CloudWatchLogs::Client 200 0.047917 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)  
2021-08-04 10:43:17 -0400 [error]: [Aws::CloudWatchLogs::Client 200 0.05636 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  
2021-08-04 10:43:17 -0400 [error]: [Aws::CloudWatchLogs::Client 200 0.081287 0 retries] put_log_events(log_events:[{timestamp:1628088133366,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T10:42:13-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  

Expected Behavior

I would expect that AWS API calls are logged at a certain log level (debug) and above, but not below. For example, with the debug log level hardcoded in the AWS SDK code line mentioned above:

$ fluentd -c ./fluentd.conf --no-supervisor -vv // trace

2021-08-04 11:56:57 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.320894 0 retries] describe_log_groups()  
2021-08-04 11:56:57 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.063318 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)  
2021-08-04 11:56:57 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.047943 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)  
2021-08-04 11:56:57 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.060117 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  
2021-08-04 11:56:57 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.088652 0 retries] put_log_events(log_events:[{timestamp:1628092555600,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T11:55:55-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  

========

$ fluentd -c ./fluentd.conf --no-supervisor -v // debug
# hardcoded :debug

2021-08-04 11:41:22 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.237807 0 retries] describe_log_groups()  
2021-08-04 11:41:22 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.05453 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)  
2021-08-04 11:41:23 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.04345 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)  
2021-08-04 11:41:23 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.059298 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  
2021-08-04 11:41:23 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.06734 0 retries] put_log_events(log_events:[{timestamp:1628091621300,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T11:40:21-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")  
2021-08-04 11:41:23 -0400 [debug]: fluent/log.rb:302:debug: Called PutLogEvents API group="cw_group_name" stream="cw_stream_name" events_count=1 events_bytesize=74 sequence_token=nil thread=11609020 request_sec=0.067749457

========

$ fluentd -c ./fluentd.conf --no-supervisor // info
no output

==========

$ fluentd -c ./fluentd.conf --no-supervisor -q // warn
no output

==========

Fluentd and CloudWatchLogs plugin versions

LOCAL GEMS

aws-eventstream (1.1.0) aws-partitions (1.396.0) aws-sdk-cloudwatchlogs (1.38.0) aws-sdk-core (3.109.3) aws-sigv4 (1.2.2) bigdecimal (default: 1.3.4) cmath (default: 1.0.0) concurrent-ruby (1.1.9) cool.io (1.7.1) csv (default: 1.0.0) date (default: 1.0.0) did_you_mean (1.2.0) dig_rb (1.0.1) etc (default: 1.0.0) fcntl (default: 1.0.0) fiddle (default: 1.0.0) fileutils (default: 1.0.2) fluent-plugin-cloudwatch-logs (0.7.6) fluentd (1.7.4) http_parser.rb (0.6.0) io-console (default: 0.4.6) ipaddr (default: 1.2.0) jmespath (1.4.0) json (default: 2.1.0) minitest (5.10.3) msgpack (1.4.2) net-telnet (0.1.1) openssl (default: 2.1.2) power_assert (1.1.1) psych (default: 3.0.2) rake (12.3.0) rdoc (default: 6.0.1) scanf (default: 1.0.0) sdbm (default: 1.0.0) serverengine (2.2.4) sigdump (0.2.4) stringio (default: 0.0.1) strptime (0.2.5) strscan (default: 1.0.0) test-unit (3.2.7) tzinfo (2.0.4) tzinfo-data (1.2021.1) webrick (default: 1.4.2) xmlrpc (0.3.0) yajl-ruby (1.4.1) zlib (default: 1.0.0) $

syedriko commented 3 years ago

On a second thought, I'm not sure this is an AWS SDK issue necessarily. Consider this test snippet, which is supposed to fail a create_log_group() call at different log levels:

require 'aws-sdk-cloudwatchlogs'
require 'time'
require 'logger'

logger = Logger.new(STDOUT)
logger.level = 'UNKNOWN'

cwl_client = Aws::CloudWatchLogs::Client.new(
  region: 'us-east-2',
  access_key_id: <...>,
  secret_access_key: <...>,
  logger: logger
)

resp = cwl_client.delete_log_group(
  log_group_name: 'foo'
)

resp = cwl_client.create_log_group(
  log_group_name: 'foo'
)
if !resp.successful?
  puts 'create_log_group failed' + resp
  exit(1)
end

['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL', 'UNKNOWN'].each do |level|
  begin
    puts "logger.level: " + level
    logger.level = level
    resp = cwl_client.create_log_group(
      log_group_name: 'foo'
    )
  rescue    
  end
end

The output this produces is

$ ruby ./test.rb 
logger.level: DEBUG
D, [2021-08-11T23:21:09.354722 #28770] DEBUG -- : [Aws::CloudWatchLogs::Client 400 0.054844 0 retries] create_log_group(log_group_name:"foo") Aws::CloudWatchLogs::Errors::ResourceAlreadyExistsException The specified log group already exists

logger.level: INFO
logger.level: WARN
logger.level: ERROR
logger.level: FATAL
logger.level: UNKNOWN
$ 

that is, AWS API calls are only logged at DEBUG level.