awslabs / aws-fluent-plugin-kinesis

Amazon Kinesis output plugin for Fluentd
Apache License 2.0

When using shared_credentials, plugin does not seem to honor session token expiry #217

Closed: wltbenade closed this issue 1 year ago

wltbenade commented 2 years ago

Hi guys,

We've encountered an issue when configuring the plugin with rotating credentials via shared_credentials. Regardless of the session token's duration, once the token has been rotated in the credentials file, the plugin does not seem to pick up the new values.

It should be noted that delivery works as expected whilst the session token is valid.

Plugin version:

fluent-plugin-kinesis (3.4.2)

Plugin config:

<match **>
  @type kinesis_streams
  stream_name "#{ENV['STREAM_NAME']}"
  <shared_credentials>
    path             /root/.aws/credentials
    profile_name     default
  </shared_credentials>
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_forever
    retry_max_interval 30
    overflow_action block
  </buffer>
</match>

$HOME/.aws/credentials:

[default]
aws_access_key_id        = REDACTED
aws_secret_access_key    = REDACTED
aws_session_token        = REDACTED
aws_security_token       = REDACTED
x_principal_arn          = REDACTED 
x_security_token_expires = 2021-12-24T13:11:46Z
region                   = eu-west-1

Error from Fluentd:

[warn]: #0 failed to flush the buffer. retry_times=12 next_retry_time=2021-12-24 12:25:14 +0000 chunk="5d3e3629e9bfb3ae2941e5c2a3a622f6" error_class=Aws::Kinesis::Errors::ExpiredTokenException error="The security token included in the request is expired"

Would it be possible to support duration_seconds in shared_credentials, similar to how it works in assume_role_credentials?
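For illustration, here is a minimal Ruby sketch of what a duration_seconds option for shared_credentials might do: cache the credentials and re-read them once they are older than the configured duration. TtlCredentialCache and its loader/clock parameters are hypothetical names, not part of fluent-plugin-kinesis or the AWS SDK; in practice the loader would re-read the profile from the credentials file.

```ruby
# Hypothetical sketch, not part of fluent-plugin-kinesis or the AWS SDK:
# cache whatever the loader returns and call the loader again once the
# cached value is older than duration_seconds.
class TtlCredentialCache
  def initialize(duration_seconds:, loader:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @duration_seconds = duration_seconds
    @loader = loader # e.g. a lambda that re-reads the credentials file
    @clock = clock   # injectable so the behaviour can be tested
    @loaded_at = nil
    @credentials = nil
  end

  # Returns the cached credentials, reloading them when the cache is stale.
  def credentials
    now = @clock.call
    if @loaded_at.nil? || now - @loaded_at >= @duration_seconds
      @credentials = @loader.call
      @loaded_at = now
    end
    @credentials
  end
end
```

With a purely time-based approach, duration_seconds would need to be comfortably shorter than the token lifetime, or the cache could keep serving a token that has already expired.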

Cheers!

simukappu commented 2 years ago

Hi @wltbenade, thank you for your feedback.

This plugin uses AWS SDK for Ruby v3 as is. The AWS SDK for Ruby credential providers (including assume role credentials) already provide refreshable credentials, which are intended to refresh automatically when they expire. Please see this related issue. However, this auto-refresh doesn't seem to work for Aws::Kinesis::Errors::ExpiredTokenException, because the SDK's list of retry errors does not include 'ExpiredTokenException' here. If you would like auto-refresh to work with Kinesis, I think this should be handled on the AWS SDK side. What do you think? Could you submit an enhancement request to AWS SDK for Ruby?
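To make the mechanism concrete, here is a minimal Ruby sketch of the "refresh credentials, then retry" pattern: an error whose code is on an expired-credentials list triggers refresh! on the credential provider and a single retry. ServiceError, EXPIRED_CREDENTIAL_CODES and call_with_credential_refresh are hypothetical stand-ins, not the AWS SDK's actual retry plugin or error classes.

```ruby
# Hypothetical error type carrying a service error code (not an SDK class).
class ServiceError < StandardError
  attr_reader :code

  def initialize(code, message = code)
    super(message)
    @code = code
  end
end

# Illustrative list of codes treated as "credentials expired".
EXPIRED_CREDENTIAL_CODES = %w[ExpiredToken ExpiredTokenException].freeze

# Yields the provider's current credentials. If the block fails with an
# expired-credentials code, calls provider.refresh! and retries, at most
# max_retries times. Any other error is re-raised immediately.
def call_with_credential_refresh(provider, max_retries: 1)
  attempts = 0
  begin
    yield provider.credentials
  rescue ServiceError => e
    raise unless EXPIRED_CREDENTIAL_CODES.include?(e.code) && attempts < max_retries

    attempts += 1
    provider.refresh!
    retry
  end
end
```

If a code is missing from the list, the error is re-raised without any refresh, which matches the behaviour reported here for ExpiredTokenException.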

wltbenade commented 2 years ago

Hi @simukappu,

Happy New Year to you and the team!

Thanks for the suggestions! Just to be clear on your recommendation:

  1. The ErrorInspector class in error_inspector.rb should be expanded to include ExpiredTokenException, as well as ExpiredToken.
  2. Would a positive match from the ErrorInspector class in error_inspector.rb then cause the plugin to reload the credentials from disk?
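On the second point, matching the error code only triggers a refresh; whether that refresh actually reads the rotated file depends on the credential provider. The Ruby sketch below contrasts a provider that snapshots the file at construction with one whose refresh! re-reads it. Both classes and read_session_token are hypothetical and only parse the aws_session_token key; whether the SDK's shared-credentials provider behaves like the first one is an assumption that has not been verified in this thread.

```ruby
# Hypothetical helper: returns aws_session_token for a profile in an
# INI-style credentials file, or nil if it is not present.
def read_session_token(path, profile = 'default')
  section = nil
  File.foreach(path) do |line|
    line = line.strip
    if (m = line.match(/\A\[(.+)\]\z/))
      section = m[1].strip
    elsif section == profile && (m = line.match(/\Aaws_session_token\s*=\s*(.*)\z/))
      return m[1].strip
    end
  end
  nil
end

# Hypothetical provider that reads the file once; refresh! has nothing to re-read.
class SnapshotProvider
  attr_reader :session_token

  def initialize(path)
    @session_token = read_session_token(path)
  end

  def refresh!; end
end

# Hypothetical provider whose refresh! re-reads the file from disk.
class RereadingProvider
  attr_reader :session_token

  def initialize(path)
    @path = path
    refresh!
  end

  def refresh!
    @session_token = read_session_token(@path)
  end
end
```

After the file is rotated, only the second provider returns the new token from refresh!, so adding the error code to the list would help only if the provider in use re-reads the file.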

Another thought just occurred to me: the credentials on disk are rotated before they expire. Would it not make more sense to reload them as soon as a rotation is detected?
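A rough Ruby sketch of that idea, assuming the rotation rewrites or replaces the credentials file so that its modification time changes: the cached credentials are reloaded whenever File.mtime differs from the value seen at the last load. MtimeWatchingCredentials is a hypothetical class, and its default loader just returns the raw file contents as a placeholder for real INI parsing.

```ruby
# Hypothetical sketch: reload credentials only when the file's modification
# time changes, so a rotation is picked up before the old token expires.
class MtimeWatchingCredentials
  def initialize(path, loader: ->(p) { File.read(p) })
    @path = path
    @loader = loader # placeholder; a real loader would parse the profile
    @mtime = nil
    @credentials = nil
  end

  # Returns cached credentials, reloading them if the file has changed.
  def credentials
    current = File.mtime(@path)
    if @mtime.nil? || current != @mtime
      @credentials = @loader.call(@path)
      @mtime = current
    end
    @credentials
  end
end
```

A stat call per request is cheap compared with a PutRecords call, but a rewrite that lands within the filesystem's timestamp granularity of the previous one could go unnoticed, so a time-based fallback may still be worth combining with this.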

wltbenade commented 2 years ago

Hi @simukappu,

Any update on my question above?

Cheers

simukappu commented 2 years ago

Hi @wltbenade

Sorry for my late reply, and thank you for your enhancement request to AWS SDK for Ruby. Could you check whether this issue is resolved when using this plugin? You will need the latest AWS SDK for Ruby, not the version bundled with td-agent.
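Since the Fluentd process may load a different aws-sdk-core than expected, a small Ruby check can confirm the version. patched_core? is a hypothetical helper, and the 3.125.3 threshold is taken from the later reply in this thread stating that the patch was included in that release.

```ruby
# Hypothetical helper: is the given aws-sdk-core version at or above the
# release reported in this thread to contain the patch (3.125.3)?
MIN_PATCHED_CORE = Gem::Version.new('3.125.3')

def patched_core?(version)
  Gem::Version.new(version.to_s) >= MIN_PATCHED_CORE
end

# Inside the running Fluentd process, the loaded version is available via:
#   Gem.loaded_specs['aws-sdk-core']&.version
```

Gem::Version compares segments numerically, so 3.99.0 correctly counts as older than 3.125.3.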

wltbenade commented 2 years ago

Hi @simukappu,

Apologies for taking so long to get back to you.

I've rebuilt the plugin against both aws-sdk-core 3.125.5 (the patch was included in 3.125.3) and 3.126.0, but I'm still experiencing the same issue:

2022-02-07 18:13:00 +0000 [info]: #0 fluentd worker is now running worker=0
2022-02-07 19:13:06 +0000 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-02-07 19:13:07 +0000 chunk="5d772638e32fc9e2b67656d49f78b1bc" error_class=Aws::Kinesis::Errors::ExpiredTokenException error="The security token included in the request is expired"
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/aws-sdk-core-3.126.0/lib/seahorse/client/plugins/raise_response_errors.rb:17:in `call'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/aws-sdk-core-3.126.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:22:in `call'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/aws-sdk-core-3.126.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19:in `call'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/aws-sdk-core-3.126.0/lib/aws-sdk-core/plugins/param_converter.rb:26:in `call'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/aws-sdk-core-3.126.0/lib/seahorse/client/plugins/request_callback.rb:71:in `call'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/aws-sdk-core-3.126.0/lib/aws-sdk-core/plugins/response_paging.rb:12:in `call'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/aws-sdk-core-3.126.0/lib/seahorse/client/plugins/response_target.rb:24:in `call'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/aws-sdk-core-3.126.0/lib/seahorse/client/request.rb:72:in `send_request'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/aws-sdk-kinesis-1.39.0/lib/aws-sdk-kinesis/client.rb:1922:in `put_records'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-kinesis-3.4.2/lib/fluent/plugin/out_kinesis_streams.rb:49:in `block in write'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-kinesis-3.4.2/lib/fluent/plugin/kinesis_helper/api.rb:94:in `batch_request_with_retry'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-kinesis-3.4.2/lib/fluent/plugin/kinesis.rb:157:in `block in write_records_batch'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-kinesis-3.4.2/lib/fluent/plugin/kinesis_helper/api.rb:89:in `split_to_batches'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-kinesis-3.4.2/lib/fluent/plugin/kinesis.rb:155:in `write_records_batch'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-kinesis-3.4.2/lib/fluent/plugin/out_kinesis_streams.rb:45:in `write'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.4/lib/fluent/plugin/output.rb:1179:in `try_flush'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.4/lib/fluent/plugin/output.rb:1491:in `flush_thread_run'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.4/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
  2022-02-07 19:13:06 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

As you can see, the requests begin to fail after an hour (the default expiry used for the credentials).

Any idea where I can start to look?
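For reference, the one-hour failure window can be compared with the x_security_token_expires value in the credentials file shown earlier. The Ruby sketch below computes the remaining lifetime and flags when a reload would be due; seconds_until_expiry, reload_due? and the 300-second margin are illustrative assumptions, and the field is assumed to hold an ISO 8601 UTC timestamp as in the example file.

```ruby
require 'time'

# Seconds left before the timestamp stored in x_security_token_expires
# (an ISO 8601 UTC string such as "2021-12-24T13:11:46Z"); negative once expired.
def seconds_until_expiry(expires_at, now = Time.now)
  Time.iso8601(expires_at) - now
end

# True when the token is within margin_seconds of expiring (or already
# expired), i.e. when a hypothetical reloader should re-read the file.
def reload_due?(expires_at, now = Time.now, margin_seconds: 300)
  seconds_until_expiry(expires_at, now) <= margin_seconds
end
```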

divbasson2 commented 2 years ago

Hi, any feedback on the comment posted by @wltbenade above?

simukappu commented 1 year ago

Is anyone still having the same issue with the latest AWS SDK for Ruby? If not, we'll close the issue for now.