Error encountered with `refresh_credentials_interval` configuration option in fluent-plugin-opensearch v1.1.1

Expected Behavior or What you need to ask

When the `refresh_credentials_interval` configuration option is set, the specified value does not appear to take effect in the underlying AWS SDK. Fluentd consistently fails at startup with: The requested DurationSeconds exceeds the MaxSessionDuration set for this role (Aws::STS::Errors::ValidationError)

Additional Information

Reverting to the previous version of fluent-plugin-opensearch resolves the issue. The role used has a maximum session duration of 1 hour. Various refresh_credentials_interval values, including the minimum allowed ("15m") and "30m", were tested without success.
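
To make the arithmetic explicit, here is a minimal Ruby sketch that converts the tested interval strings to seconds and compares them with the role's 1-hour limit. The helpers `interval_to_seconds` and `fits_role_limit?` are hypothetical names written for this illustration; they are not part of fluent-plugin-opensearch or the AWS SDK. The "5h" entry is the default value that a later comment in this thread attributes to the plugin.

```ruby
# Hypothetical helpers for illustration only; not plugin or SDK code.
UNIT_SECONDS = { "s" => 1, "m" => 60, "h" => 3600, "d" => 86_400 }.freeze

# Convert a time string such as "15m" or "5h" to seconds.
def interval_to_seconds(value)
  match = /\A(\d+)([smhd])\z/.match(value.to_s.strip)
  raise ArgumentError, "unsupported interval: #{value.inspect}" unless match

  match[1].to_i * UNIT_SECONDS.fetch(match[2])
end

# True when the requested duration does not exceed the role's limit.
def fits_role_limit?(interval, max_session_duration)
  interval_to_seconds(interval) <= max_session_duration
end

ROLE_MAX_SESSION_DURATION = 3600 # 1 hour, as stated in this report

%w[15m 30m 48m 5h].each do |interval|
  verdict = fits_role_limit?(interval, ROLE_MAX_SESSION_DURATION) ? "within limit" : "exceeds limit"
  puts format("%-4s = %5d s -> %s", interval, interval_to_seconds(interval), verdict)
end
```

Every explicitly configured value is below 3600 seconds, so if STS still rejects the request, the duration actually sent is presumably not the configured one. Only a value such as the 5h default would exceed this role's limit.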

Complete error log:

/usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/raise_response_errors.rb:17:in `call': The requested DurationSeconds exceeds the MaxSessionDuration set for this role. (Aws::STS::Errors::ValidationError)
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/param_converter.rb:26:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/request_callback.rb:71:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/response_paging.rb:12:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/response_target.rb:24:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/request.rb:72:in `send_request'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-sts/client.rb:1575:in `assume_role_with_web_identity'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/assume_role_web_identity_credentials.rb:76:in `refresh'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/refreshing_credentials.rb:30:in `initialize'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/assume_role_web_identity_credentials.rb:64:in `initialize'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:249:in `new'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:249:in `aws_credentials'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:351:in `configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin.rb:187:in `configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/agent.rb:132:in `add_match'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/agent.rb:74:in `block in configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/agent.rb:64:in `each'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/agent.rb:64:in `configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/root_agent.rb:149:in `configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/engine.rb:105:in `configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/engine.rb:80:in `run_configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/supervisor.rb:571:in `run_supervisor'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/command/fluentd.rb:352:in `<top (required)>'
        from <internal:/usr/local/lib/ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/usr/local/lib/ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/bin/fluentd:15:in `<top (required)>'
        from /usr/local/bundle/bin/fluentd:25:in `load'
        from /usr/local/bundle/bin/fluentd:25:in `<main>'

Using Fluentd and OpenSearch plugin versions

Fluentd v1.16.1
OpenSearch plugin version: v1.1.1

Same here.

My Fluentd pod uses custom IAM roles for Service Accounts. This role's maxSessionDuration is currently set to 1h:

apiVersion: ...
kind: IRSA
metadata:
  name: fluentd-os-test
  namespace: fluent-system
  annotations:
    XXXX: managed
spec:
  serviceAccount: fluentd
  path: ${IRSA_ROLE_PATH:=/XXX/}
  # increasing this to sync with the fluent-plugin-opensearch latest update: https://github.com/fluent/fluent-plugin-opensearch/pull/78/files
  # it set the default fluentd session duration to 5 hours
  # our default maxSessionDuration was 1 hour, now it is 5 hours
  maxSessionDuration: 3600 # 1 hour
  inlinePolicy: |
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "...",
          "Action": "...",
          "Resource": "..."
        }
      ]
    }

And the Fluentd ClusterOutput is set to refresh_credentials_interval 48m:

apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterOutput
metadata:
  name: opensearch
  labels:
    output.fluentd.fluent.io/enabled: "true"
    output.fluentd.fluent.io/tenant: "core"
spec:
  outputs:
    - customPlugin:
        config: |
          <match **>
            @type copy
            <store>
              @type opensearch
              host "${FLUENT_OPENSEARCH_HOST}"
              port 443
              logstash_format  true
              logstash_prefix logs-core
              scheme https
              log_os_400_reason true
              @log_level ${FLUENTD_OUTPUT_LOGLEVEL:=error}
              <buffer>
                @type ${FLUENTD_BUFFER_TYPE:=memory}
                path ${FLUENTD_BUFFER_PATH:=/buffers/opensearch/raas-core}
                flush_mode ${FLUENTD_BUFFER_FLUSH_MODE:=interval}
                flush_interval ${FLUENTD_BUFFER_FLUSH_INTERVAL:=60s}
                flush_thread_count ${FLUENTD_BUFFER_FLUSH_THREAD_COUNT:=2}
                flush_at_shutdown ${FLUENTD_BUFFER_FLUSH_AT_SHUTDOWN:=true}
                retry_type ${FLUENTD_BUFFER_RETRY_TYPE:=exponential_backoff}
                retry_max_times ${FLUENTD_BUFFER_RETRY_MAX_TIMES:=10}
                retry_wait ${FLUENTD_BUFFER_RETRY_WAIT:=1s}
                retry_max_interval ${FLUENTD_BUFFER_RETRY_MAX_INTERVAL:=60s}
                chunk_limit_size ${FLUENTD_BUFFER_CHUNK_LIMIT_SIZE:=8M}
                total_limit_size ${FLUENTD_BUFFER_TOTAL_LIMIT_SIZE:=512MB}
                overflow_action ${FLUENTD_BUFFER_OVERFLOW_ACTION:=throw_exception}
                compress ${FLUENTD_BUFFER_COMPRESS:=text}
              </buffer>
              <endpoint>
                url "https://${FLUENT_OPENSEARCH_HOST}"
                region "${FLUENT_OPENSEARCH_REGION}"
                assume_role_arn "#{ENV['AWS_ROLE_ARN']}"
                assume_role_web_identity_token_file "#{ENV['AWS_WEB_IDENTITY_TOKEN_FILE']}"
                refresh_credentials_interval 48m
              </endpoint>
            </store>
          </match>

I have also tried setting both the IAM role's maxSessionDuration and refresh_credentials_interval to 5h, the default value.

It worked for a few minutes, then the same problem returned. Logs have not been indexed for more than 24h.

Some fluentd pods are logging this:

2023-06-30 12:54:35 +0000 [error]: #0 Hit limit for retries. dropping all chunks in the buffer queue. retry_times=10 records=35 error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"XXXX\", :port=>443, :scheme=>\"https\"}): [403] {\"message\":\"The security token included in the request is expired\"}"
  2023-06-30 12:54:35 +0000 [error]: #0 suppressed same stacktrace
2023-06-30 12:59:40 +0000 [error]: #0 Hit limit for retries. dropping all chunks in the buffer queue. retry_times=10 records=635 error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"XXXXX\", :port=>443, :scheme=>\"https\"}): [403] {\"message\":\"The security token included in the request is expired\"}"
  2023-06-30 12:59:40 +0000 [error]: #0 suppressed same stacktrace

And some pods are logging this:

level=error msg="Fluentd exited" error="exit status 1"
level=info msg=backoff delay=0s
level=info msg="backoff timer done" actual=28.33µs expected=0s
level=info msg="Fluentd started"
2023-06-30 12:55:12 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
/usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/raise_response_errors.rb:17:in `call': The requested DurationSeconds exceeds the MaxSessionDuration set for this role. (Aws::STS::Errors::ValidationError)
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/param_converter.rb:26:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/request_callback.rb:71:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/response_paging.rb:12:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/response_target.rb:24:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/request.rb:72:in `send_request'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-sts/client.rb:1575:in `assume_role_with_web_identity'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/assume_role_web_identity_credentials.rb:76:in `refresh'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/refreshing_credentials.rb:30:in `initialize'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/assume_role_web_identity_credentials.rb:64:in `initialize'
        from /usr/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:249:in `new'
        from /usr/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:249:in `aws_credentials'
        from /usr/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:351:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin.rb:187:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/multi_output.rb:110:in `block in configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/multi_output.rb:99:in `each'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/multi_output.rb:99:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/out_copy.rb:39:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin.rb:187:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/agent.rb:132:in `add_match'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/agent.rb:74:in `block in configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/agent.rb:64:in `each'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/agent.rb:64:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/label.rb:31:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:146:in `block in configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:146:in `each'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:146:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/engine.rb:105:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/engine.rb:80:in `run_configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/supervisor.rb:731:in `run_supervisor'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/command/fluentd.rb:350:in `<top (required)>'
        from <internal:/usr/lib/ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/usr/lib/ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/bin/fluentd:15:in `<top (required)>'
        from /usr/bin/fluentd:25:in `load'
        from /usr/bin/fluentd:25:in `<main>'

The fluent-operator doesn't pin the plugin's version: https://github.com/fluent/fluent-operator/blob/master/cmd/fluent-watcher/fluentd/base/Dockerfile#L43

So I can't even roll back the plugin to the previous version that worked. We are locked in to v1.1.1, and the whole logging system is affected.
