fluent / fluent-plugin-opensearch

OpenSearch Plugin for Fluentd
Apache License 2.0

Error encountered with `refresh_credentials_interval` configuration option in fluent-plugin-opensearch v1.1.1 #107

Closed fabio-viana closed 1 year ago

fabio-viana commented 1 year ago

Expected Behavior or What you need to ask

When the refresh_credentials_interval configuration option is set, the specified value does not take effect in the underlying AWS SDK. As a result, Fluentd consistently fails with this error: The requested DurationSeconds exceeds the MaxSessionDuration set for this role (Aws::STS::Errors::ValidationError)

Additional Information

Reverting back to the previous version of fluent-plugin-opensearch resolves the issue. The role used has a maximum session duration of 1 hour. Various refresh_credentials_interval values, including the minimum allowed (e.g., "15m", "30m"), were tested without success.
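To make the mismatch concrete, here is a minimal Ruby sketch of the check STS applies to DurationSeconds. The helper names (interval_to_seconds, fits_role?) are hypothetical and are not part of fluent-plugin-opensearch or the AWS SDK. The parser only handles integer values with an s/m/h/d suffix, which is an assumption made for brevity. The 900-second lower bound is the documented STS minimum.

```ruby
# Hypothetical helpers for illustration; not plugin or SDK code.
UNIT_SECONDS = { "s" => 1, "m" => 60, "h" => 3600, "d" => 86_400 }.freeze
STS_MIN_DURATION = 900 # STS rejects DurationSeconds below 15 minutes

# Convert a time string such as "48m" or "5h" to seconds.
def interval_to_seconds(value)
  match = /\A(\d+)([smhd])\z/.match(value.to_s.strip)
  raise ArgumentError, "unsupported interval: #{value.inspect}" unless match

  match[1].to_i * UNIT_SECONDS.fetch(match[2])
end

# True when the duration lies between the STS minimum and the role's limit.
def fits_role?(interval, max_session_duration)
  seconds = interval_to_seconds(interval)
  seconds >= STS_MIN_DURATION && seconds <= max_session_duration
end

role_limit = 3600 # the 1 hour MaxSessionDuration of the role in this report
%w[15m 30m 48m 5h].each do |interval|
  verdict = fits_role?(interval, role_limit) ? "within range" : "out of range"
  puts "#{interval} (#{interval_to_seconds(interval)}s): #{verdict}"
end
```

All of the sub-hour values pass this check against a 1 hour role, which is consistent with the report that the configured value is not the one reaching STS. A 5 hour request would be rejected by a role limited to 3600 seconds.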

Complete error log:

/usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/raise_response_errors.rb:17:in `call': The requested DurationSeconds exceeds the MaxSessionDuration set for this role. (Aws::STS::Errors::ValidationError)
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/param_converter.rb:26:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/request_callback.rb:71:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/response_paging.rb:12:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/response_target.rb:24:in `call'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/request.rb:72:in `send_request'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-sts/client.rb:1575:in `assume_role_with_web_identity'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/assume_role_web_identity_credentials.rb:76:in `refresh'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/refreshing_credentials.rb:30:in `initialize'
        from /usr/local/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/assume_role_web_identity_credentials.rb:64:in `initialize'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:249:in `new'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:249:in `aws_credentials'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:351:in `configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin.rb:187:in `configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/agent.rb:132:in `add_match'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/agent.rb:74:in `block in configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/agent.rb:64:in `each'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/agent.rb:64:in `configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/root_agent.rb:149:in `configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/engine.rb:105:in `configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/engine.rb:80:in `run_configure'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/supervisor.rb:571:in `run_supervisor'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/command/fluentd.rb:352:in `<top (required)>'
        from <internal:/usr/local/lib/ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/usr/local/lib/ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/bin/fluentd:15:in `<top (required)>'
        from /usr/local/bundle/bin/fluentd:25:in `load'
        from /usr/local/bundle/bin/fluentd:25:in `<main>'

Using Fluentd and OpenSearch plugin versions

kaiohenricunha commented 1 year ago

Same here.

My Fluentd pod uses custom IAM roles for Service Accounts. This role's maxSessionDuration is currently set to 1h:

apiVersion: ...
kind: IRSA
metadata:
  name: fluentd-os-test
  namespace: fluent-system
  annotations:
    XXXX: managed
spec:
  serviceAccount: fluentd
  path: ${IRSA_ROLE_PATH:=/XXX/}
  # increasing this to sync with the fluent-plugin-opensearch latest update: https://github.com/fluent/fluent-plugin-opensearch/pull/78/files
  # it set the default fluentd session duration to 5 hours
  # our default maxSessionDuration was 1 hour, now it is 5 hours
  maxSessionDuration: 3600 # 1 hour
  inlinePolicy: |
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "...",
          "Action": "...",
          "Resource": "..."
        }
      ]
    }

And the Fluentd ClusterOutput is set to refresh_credentials_interval 48m:

apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterOutput
metadata:
  name: opensearch
  labels:
    output.fluentd.fluent.io/enabled: "true"
    output.fluentd.fluent.io/tenant: "core"
spec:
  outputs:
    - customPlugin:
        config: |
          <match **>
            @type copy
            <store>
              @type opensearch
              host "${FLUENT_OPENSEARCH_HOST}"
              port 443
              logstash_format  true
              logstash_prefix logs-core
              scheme https
              log_os_400_reason true
              @log_level ${FLUENTD_OUTPUT_LOGLEVEL:=error}
              <buffer>
                @type ${FLUENTD_BUFFER_TYPE:=memory}
                path ${FLUENTD_BUFFER_PATH:=/buffers/opensearch/raas-core}
                flush_mode ${FLUENTD_BUFFER_FLUSH_MODE:=interval}
                flush_interval ${FLUENTD_BUFFER_FLUSH_INTERVAL:=60s}
                flush_thread_count ${FLUENTD_BUFFER_FLUSH_THREAD_COUNT:=2}
                flush_at_shutdown ${FLUENTD_BUFFER_FLUSH_AT_SHUTDOWN:=true}
                retry_type ${FLUENTD_BUFFER_RETRY_TYPE:=exponential_backoff}
                retry_max_times ${FLUENTD_BUFFER_RETRY_MAX_TIMES:=10}
                retry_wait ${FLUENTD_BUFFER_RETRY_WAIT:=1s}
                retry_max_interval ${FLUENTD_BUFFER_RETRY_MAX_INTERVAL:=60s}
                chunk_limit_size ${FLUENTD_BUFFER_CHUNK_LIMIT_SIZE:=8M}
                total_limit_size ${FLUENTD_BUFFER_TOTAL_LIMIT_SIZE:=512MB}
                overflow_action ${FLUENTD_BUFFER_OVERFLOW_ACTION:=throw_exception}
                compress ${FLUENTD_BUFFER_COMPRESS:=text}
              </buffer>
              <endpoint>
                url "https://${FLUENT_OPENSEARCH_HOST}"
                region "${FLUENT_OPENSEARCH_REGION}"
                assume_role_arn "#{ENV['AWS_ROLE_ARN']}"
                assume_role_web_identity_token_file "#{ENV['AWS_WEB_IDENTITY_TOKEN_FILE']}"
                refresh_credentials_interval 48m
              </endpoint>
            </store>
          </match>

I also tried setting the IAM role's maxSessionDuration to 5h and refresh_credentials_interval to 5h, the default value.

It worked for a few minutes, then the same problem returned. No logs have been indexed for more than 24h.

Some fluentd pods are logging this:

2023-06-30 12:54:35 +0000 [error]: #0 Hit limit for retries. dropping all chunks in the buffer queue. retry_times=10 records=35 error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"XXXX\", :port=>443, :scheme=>\"https\"}): [403] {\"message\":\"The security token included in the request is expired\"}"
  2023-06-30 12:54:35 +0000 [error]: #0 suppressed same stacktrace
2023-06-30 12:59:40 +0000 [error]: #0 Hit limit for retries. dropping all chunks in the buffer queue. retry_times=10 records=635 error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"XXXXX\", :port=>443, :scheme=>\"https\"}): [403] {\"message\":\"The security token included in the request is expired\"}"
  2023-06-30 12:59:40 +0000 [error]: #0 suppressed same stacktrace

And some pods are logging this:

level=error msg="Fluentd exited" error="exit status 1"
level=info msg=backoff delay=0s
level=info msg="backoff timer done" actual=28.33µs expected=0s
level=info msg="Fluentd started"
2023-06-30 12:55:12 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
/usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/raise_response_errors.rb:17:in `call': The requested DurationSeconds exceeds the MaxSessionDuration set for this role. (Aws::STS::Errors::ValidationError)
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/param_converter.rb:26:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/request_callback.rb:71:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/plugins/response_paging.rb:12:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/plugins/response_target.rb:24:in `call'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/seahorse/client/request.rb:72:in `send_request'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-sts/client.rb:1575:in `assume_role_with_web_identity'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/assume_role_web_identity_credentials.rb:76:in `refresh'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/refreshing_credentials.rb:30:in `initialize'
        from /usr/lib/ruby/gems/3.1.0/gems/aws-sdk-core-3.175.0/lib/aws-sdk-core/assume_role_web_identity_credentials.rb:64:in `initialize'
        from /usr/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:249:in `new'
        from /usr/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:249:in `aws_credentials'
        from /usr/lib/ruby/gems/3.1.0/gems/fluent-plugin-opensearch-1.1.1/lib/fluent/plugin/out_opensearch.rb:351:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin.rb:187:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/multi_output.rb:110:in `block in configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/multi_output.rb:99:in `each'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/multi_output.rb:99:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin/out_copy.rb:39:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/plugin.rb:187:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/agent.rb:132:in `add_match'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/agent.rb:74:in `block in configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/agent.rb:64:in `each'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/agent.rb:64:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/label.rb:31:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:146:in `block in configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:146:in `each'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:146:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/engine.rb:105:in `configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/engine.rb:80:in `run_configure'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/supervisor.rb:731:in `run_supervisor'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/lib/fluent/command/fluentd.rb:350:in `<top (required)>'
        from <internal:/usr/lib/ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/usr/lib/ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.15.3/bin/fluentd:15:in `<top (required)>'
        from /usr/bin/fluentd:25:in `load'
        from /usr/bin/fluentd:25:in `<main>'

The fluent-operator doesn't pin the plugin's version: https://github.com/fluent/fluent-operator/blob/master/cmd/fluent-watcher/fluentd/base/Dockerfile#L43

So I can't even roll back to the previous plugin version that worked. We are locked in to v1.1.1, and the whole logging system is affected.

leonardolacerdaatlantico commented 1 year ago

I'm experiencing the same issue mentioned above, and it's significantly impacting my environments. It's crucial that this bug gets resolved as quickly as possible since it's directly affecting the project quality.

cosmo0920 commented 1 year ago

Hi, thanks for your reports. I reverted the behavior of passing the duration seconds in v1.1.3.
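For anyone checking whether an image already carries the revert, a small Ruby sketch using RubyGems' version comparison may help. The method name includes_revert? is hypothetical. The only fact taken from this thread is that the revert shipped in v1.1.3.

```ruby
require "rubygems" # provides Gem::Version; normally loaded already

# Release named in the maintainer's comment above.
FIXED_IN = Gem::Version.new("1.1.3")

# True when the given plugin version is at or after the release with the revert.
def includes_revert?(version)
  Gem::Version.new(version) >= FIXED_IN
end

puts includes_revert?("1.1.1") # false
puts includes_revert?("1.1.3") # true
```

Inside a running Fluentd process, the version string could be read from Gem.loaded_specs["fluent-plugin-opensearch"], assuming the plugin was installed as a gem.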

fabio-viana commented 1 year ago

Thank you, the problem is fixed with the new release.