awslabs / aws-sdk-rust

AWS SDK for the Rust Programming Language
https://awslabs.github.io/aws-sdk-rust/
Apache License 2.0

AWS SDK seems to have failed to poison S3 connections after today's outage #827

Closed: benesch closed this issue 1 year ago

benesch commented 1 year ago

Describe the bug

At @MaterializeInc we run a number of clusterd processes that continually read from and write to S3. (We make a streaming database that's in the business of reading data from S3, transforming it, and writing it back to S3.)

During today's AWS outage, all of these clusterd processes started failing. Almost all of them recovered, except for two that continually produced error messages like the following:

clusterd no loader was set :-/
clusterd {"timestamp":"2023-06-14T02:21:42.407205Z","level":"INFO","fields":{"message":"external operation rollup::set failed, retrying in 16s: indeterminate: request has timed out"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-14T02:21:42.407274Z","level":"INFO","fields":{"message":"external operation rollup::set failed, retrying in 16s: indeterminate: request has timed out"},"target":"mz_persist_client::internal::machine"}
clusterd no loader was set :-/
clusterd no loader was set :-/
clusterd {"timestamp":"2023-06-14T02:21:42.407345Z","level":"INFO","fields":{"message":"external operation rollup::set failed, retrying in 16s: indeterminate: request has timed out"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-14T02:21:42.407427Z","level":"INFO","fields":{"message":"external operation rollup::set failed, retrying in 16s: indeterminate: request has timed out"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-14T02:21:42.409267Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-14T02:21:42.409623Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 16s: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-14T02:21:42.413165Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-14T02:21:42.413439Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 16s: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-14T02:21:42.416520Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-14T02:21:42.416801Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 16s: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-14T02:21:42.473353Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-14T02:21:42.473688Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 16s: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd no loader was set :-/
clusterd {"timestamp":"2023-06-14T02:21:42.475366Z","level":"INFO","fields":{"message":"external operation rollup::set failed, retrying in 16s: indeterminate: request has timed out"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-14T02:21:42.475470Z","level":"INFO","fields":{"message":"external operation rollup::set failed, retrying in 16s: indeterminate: request has timed out"},"target":"mz_persist_client::internal::machine"}
clusterd no loader was set :-/
clusterd no loader was set :-/
clusterd {"timestamp":"2023-06-14T02:21:42.475532Z","level":"INFO","fields":{"message":"external operation rollup::set failed, retrying in 16s: indeterminate: request has timed out"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-14T02:21:42.476895Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-14T02:21:42.477315Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 16s: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-14T02:21:42.480570Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-14T02:21:42.480894Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 16s: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}

AIUI, rollup::set is attempting to write to S3, while s3 get meta is attempting to read from S3. The trailing parts of the error messages (request has timed out and failed to construct request) come straight from the AWS SDK, AFAICT.

Expected Behavior

We expected our S3 reads/writes to eventually succeed on retry.

Current Behavior

The S3 reads/writes continued failing forever.

We validated that the two affected containers did in fact have access to S3—logging in to the containers and manually issuing S3 requests was successful at the time shown in the S3 logs.
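For context on the retry behavior: the retry intervals visible in the logs in this issue (32ms, 64ms, and eventually a steady 16s) look like exponential backoff with a cap. Here is a minimal std-only sketch of that pattern. `Backoff` and `retry_with_backoff` are hypothetical names for illustration, not the actual mz_persist_client or AWS SDK code, and the 32ms initial delay and 16s cap are assumptions read off the logs.

```rust
use std::time::Duration;

/// Hypothetical backoff policy: double the delay after each failure, up to a cap.
struct Backoff {
    next: Duration,
    max: Duration,
}

impl Backoff {
    fn new(initial: Duration, max: Duration) -> Self {
        Backoff { next: initial.min(max), max }
    }

    /// Returns the delay to wait before the next attempt, then doubles it (capped at `max`).
    fn next_delay(&mut self) -> Duration {
        let delay = self.next;
        self.next = (self.next * 2).min(self.max);
        delay
    }
}

/// Calls `op` until it succeeds or `max_attempts` attempts have been made
/// (at least one attempt is always made). `sleep` is injected so that the
/// sketch can run without actually waiting.
fn retry_with_backoff<T, E>(
    max_attempts: usize,
    mut backoff: Backoff,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(backoff.next_delay());
                attempt += 1;
            }
        }
    }
}

fn main() {
    let mut delays = Vec::new();
    let mut calls = 0;
    // Simulated operation: fails three times, then succeeds.
    let result: Result<&str, &str> = retry_with_backoff(
        10,
        Backoff::new(Duration::from_millis(32), Duration::from_secs(16)),
        |d| delays.push(d),
        || {
            calls += 1;
            if calls < 4 { Err("request has timed out") } else { Ok("ok") }
        },
    );
    println!("{result:?} after {calls} calls, delays: {delays:?}");
}
```

A loop like this only recovers if a later attempt can actually succeed. If every attempt reuses the same broken state (for example, a connection that is never discarded), it would keep failing at the capped interval, which matches what the two affected processes did.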

Reproduction Steps

So sorry, but we have no reproduction of this. We run chaos tests in our CI that interrupt network connections but we've never seen anything like this. I think it's unlikely we'll see this again until another wide AWS outage.

I wonder if it's something specific about interrupting the IAM connections in the way that the AWS outage did. We don't test that kind of chaos extensively in our CI.

Possible Solution

I wanted to file this issue because of this println that we're seeing in the output:

https://github.com/awslabs/smithy-rs/blob/312d190535b1c77625d662d18313b90af64cb448/rust-runtime/aws-smithy-http/src/connection.rs#L85

This looks like a stray debugging println. It was added in https://github.com/awslabs/smithy-rs/pull/2445. I'm just spitballing, but I'm wondering if this println is related to the issue. Perhaps connections in this process weren't getting poisoned properly because the connection metadata wasn't available?

I've never seen this println in our logs while debugging before. Unfortunately I can't say that we've truly never seen it on the unaffected processes, because as a println rather than a tracing log, it doesn't get picked up by our logging infrastructure. It's possible that this println is actually a normal occurrence when there are S3 connectivity issues, and not indicative of a failure to poison broken connections.
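Since the println bypasses structured logging, one way to look for it is to scan the raw captured output for lines that are not JSON. Below is a rough std-only sketch of that check. `classify` and `raw_lines` are hypothetical helpers, and the `clusterd ` prefix and the "starts with `{`, ends with `}`" test are assumptions based on the excerpts in this issue rather than a real JSON parse.

```rust
/// Whether a captured output line looks like a structured (JSON) log record.
#[derive(Debug, PartialEq)]
enum LineKind {
    Structured,
    Raw,
}

/// Classifies one line of captured process output.
fn classify(line: &str) -> LineKind {
    // Assumption: lines carry a "clusterd " process-name prefix, as in the excerpts.
    let body = line.strip_prefix("clusterd ").unwrap_or(line).trim();
    // Heuristic only: this does not validate that the body is well-formed JSON.
    if body.starts_with('{') && body.ends_with('}') {
        LineKind::Structured
    } else {
        LineKind::Raw
    }
}

/// Returns the non-empty lines that did not come through the structured logger.
fn raw_lines(output: &str) -> Vec<&str> {
    output
        .lines()
        .filter(|l| !l.trim().is_empty())
        .filter(|l| classify(l) == LineKind::Raw)
        .collect()
}

fn main() {
    let output = "clusterd no loader was set :-/\n\
                  clusterd {\"level\":\"INFO\",\"fields\":{\"message\":\"retrying\"}}\n\
                  clusterd no loader was set :-/\n";
    for line in raw_lines(output) {
        println!("unstructured: {line}");
    }
}
```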

If nothing else, seems like the debugging println ought to be removed!
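To make the hypothesis concrete, here is a simplified std-only model of the failure mode I have in mind. This is not the SDK's actual code: `ConnectionMetadata`, `CaptureConnection`, `Loader`, and `poison_on_failure` are hypothetical stand-ins I made up for illustration. The idea is that poisoning needs a handle to the connection that served the request, and if no loader for that handle was ever registered, the poison step silently does nothing.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical handle to a pooled connection.
struct ConnectionMetadata {
    poisoned: Arc<AtomicBool>,
}

impl ConnectionMetadata {
    /// Marks the connection so the pool will not hand it out again.
    fn poison(&self) {
        self.poisoned.store(true, Ordering::SeqCst);
    }
}

type Loader = Box<dyn Fn() -> Option<ConnectionMetadata> + Send + Sync>;

/// Hypothetical slot that the HTTP layer fills in once a connection is established.
#[derive(Default)]
struct CaptureConnection {
    loader: Mutex<Option<Loader>>,
}

impl CaptureConnection {
    fn set_loader(&self, loader: Loader) {
        *self.loader.lock().unwrap() = Some(loader);
    }

    /// Returns the connection metadata, or `None` if no loader was ever set.
    /// The `None` case models the situation the "no loader was set" message
    /// seems to describe (an assumption based on the message text).
    fn get(&self) -> Option<ConnectionMetadata> {
        let guard = self.loader.lock().unwrap();
        guard.as_ref().and_then(|load| load())
    }
}

/// Called after a request fails in a way that suggests a bad connection
/// (e.g. a timeout). Returns whether a connection was actually poisoned.
fn poison_on_failure(capture: &CaptureConnection) -> bool {
    match capture.get() {
        Some(conn) => {
            conn.poison();
            true
        }
        None => false, // nothing to poison: the broken connection stays in the pool
    }
}

fn main() {
    let poisoned = Arc::new(AtomicBool::new(false));

    // Case 1: metadata was captured, so the failure poisons the connection.
    let capture = CaptureConnection::default();
    let flag = Arc::clone(&poisoned);
    capture.set_loader(Box::new(move || {
        Some(ConnectionMetadata { poisoned: Arc::clone(&flag) })
    }));
    assert!(poison_on_failure(&capture));
    assert!(poisoned.load(Ordering::SeqCst));

    // Case 2: no loader was set, so the failure cannot poison anything.
    let empty = CaptureConnection::default();
    assert!(!poison_on_failure(&empty));
}
```

If the real code behaves anything like case 2, a timed-out connection would keep being reused, and retries would keep timing out.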

Additional Information/Context

No response

Version

│       ├── aws-credential-types v0.55.1
│       │   ├── aws-smithy-async v0.55.2
│       │   ├── aws-smithy-types v0.55.2
│       ├── aws-sdk-sts v0.26.0
│       │   ├── aws-credential-types v0.55.1 (*)
│       │   ├── aws-endpoint v0.55.1
│       │   │   ├── aws-smithy-http v0.55.2
│       │   │   │   ├── aws-smithy-eventstream v0.55.2
│       │   │   │   │   ├── aws-smithy-types v0.55.2 (*)
│       │   │   │   ├── aws-smithy-types v0.55.2 (*)
│       │   │   ├── aws-smithy-types v0.55.2 (*)
│       │   │   ├── aws-types v0.55.1
│       │   │   │   ├── aws-credential-types v0.55.1 (*)
│       │   │   │   ├── aws-smithy-async v0.55.2 (*)
│       │   │   │   ├── aws-smithy-client v0.55.2
│       │   │   │   │   ├── aws-smithy-async v0.55.2 (*)
│       │   │   │   │   ├── aws-smithy-http v0.55.2 (*)
│       │   │   │   │   ├── aws-smithy-http-tower v0.55.2
│       │   │   │   │   │   ├── aws-smithy-http v0.55.2 (*)
│       │   │   │   │   │   ├── aws-smithy-types v0.55.2 (*)
│       │   │   │   │   ├── aws-smithy-types v0.55.2 (*)
│       │   │   │   ├── aws-smithy-http v0.55.2 (*)
│       │   │   │   ├── aws-smithy-types v0.55.2 (*)
│       │   ├── aws-http v0.55.1
│       │   │   ├── aws-credential-types v0.55.1 (*)
│       │   │   ├── aws-smithy-http v0.55.2 (*)
│       │   │   ├── aws-smithy-types v0.55.2 (*)
│       │   │   ├── aws-types v0.55.1 (*)
│       │   ├── aws-sig-auth v0.55.1
│       │   │   ├── aws-credential-types v0.55.1 (*)
│       │   │   ├── aws-sigv4 v0.55.1
│       │   │   │   ├── aws-smithy-eventstream v0.55.2 (*)
│       │   │   │   ├── aws-smithy-http v0.55.2 (*)
│       │   │   ├── aws-smithy-eventstream v0.55.2 (*)
│       │   │   ├── aws-smithy-http v0.55.2 (*)
│       │   │   ├── aws-types v0.55.1 (*)
│       │   ├── aws-smithy-async v0.55.2 (*)
│       │   ├── aws-smithy-client v0.55.2 (*)
│       │   ├── aws-smithy-http v0.55.2 (*)
│       │   ├── aws-smithy-http-tower v0.55.2 (*)
│       │   ├── aws-smithy-json v0.55.2
│       │   │   └── aws-smithy-types v0.55.2 (*)
│       │   ├── aws-smithy-query v0.55.2
│       │   │   ├── aws-smithy-types v0.55.2 (*)
│       │   ├── aws-smithy-types v0.55.2 (*)
│       │   ├── aws-smithy-xml v0.55.2
│       │   ├── aws-types v0.55.1 (*)
│       ├── aws-sig-auth v0.55.1 (*)
│       ├── aws-sigv4 v0.55.1 (*)
│       ├── aws-smithy-http v0.55.2 (*)
│       ├── aws-credential-types v0.55.1 (*)
│       ├── aws-sdk-sts v0.26.0 (*)
│       ├── aws-sig-auth v0.55.1 (*)
│       ├── aws-sigv4 v0.55.1 (*)
│       ├── aws-smithy-http v0.55.2 (*)
│   │   │   ├── aws-config v0.55.1
│   │   │   │   ├── aws-credential-types v0.55.1 (*)
│   │   │   │   ├── aws-http v0.55.1 (*)
│   │   │   │   ├── aws-sdk-sso v0.26.0
│   │   │   │   │   ├── aws-credential-types v0.55.1 (*)
│   │   │   │   │   ├── aws-endpoint v0.55.1 (*)
│   │   │   │   │   ├── aws-http v0.55.1 (*)
│   │   │   │   │   ├── aws-sig-auth v0.55.1 (*)
│   │   │   │   │   ├── aws-smithy-async v0.55.2 (*)
│   │   │   │   │   ├── aws-smithy-client v0.55.2 (*)
│   │   │   │   │   ├── aws-smithy-http v0.55.2 (*)
│   │   │   │   │   ├── aws-smithy-http-tower v0.55.2 (*)
│   │   │   │   │   ├── aws-smithy-json v0.55.2 (*)
│   │   │   │   │   ├── aws-smithy-types v0.55.2 (*)
│   │   │   │   │   ├── aws-types v0.55.1 (*)
│   │   │   │   ├── aws-sdk-sts v0.26.0 (*)
│   │   │   │   ├── aws-smithy-async v0.55.2 (*)
│   │   │   │   ├── aws-smithy-client v0.55.2 (*)
│   │   │   │   ├── aws-smithy-http v0.55.2 (*)
│   │   │   │   ├── aws-smithy-http-tower v0.55.2 (*)
│   │   │   │   ├── aws-smithy-json v0.55.2 (*)
│   │   │   │   ├── aws-smithy-types v0.55.2 (*)
│   │   │   │   ├── aws-types v0.55.1 (*)
│   │   │   ├── aws-credential-types v0.55.1 (*)
│   │   │   ├── aws-sdk-s3 v0.26.0
│   │   │   │   ├── aws-credential-types v0.55.1 (*)
│   │   │   │   ├── aws-endpoint v0.55.1 (*)
│   │   │   │   ├── aws-http v0.55.1 (*)
│   │   │   │   ├── aws-sig-auth v0.55.1 (*)
│   │   │   │   ├── aws-sigv4 v0.55.1 (*)
│   │   │   │   ├── aws-smithy-async v0.55.2 (*)
│   │   │   │   ├── aws-smithy-checksums v0.55.2
│   │   │   │   │   ├── aws-smithy-http v0.55.2 (*)
│   │   │   │   │   ├── aws-smithy-types v0.55.2 (*)
│   │   │   │   ├── aws-smithy-client v0.55.2 (*)
│   │   │   │   ├── aws-smithy-eventstream v0.55.2 (*)
│   │   │   │   ├── aws-smithy-http v0.55.2 (*)
│   │   │   │   ├── aws-smithy-http-tower v0.55.2 (*)
│   │   │   │   ├── aws-smithy-json v0.55.2 (*)
│   │   │   │   ├── aws-smithy-types v0.55.2 (*)
│   │   │   │   ├── aws-smithy-xml v0.55.2 (*)
│   │   │   │   ├── aws-types v0.55.1 (*)
│   │   │   ├── aws-types v0.55.1 (*)
│   │   │   ├── mz-aws-s3-util v0.0.0 (/Users/benesch/Sites/materialize/materialize/src/aws-s3-util)
│   │   │   │   ├── aws-sdk-s3 v0.26.0 (*)
│   │   │   │   ├── aws-types v0.55.1 (*)
│   │   ├── aws-config v0.55.1 (*)
│   │   ├── aws-credential-types v0.55.1 (*)
│   │   ├── aws-types v0.55.1 (*)
│   ├── aws-sdk-sts v0.26.0 (*)
mz-aws-s3-util v0.0.0 (/Users/benesch/Sites/materialize/materialize/src/aws-s3-util) (*)
│   ├── mz-aws-s3-util v0.0.0 (/Users/benesch/Sites/materialize/materialize/src/aws-s3-util) (*)
├── aws-config v0.55.1 (*)
├── aws-sdk-s3 v0.26.0 (*)
├── mz-aws-s3-util v0.0.0 (/Users/benesch/Sites/materialize/materialize/src/aws-s3-util) (*)
├── aws-config v0.55.1 (*)
├── aws-credential-types v0.55.1 (*)
├── aws-sdk-sts v0.26.0 (*)
├── aws-types v0.55.1 (*)
├── mz-aws-s3-util v0.0.0 (/Users/benesch/Sites/materialize/materialize/src/aws-s3-util) (*)

Environment details (OS name and version, etc.)

Linux 5.10.178-162.673.amzn2.x86_64

Logs

No response

benesch commented 1 year ago

Eyeballing the output of some unaffected clusterd processes by hand, though, I'm really not seeing that debugging println on any of those unaffected processes. We see STS errors around 4pm ET (as expected) that continue for a bit and then clear right up:

clusterd {"timestamp":"2023-06-13T20:09:02.084937Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:02.774907Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:02.802298Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:02 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:02.802611Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:02 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:02.815222Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:03.383098Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.116729Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.117057Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.138912Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.228762Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:04 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.229078Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:04 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.229290Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 32ms: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-13T20:09:04.266399Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.290968Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.291272Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.291716Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 64ms: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-13T20:09:04.364927Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.394375Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.394685Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.394893Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 128ms: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-13T20:09:04.533256Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.560945Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.561244Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.561451Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 256ms: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-13T20:09:04.813752Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.840466Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.840796Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n  <Error>\\n    <Type>Sender</Type>\\n    <Code>InvalidIdentityToken</Code>\\n    <Message>Couldn't retrieve verification key from your identity provider,  please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n  </Error>\\n  <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.841011Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 512ms: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-13T20:09:05.324239Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:05.363141Z","level":"INFO","fields":{"message":"credentials cache miss occurred; added new AWS credentials (took 43.184745ms)"},"target":"aws_credential_types::cache::lazy_caching","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T21:09:29.655483Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T21:09:29.691265Z","level":"INFO","fields":{"message":"credentials cache miss occurred; added new AWS credentials (took 39.945721ms)"},"target":"aws_credential_types::cache::lazy_caching","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T22:09:39.545786Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T22:09:39.714404Z","level":"INFO","fields":{"message":"credentials cache miss occurred; added new AWS credentials (took 172.900715ms)"},"target":"aws_credential_types::cache::lazy_caching","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T23:07:49.226265Z","level":"INFO","fields":{"message":"poisoning connection: SmithyConnection { is_proxied: false, remote_addr: Some(54.231.134.18:443) }"},"target":"aws_smithy_client::poison"}
clusterd {"timestamp":"2023-06-13T23:07:49.226334Z","level":"INFO","fields":{"message":"smithy connection was poisoned"},"target":"aws_smithy_http::connection"}
ysaito1001 commented 1 year ago

Hi @benesch, thank you for reporting this issue and providing an analysis in the description. We'll add this to our backlog. @rcoh may have some insights into this. It's conceivable that the outage exposed a code execution path that was not considered when PR2445 was created. In the meantime, if you happen to discover simpler reproduction steps, kindly share them with us.

rcoh commented 1 year ago

Are you running with non-stock connectors of some kind? That could cause connection metadata to fail to work 🤔

benesch commented 1 year ago

Are you running with non-stock connectors of some kind? That could cause connection metadata to fail to work 🤔

Not as far as I know! This is our configuration: https://github.com/MaterializeInc/materialize/blob/bfbebe04a7d181f1bac91f01785309a52cf2de34/src/persist/src/s3.rs#L111-L180

In the meantime, if you happen to discover a simpler reproduction step, kindly share it with us.

Will do, but I'm afraid it'll be pretty unlikely that we do. To be honest, if the only outcome of this issue is the removal of the debugging println!, that's pretty much what I expect! I just figured I'd file in case it sparked insight for someone—e.g. about some multi-threaded credential connection cache or something like that.

rcoh commented 1 year ago

Coming back here as I prepare to close this ticket: it seems that in most of your pods, connection poisoning worked as intended and threw out the bad connections, but in one pod a race condition of some kind may have kept the poisoning from working. Since "no loader was set" was printed, we never actually got as far as trying to send the request with Hyper. Maybe the timeout hit while waiting for a retry, or while poll_ready on Hyper was pending?

In any case, the println has been replaced with a debug-level log.
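
For readers following along, here is a rough sketch of the race hypothesized above. It is a simplified, std-only model and not the SDK's actual code: `Connection`, `CaptureSlot`, `set_loader`, and `poison_after_timeout` are hypothetical names chosen for illustration. The idea is that the HTTP layer records a "loader" for the connection only once the request has been handed to a connection, so a timeout that fires before that point leaves nothing to poison.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a pooled connection that can be marked unusable.
#[derive(Clone, Default)]
struct Connection {
    poisoned: Arc<AtomicBool>,
}

impl Connection {
    fn poison(&self) {
        self.poisoned.store(true, Ordering::SeqCst);
    }

    fn is_poisoned(&self) -> bool {
        self.poisoned.load(Ordering::SeqCst)
    }
}

/// Closure that hands back the connection serving a request.
type Loader = Box<dyn Fn() -> Connection + Send + Sync>;

/// Hypothetical per-request slot shared by the HTTP layer and the retry layer.
#[derive(Clone, Default)]
struct CaptureSlot {
    loader: Arc<Mutex<Option<Loader>>>,
}

impl CaptureSlot {
    /// Assumed to be called only once the request reaches a connection.
    fn set_loader(&self, conn: Connection) {
        *self.loader.lock().unwrap() = Some(Box::new(move || conn.clone()));
    }

    /// Returns the connection for this request, if one was ever recorded.
    fn get(&self) -> Option<Connection> {
        self.loader.lock().unwrap().as_ref().map(|load| load())
    }
}

/// Poisons the request's connection after a timeout; reports whether it could.
fn poison_after_timeout(slot: &CaptureSlot) -> bool {
    match slot.get() {
        Some(conn) => {
            conn.poison();
            true
        }
        // Models the "no loader was set" case: nothing is known to poison.
        None => false,
    }
}

fn main() {
    // Normal path: the request reached a connection before timing out.
    let conn = Connection::default();
    let slot = CaptureSlot::default();
    slot.set_loader(conn.clone());
    assert!(poison_after_timeout(&slot));
    assert!(conn.is_poisoned());

    // Hypothesized race: the timeout fires before any loader is recorded,
    // so a bad connection sitting in the pool is left untouched.
    let stuck = Connection::default();
    let slot = CaptureSlot::default();
    assert!(!poison_after_timeout(&slot));
    assert!(!stuck.is_poisoned());
}
```

In this model the second case fails silently, which would be consistent with a process that keeps timing out without ever logging a "poisoning connection" message.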

benesch commented 1 year ago

Since no loader was set, it means we never actually made it to even trying to send the request with Hyper—maybe the timeout hit during waiting for a retry or poll_ready on Hyper was pending?

Yeah, could be. In any case, thanks for removing that println!!