Closed by benesch 1 year ago
Eyeballing the output of some unaffected clusterd processes by hand, though, I'm really not seeing that debugging println on any of them. We see STS errors starting around 4pm ET (as expected) that continue for a few seconds with retries backing off, and then clear right up once credentials load successfully:
clusterd {"timestamp":"2023-06-13T20:09:02.084937Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:02.774907Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:02.802298Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:02 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:02.802611Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:02 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:02.815222Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:03.383098Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.116729Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.117057Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.138912Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.228762Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:04 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.229078Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:04 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.229290Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 32ms: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-13T20:09:04.266399Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.290968Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.291272Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.291716Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 64ms: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-13T20:09:04.364927Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.394375Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.394685Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.394893Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 128ms: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-13T20:09:04.533256Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.560945Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.561244Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.561451Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 256ms: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-13T20:09:04.813752Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.840466Z","level":"WARN","fields":{"message":"STS returned an error assuming web identity role","error":"service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }))"},"target":"aws_config::web_identity_token","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.840796Z","level":"WARN","fields":{"message":"provider failed to provide credentials","provider":"WebIdentityToken","error":"an error occurred while loading credentials: service error: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements: InvalidIdentityTokenException: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements (ProviderError(ProviderError { source: ServiceError(ServiceError { source: InvalidIdentityTokenException(InvalidIdentityTokenException { message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), meta: ErrorMetadata { code: Some(\"InvalidIdentityToken\"), message: Some(\"Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\"), extras: Some({\"aws_request_id\": \" 00000000-0000-0000-0000-000000000000\"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {\"x-amzn-requestid\": \" 00000000-0000-0000-0000-000000000000\", \"content-type\": \"text/xml\", \"content-length\": \"390\", \"date\": \"Tue, 13 Jun 2023 20:09:03 GMT\", \"connection\": \"close\"}, body: SdkBody { inner: Once(Some(b\"<ErrorResponse xmlns=\\\"https://sts.amazonaws.com/doc/2011-06-15/\\\">\\n <Error>\\n <Type>Sender</Type>\\n <Code>InvalidIdentityToken</Code>\\n <Message>Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements</Message>\\n </Error>\\n <RequestId> 00000000-0000-0000-0000-000000000000</RequestId>\\n</ErrorResponse>\\n\")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. 
}) } }) }))"},"target":"aws_config::meta::credentials::chain","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:04.841011Z","level":"INFO","fields":{"message":"external operation fetch_batch::get failed, retrying in 512ms: indeterminate: s3 get meta err: failed to construct request"},"target":"mz_persist_client::internal::machine"}
clusterd {"timestamp":"2023-06-13T20:09:05.324239Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T20:09:05.363141Z","level":"INFO","fields":{"message":"credentials cache miss occurred; added new AWS credentials (took 43.184745ms)"},"target":"aws_credential_types::cache::lazy_caching","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T21:09:29.655483Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T21:09:29.691265Z","level":"INFO","fields":{"message":"credentials cache miss occurred; added new AWS credentials (took 39.945721ms)"},"target":"aws_credential_types::cache::lazy_caching","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T22:09:39.545786Z","level":"INFO","fields":{"message":"credentials cache returned CredentialsNotLoaded, ignoring"},"target":"aws_http::auth","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T22:09:39.714404Z","level":"INFO","fields":{"message":"credentials cache miss occurred; added new AWS credentials (took 172.900715ms)"},"target":"aws_credential_types::cache::lazy_caching","span":{"name":"lazy_load_credentials"},"spans":[{"name":"lazy_load_credentials"}]}
clusterd {"timestamp":"2023-06-13T23:07:49.226265Z","level":"INFO","fields":{"message":"poisoning connection: SmithyConnection { is_proxied: false, remote_addr: Some(54.231.134.18:443) }"},"target":"aws_smithy_client::poison"}
clusterd {"timestamp":"2023-06-13T23:07:49.226334Z","level":"INFO","fields":{"message":"smithy connection was poisoned"},"target":"aws_smithy_http::connection"}
Hi @benesch, thank you for reporting this issue and providing an analysis in the description. We'll add this to our backlog. @rcoh may have some insights into this. It's conceivable that the outage exposed a code execution path that was not considered when PR2445 was created. In the meantime, if you happen to discover a simpler reproduction step, kindly share it with us.
Are you running with non-stock connectors of some kind? That could cause connection metadata to fail to work 🤔
Not as far as I know! This is our configuration: https://github.com/MaterializeInc/materialize/blob/bfbebe04a7d181f1bac91f01785309a52cf2de34/src/persist/src/s3.rs#L111-L180
In the meantime, if you happen to discover a simpler reproduction step, kindly share it with us.
Will do, but I'm afraid it'll be pretty unlikely that we do. To be honest, if the only outcome of this issue is the removal of the debugging println!, that's pretty much what I expect! I just figured I'd file in case it sparked insight for someone, e.g. about some multi-threaded credential connection cache or something like that.
Coming back here as I prepare to close this ticket: it seems like in most of your pods, connection poisoning worked as intended and threw out the bad connections, but in one pod maybe a race condition of some kind caused the poisoning to fail. Since no loader was set, it means we never actually made it to even trying to send the request with Hyper. Maybe the timeout hit while waiting for a retry, or poll_ready on Hyper was pending?
In any case, the println has been replaced with a debug
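To make the idea of "poisoning" concrete, here is a minimal sketch of a pool that drops connections marked as bad instead of reusing them. `Conn` and `Pool` are hypothetical stand-ins invented for illustration. They are not smithy-rs or Hyper types, and the real implementation may work quite differently.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a pooled connection (not a smithy-rs or Hyper type).
#[derive(Debug)]
pub struct Conn {
    pub id: u32,
    pub poisoned: bool,
}

/// Hypothetical pool of idle connections available for reuse.
#[derive(Debug, Default)]
pub struct Pool {
    idle: VecDeque<Conn>,
}

impl Pool {
    /// Hands a connection back to the pool. A poisoned connection is dropped
    /// here rather than queued, so it can never be checked out again.
    pub fn release(&mut self, conn: Conn) {
        if !conn.poisoned {
            self.idle.push_back(conn);
        }
    }

    /// Takes the oldest idle connection, if any.
    pub fn checkout(&mut self) -> Option<Conn> {
        self.idle.pop_front()
    }
}
```

In this sketch a connection can only be discarded if the caller still holds it and can set `poisoned`. That is loosely the concern raised in the issue: if the connection metadata is unavailable, nothing can mark the connection as bad.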
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Since no loader was set, it means we never actually made it to even trying to send the request with Hyper. Maybe the timeout hit while waiting for a retry, or poll_ready on Hyper was pending?
Yeah, could be. In any case, thanks for removing that println!!
Describe the bug
At @MaterializeInc we run a number of clusterd processes that continually read and write from S3. (We make a streaming database that's in the business of reading data from S3, transforming it, and writing it back to S3.)
During today's AWS outage, these clusterd processes all experienced an outage. Almost all of the clusterd processes recovered, except for two that continually produced error messages like the following:
AIUI, rollup::set is attempting to write to S3, while s3 get meta is attempting to read from S3. The following error message (request has timed out) or (failed to construct request) comes straight from the AWS SDK, AFAICT.
Expected Behavior
We expected our S3 reads/writes to eventually succeed on retry.
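For context, the retry behavior we rely on has roughly the following shape. This is a simplified sketch, not our actual persist code: `retry_with_backoff` and its parameters are hypothetical names, and the doubling delay is only assumed from log lines like "retrying in 512ms".

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: calls `op` until it succeeds or `max_attempts` calls
/// have been made, doubling the delay after each failure up to `max_delay`.
pub fn retry_with_backoff<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                // Exponential backoff, capped so the wait never grows unbounded.
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
        }
    }
}
```

With a loop like this, a transient failure (say, a credentials provider that is briefly unavailable) should clear up once the underlying operation starts succeeding again.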
Current Behavior
The S3 reads/writes continued failing forever.
We validated that the two affected containers did in fact have access to S3—logging in to the containers and manually issuing S3 requests was successful at the time shown in the S3 logs.
Reproduction Steps
So sorry, but we have no reproduction of this. We run chaos tests in our CI that interrupt network connections but we've never seen anything like this. I think it's unlikely we'll see this again until another wide AWS outage.
I wonder if it's something specific about interrupting the IAM connections in the way that the AWS outage did. We don't test that chaos extensively in our CI.
Possible Solution
The reason I wanted to file this issue is because of this println that we're seeing in the output:
https://github.com/awslabs/smithy-rs/blob/312d190535b1c77625d662d18313b90af64cb448/rust-runtime/aws-smithy-http/src/connection.rs#L85
This looks like a stray debugging println. It was added in https://github.com/awslabs/smithy-rs/pull/2445. I'm just spitballing, but I'm wondering if this println is related to the issue. Perhaps connections in this process weren't getting poisoned properly because the connection metadata wasn't available?
I've never seen this println in our logs while debugging before. Unfortunately I can't say that we've truly never seen this log on the unaffected processes because as a println rather than a tracing log it doesn't get picked up by our logging infrastructure. It's possible that this println is actually a normal occurrence when there are S3 connectivity issues, and not indicative of a failure to poison broken connections.
If nothing else, seems like the debugging println ought to be removed!
Additional Information/Context
No response
Version
Environment details (OS name and version, etc.)
Linux 5.10.178-162.673.amzn2.x86_64
Logs
No response