awslabs / aws-sdk-rust

AWS SDK for the Rust Programming Language
https://awslabs.github.io/aws-sdk-rust/
Apache License 2.0

Retry after HTTP2 GOAWAY errors #738

Closed ozgrakkurt closed 11 months ago

ozgrakkurt commented 1 year ago

Describe the bug

[2023-02-13T11:10:36Z WARN aws_smithy_client::hyper_ext] unrecognized error from Hyper. If this error should be retried, please file an issue. err=http2 error: connection error received: not a result of an error: connection error received: not a result of an error (hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }))

Expected Behavior

Should retry

Current Behavior

Error out

Reproduction Steps

none

Possible Solution

No response

Additional Information/Context

No response

Version

aws-smithy-client = "0.54.2"

Environment details (OS name and version, etc.)

docker ubuntu:latest

Logs

[2023-02-13T11:10:36Z WARN aws_smithy_client::hyper_ext] unrecognized error from Hyper. If this error should be retried, please file an issue. err=http2 error: connection error received: not a result of an error: connection error received: not a result of an error (hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }))

ozgrakkurt commented 1 year ago

https://github.com/golang/go/issues/18639

https://github.com/hyperium/h2/issues/83

Velfi commented 1 year ago

@ozgrakkurt Thanks for submitting this. Are you actually encountering this error in production right now?

ozgrakkurt commented 1 year ago

Hey, yes. We get this regularly

vac-adb commented 1 year ago

I can reproduce it easily with the following code. Note that 60s sleep duration between calls is important (there was no issue with 10s sleep).

Versions: aws-config = "0.55.1", aws-sdk-cognitoidentityprovider = "0.26.0"

use aws_sdk_cognitoidentityprovider as cognitoidentityprovider;
use tokio;
use std::thread;
use std::time::Duration;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let config = aws_config::load_from_env().await;
    let client = cognitoidentityprovider::Client::new(&config);
    let username = "fake_user";
    let password = "fake_password";
    let client_id = "fake_client";
    for i in 1..4 {
        let cognito_result = client
            .initiate_auth()
            .set_client_id(Some(client_id.to_string()))
            .auth_flow(cognitoidentityprovider::types::AuthFlowType::UserPasswordAuth)
            .auth_parameters("USERNAME", username)
            .auth_parameters("PASSWORD", password)
            .send()
            .await;
        println!("Result {:}: {:?}", i, cognito_result);
        // 60s between calls is needed to trigger the error (10s did not).
        thread::sleep(Duration::from_secs(60));
    }
}

Results:

  1. ✅ Err(ServiceError(ServiceError { ... { message: Some("User pool client fake_client does not exist.")
  2. ❌ Err(DispatchFailure(DispatchFailure { source: ConnectorError { kind: Other(None), source: hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }), connection: Unknown } }))
  3. ✅ Err(ServiceError(ServiceError { ... { message: Some("User pool client fake_client does not exist.")

rcoh commented 1 year ago

ah perfect! thanks for the reproducer. it's a quick fix but I wasn't able to reproduce it in tests so I wasn't able to figure out if my fix actually worked. We'll get this prioritized and fixed—or we're happy to accept a PR, the relevant code is here: https://github.com/awslabs/smithy-rs/blob/main/rust-runtime/aws-smithy-client/src/hyper_ext.rs#LL207C1-L224C2

If we correctly classify that error as kind: Io, it should be fixed.
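For readers following along, here is a minimal, self-contained sketch of the kind of classification described above. It is not the actual smithy-rs code: `GoAway`, `Wrapper`, `Kind`, `find_source`, and `classify` are hypothetical stand-ins, and the assumption that only a GOAWAY carrying NO_ERROR maps to `Io` is mine. The idea is to walk the error's `source()` chain, look for the GOAWAY error, and report it as `Io` instead of `Other`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for the HTTP/2 GOAWAY error surfaced by the connector.
#[derive(Debug)]
struct GoAway {
    /// True when the peer sent the NO_ERROR code (a graceful shutdown).
    no_error: bool,
}

impl fmt::Display for GoAway {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection error received (GOAWAY)")
    }
}

impl Error for GoAway {}

/// Hypothetical outer error that wraps another error, like `hyper::Error` does.
#[derive(Debug)]
struct Wrapper(Box<dyn Error + 'static>);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "http2 error")
    }
}

impl Error for Wrapper {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.0)
    }
}

/// Hypothetical error kinds; in this sketch `Io` is the retryable one.
#[derive(Debug, PartialEq)]
enum Kind {
    Io,
    Other,
}

/// Walk the `source()` chain and return the first error of type `E`, if any.
fn find_source<'a, E: Error + 'static>(err: &'a (dyn Error + 'static)) -> Option<&'a E> {
    let mut next = Some(err);
    while let Some(e) = next {
        if let Some(found) = e.downcast_ref::<E>() {
            return Some(found);
        }
        next = e.source();
    }
    None
}

/// Assumption: a GOAWAY with NO_ERROR is classified as `Io`; everything else stays `Other`.
fn classify(err: &(dyn Error + 'static)) -> Kind {
    match find_source::<GoAway>(err) {
        Some(goaway) if goaway.no_error => Kind::Io,
        _ => Kind::Other,
    }
}

fn main() {
    let err = Wrapper(Box::new(GoAway { no_error: true }));
    println!("classified as {:?}", classify(&err));
}
```

Searching the whole chain rather than only the top-level error matters here because the GOAWAY is nested inside the HTTP/2 error, as the log output in this issue shows.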

hxk1633 commented 1 year ago

I get a similar error when trying to invoke dynamodb in Rust AWS SDK: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Other(None), source: NoMatchingAuthScheme, connection: Unknown } }) Not sure what this error means...

rcoh commented 1 year ago

This is a different error—no matching auth scheme. Can you open a separate issue and include some logs?


ferologics commented 1 year ago

we're seeing this issue daily in prod for aws-sdk-apigatewaymanagement::apigatewaymanagement_client.post_to_connection call (sending a websocket connection message):

Warning
14:33:15
API Gateway Management message send error with error: Other(ErrorString("connection_id: ConnectionId(\"OFT8NePuvHcCGqQ=\"), error: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Other(None), source: hyper::Error(Http2, Error { kind: GoAway(b\"\", NO_ERROR, Remote) }), connection: Unknown } }), request_id: 'OFVWzGGVPHcF98w='"))

Warning
14:33:15
aws_smithy_runtime::client::http::hyper_014
unrecognized error from Hyper. If this error should be retried, please file an issue.
{
err: http2 error: connection error received: not a result of an error: connection error received: not a result of an error (hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }))
}

rcoh commented 11 months ago

It looks like the fix for this was merged but then accidentally removed as part of a merge. We'll get a fix out soon

jdisanti commented 11 months ago

The fix has been merged and will go out in a future release: https://github.com/smithy-lang/smithy-rs/pull/3250

ferologics commented 11 months ago

@jdisanti thx for the update. curiously, we stopped receiving this error, and are not sure what changed. one theory is that it was an AWS service bug that got fixed. we also wonder whether this error should be retried, after reading this particular post: https://stackoverflow.com/a/42682185 (we use ELB)
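On the question of whether a retry is safe, a small sketch may help. It assumes the HTTP/2 rule that a GOAWAY frame carries a last stream ID and that streams with a higher ID were not processed by the peer. `GoAwayFrame` and `was_not_processed` are hypothetical names for illustration; the errors logged in this issue do not expose the last stream ID, so this shows the protocol rule only, not anything the SDK does.

```rust
/// HTTP/2 error code for a graceful shutdown.
const NO_ERROR: u32 = 0x0;

/// Hypothetical view of a received GOAWAY frame.
struct GoAwayFrame {
    /// Highest stream ID the peer may have processed.
    last_stream_id: u32,
    /// Error code sent with the frame.
    error_code: u32,
}

/// A request on a stream above `last_stream_id` was not processed by the peer,
/// so sending it again on a new connection does not risk a duplicate.
fn was_not_processed(frame: &GoAwayFrame, stream_id: u32) -> bool {
    stream_id > frame.last_stream_id
}

fn main() {
    let frame = GoAwayFrame { last_stream_id: 5, error_code: NO_ERROR };
    for stream_id in [3, 5, 7] {
        println!(
            "stream {}: graceful = {}, not processed = {}",
            stream_id,
            frame.error_code == NO_ERROR,
            was_not_processed(&frame, stream_id)
        );
    }
}
```

Requests at or below the last stream ID may have been processed, so whether to retry those depends on whether the operation is idempotent.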

rcoh commented 11 months ago

Oh — this was fixed in the runtime libraries so if you ran cargo update you'd get the new library even without a new sdk version.

On Mon, Dec 4, 2023, 1:55 PM Fero @.***> wrote:

@jdisanti https://github.com/jdisanti thx for the update. curiously, we stopped receiving this error, and are not sure what changed. one theory is that it was an AWS service bug that got fixed. we also wonder whether this error should be retried, after reading this particular post https://stackoverflow.com/a/42682185 (we use ELB)


ferologics commented 11 months ago

@rcoh we did run cargo update on two occasions, Nov 8th (bump from 0.56.1 to 0.57.1), and Nov 30th (bump from 0.57.1 to 1.0.1), so this is possible.

jdisanti commented 11 months ago

The fix for this went out in the December 5, 2023 release.
