Open lilith opened 1 year ago
Here's a repro project: https://github.com/lilith/aws-pauses-exponentially
All this does is (a) AddAWSService
Hi @lilith,
Good afternoon.
Thanks for reporting the issue. Upon further analysis of SDK code, EC2InstanceMetadata.PauseExponentially() is the exponential retry option in cases where credentials could not be fetched from EC2 instance metadata service. I do see in your code that you are using Credentials = new AnonymousAWSCredentials()
while configuring default AWS options. When you run your code locally, credentials are resolved per resolution order mentioned in Credential and profile resolution. In case SDK is unable to find credentials in the order, it reaches the last generator, which is EC2 instance metadata. Since your local machine might not be an EC2 instance, it ultimately fails, while trying exponential retry backoff. For credential resolution via EC2 instance metadata, the exponential backoff is determined by the following constants defined in the code:
DEFAULT_RETRIES = 3,
MIN_PAUSE_MS = 250,
MAX_RETRIES = 3,
DEFAULT_APITOKEN_TTL = 21600;
So in total the credential resolution in your case might be taking 30 seconds.
Could you please advise how credentials are configured for your application?
Thanks, Ashish
I don't want to use credentials. I want to make anonymous requests to public buckets and blobs. How, other than using AnonymousAWSCredentials, can I accomplish this? I also do not want to have 30 seconds of downtime each deploy.
Also, this is library code, so I cannot require my end users to specify environment variables or files.
EC2InstanceMetadata.get_Region()
@lilith My apologies, I think I misread the call stack. Looks like it is trying to resolve the region from EC2InstanceMetadata
based on the stack:
Thread.Sleep()
EC2InstanceMetadata.PauseExponentially()
EC2InstanceMetadata.FetchApiToken()
EC2InstanceMetadata.GetItems()
EC2InstanceMetadata.GetItems()
EC2InstanceMetadata.GetData()
EC2InstanceMetadata.GetData()
EC2InstanceMetadata.get_IdentityDocument()
EC2InstanceMetadata.get_Region()
new InstanceProfileAWSRegion()
FallbackRegionFactory.<>c.<Reset>b__12_3()
FallbackRegionFactory.GetRegionEndpoint()
FallbackRegionFactory.GetRegionEndpoint()
ClientConfig.GetDefaultRegionEndpoint()
ClientConfig.get_RegionEndpoint()
ClientConfig.get_DefaultConfiguration()
ClientConfig.get_DefaultConfigurationMode()
ClientFactory.CreateConfig()
ClientFactory.CreateServiceClient()
I guess it is using the anonymous credentials per your scenario to access public buckets and is working fine.
Based on the call stack, EC2 instance metadata class:
/instance-identity/document
exposed via metadata endpoint.DEFAULT_RETRIES
value 3
to retry exponentially.On you local development, this process would fail since it would retry 3 times, pausing exponentially in between.
Please advise how is region configured in your application? In order to avoid region resolution, the possible workaround would be to configure region for S3 client based on region resolution order.
Thanks, Ashish
I created a repository and whittled a reproducible project down for you as mentioned earlier:
https://github.com/lilith/aws-pauses-exponentially/blob/main/Startup.cs#L50-L64
services.AddAWSService<IAmazonS3>(new AWSOptions
{
Credentials = new AnonymousAWSCredentials(),
Region = RegionEndpoint.USEast1
});
Region is being set in AWSOptions.
As an aside, I never personally use EC2 and would love to turn off all that nonsense for my personal apps. For the purposes of the Nuget libraries, I need a way to provide users with S3 integration without killing their uptime.
I created a repository and whittled a reproducible project down for you as mentioned earlier:
https://github.com/lilith/aws-pauses-exponentially/blob/main/Startup.cs#L50-L64
services.AddAWSService<IAmazonS3>(new AWSOptions { Credentials = new AnonymousAWSCredentials(), Region = RegionEndpoint.USEast1 });
Region is being set in AWSOptions.
As an aside, I never personally use EC2 and would love to turn off all that nonsense for my personal apps. For the purposes of the Nuget libraries, I need a way to provide users with S3 integration without killing their uptime.
@lilith Thanks for your response. I will investigate to determine what is happening and provide you updates accordingly.
Also, I think it's really not a good choice to have network calls occur synchronously during application startup, for (a) speed, (b) reliability, and (c) since this behavior is completely hidden from users. That kind of dangerous behavior should require explicit opt-in. Fail fast by default, please.
You can set the AWS_EC2_METADATA_DISABLED
environment variable to false
to disable the EC2 instance metadata lookup.
Yes, but I'm shipping libraries not applications.
And when AWSSDK.dll causes issues in my libs, they get filed on my repos, not yours.
Ideally, I'd like there to be a way - via C# - to make the sdk well-behaved.
Interesting adding an S3 client with Anonymous credentials to the DI as a library. Hope your use cases won't interfere with other cases an application might be relying on an IAmazonS3
in the DI with credentials.
Thanks for the repo code. I stepped through the code and it looks like the SDK is getting tripped up determining the DefaultConfigurationMode
which goes to EC2 instance metadata. We should figure out how to prevent that from happening in use cases like yours.
For a work around if you explicitly set the DefaultConfigurationMode
like below that fixed the slowness for me.
services.AddAWSService<IAmazonS3>(new AWSOptions
{
Credentials = new AnonymousAWSCredentials(),
Region = RegionEndpoint.USEast1,
DefaultConfigurationMode = DefaultConfigurationMode.Standard
});
No, in this instance the library consumer is responsible for calling .AddAWSService(), but there are cases where an anonymous client is needed in a library (direct, not DI) and those cases had a similar issue.
Thank you for the workaround, I'll push this change to the many example projects and docs that use .AddAWSService()
I noticed the docs mention that behavior of DefaultConfigurationMode.Standard may change over time. Is there an option that can explicitly opt-out of high-latency behavior permanently?
Not sure what you mean by high-latency behavior in this case. Are you talking about the bug that is triggering the slowness creating service clients that you reported? Under the covers DefaultConfigurationMode
is always resolved, but in your case there is a bug that it should not trigger an EC2 metadata lookup. So the only way to avoid that is skip the auto resolution and explicit set it.
I should have been more clear and mentioned the actual default used by the SDK is Legacy
if no environment factors, like environment variables, cause us to detect a different mode. Legacy
won't change as long we are on this major version of the SDK. But the settings for Standard
are more modern especially for retry behavior.
For example, if I wish to ensure that configuration will not be acquired over the network - that it will come from env vars or other locally, instantly accessible data stores, what would I set? What if I only want to all configuration to be loaded via code?
The actual configuration is not acquired over the network, the SDK looks at the environment to figure out which of the modes to use. For S3 you can see here where the configurations are defined. https://github.com/aws/aws-sdk-net/blob/82ec756de374a6a2d0dce55df00b537a6929a577/sdk/src/Services/S3/Generated/AmazonS3DefaultConfiguration.cs#L54
@lilith Please advise if the inputs provided by @normj provides necessary guidance/workaround, and if we are good to close the issue.
As I understand it, there is an open bug here. I don't understand why this would be closed.
On Tue, Feb 28, 2023, 11:10 AM Ashish Dhingra @.***> wrote:
@lilith https://github.com/lilith Please advise if the inputs provided by @normj https://github.com/normj provides necessary guidance/workaround, and if we are good to close the issue.
— Reply to this email directly, view it on GitHub https://github.com/aws/aws-sdk-net/issues/2546#issuecomment-1448643401, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAA2LHY7JMWYHKLXJ4UVIZTWZY5QLANCNFSM6AAAAAAVGOUHXM . You are receiving this because you were mentioned.Message ID: @.***>
Just for reference, when the region is explicitly set, in default configuration mode it is still probing for region, ultimately failing at EC2 instance metadata. In this case, implicit default configuration mode should not do region lookup. In this case, user is not on an EC2 instance.
Describe the bug
On a local Windows debug build, the AWSSDK pauses startup for a very long time with the following call stack:
Prior to asking the DI container for the service, it is registered like so:
It is then accessed
Expected Behavior
I would expect that the SDK would not pause exponentially on my developer machine (or in production)
Current Behavior
A 30 second delay
Reproduction Steps
See above
Possible Solution
No response
Additional Information/Context
It appears that this mess is caused by AWS trying to get the region endpoint, and failing, despite it being explicitly configured in code.
Here's another stack trace:
AWS .NET SDK and/or Package version used
AWSSDK.S3 3.7.103.15 and AWSSDK.Extensions.NETCore.Setup 3.7.4
Targeted .NET Platform
.NET 7
Operating System and version
Windows 11