Open martincostello opened 10 months ago
Doing a bit more digging, this might be a case of more of a docs/footgun issue than a bug per-se.
Digging around some more, I see here that when not using native AoT, the client is created with explicit anonymous credentials:
However, in my repro, I'm just calling the constructor with a region, it calls back into FallbackCredentialsFactory
, which is what I traced through in a decompiler to get to the minimal repro in the first place:
I haven't tried changing my code yet to do the same, but it raises a few points
1) From the original error message, it isn't clear that the user needs to do this to prevent a loop when default/fallback credentials are used. It seems like an implementation detail the average user wouldn't know of or find. 2) Without me digging more in a debugger, if this is the case, I would have thought it would have caused a StackOverflowException rather than hang (unless the runtime is just failing to show that, so it appears to be hanging).
If this does turn out to be what is happening, then I guess the bug is less a deadlock, and more a need for clearer documentation on what the user needs to do in this case.
Yep, that's what it was.
If I change the code to this, the problem goes away:
using Amazon.Runtime;
using Amazon.RuntimeDependencies;
using Amazon.SecurityToken;
Console.WriteLine("GlobalRuntimeDependencyRegistry.RegisterSecurityTokenServiceClient()");
GlobalRuntimeDependencyRegistry.Instance.RegisterSecurityTokenServiceClient(
- (_) =>
- new AmazonSecurityTokenServiceClient());
+ (context) =>
+ new AmazonSecurityTokenServiceClient(
+ new AnonymousAWSCredentials(),
+ context.SecurityTokenServiceClientContextData.Region));
Console.WriteLine("AssumeRoleWithWebIdentityCredentials.FromEnvironmentVariables()");
var credentials = AssumeRoleWithWebIdentityCredentials.FromEnvironmentVariables();
Console.WriteLine("AssumeRoleWithWebIdentityCredentials.GetCredentialsAsync()");
_ = await credentials.GetCredentialsAsync();
Console.WriteLine("Credentials obtained");
Maybe it would help give a hint for the user in this case if SecurityTokenServiceClientContext
included a property for the credentials to use, which could be set to an instance of AnonymousAWSCredentials
?
public class SecurityTokenServiceClientContext
{
public enum ActionContext { AssumeRoleAWSCredentials, AssumeRoleWithWebIdentityCredentials, FederatedAWSCredentials };
public ActionContext Action { get; set; }
public RegionEndpoint Region { get; set; }
public IWebProxy ProxySettings { get; set; }
+ public AWSCredentials Credentials { get; set; } = new AnonymousAWSCredentials();
}
The STS dependency is the hardest one to deal with in the SDK for AOT or more specifically now the GlobalRuntimeDependencyRegistry. Even in the assume role case, which requires AWS credentials and not the anonymous credentials, you want to make sure your not just using the default credentials because that would go back to the profile that needs to assume a role instead of the profile that should make the STS assume role call.
I agree this area needs some smoothing out and at the very least better documentation.
If you are using code that gets the role from the running instance which has role assumed in its execution context like Fargate or Lambda (which I believe is using assume role?), is the current correct action to keep <Assembly Name="AWSSDK.SecurityToken" Dynamic="Required All"></Assembly>
in the rd.xml
file? Or is there another way to use AssumeRoleAWSCredentials
without hardcoding the role name itself?
Side note, it's crazy to me that this GitHub issue is the only reference found on Google for RegisterSecurityTokenServiceClient
.
Describe the bug
While doing research into publishing a .NET 8 application using native AoT to EKS, I found at deploy-time that the application would fail on a code path we have in an internal library where the AWS SDK would attempt to get credentials using STS. The exception would be the following:
I added the code mentioned to the application and redeployed. At this point it appeared to no longer be responding and crash-looping.
Through various trial and error, I've distilled the problematic code path down to the code snippet below.
My hunch is that the code path is somehow re-entrantly trying to access this lock, causing it to block indefinitely:
https://github.com/aws/aws-sdk-net/blob/f7a3a70a75ff1c4d5bc877b10a03e836cfcbe32a/sdk/src/Core/Amazon.Runtime/Credentials/RefreshingAWSCredentials.cs#L144-L146
That's just a hunch - the issue could be somewhere completely different, but it's the first bit of synchronization I could find on the method call that is blocking. It also blocks using the synchronous version,
GetCredentials()
, so I'm pretty sure it's not something to do with sync-over-async.The code appears to deadlock regardless of whether native AoT is actually used at runtime or not.
Expected Behavior
The operation does not block.
With the repro, the following messages should be printed to the console:
Current Behavior
The application blocks on the call to
AssumeRoleWithWebIdentityCredentials.GetCredentialsAsync()
.Reproduction Steps
Run the following code within EKS where the instance credentials are available.
Possible Solution
No response
Additional Information/Context
No response
AWS .NET SDK and/or Package version used
AWSSDK.SecurityToken 3.7.300.38
Targeted .NET Platform
.NET 8
Operating System and version
Ubuntu 22.04.3 LTS