aws / aws-sdk-net

The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:
http://aws.amazon.com/sdkfornet/
Apache License 2.0
2.06k stars 855 forks source link

IsUserCryptoAvailable check slows down Lambda cold start and takes large part of the initialization of Config and ServiceClients #2537

Closed unchartedxx closed 5 months ago

unchartedxx commented 1 year ago

Describe the bug

I'm using the AWS SDK to build a lambda function on .NET 7 in AOT mode. I noticed that during Dependency Injection the creation of AmazonLambdaConfig and AmazonLambdaClient objects was taking the majority of the time of the whole startup. I enabled the SDK logs (AWSConfigs.LoggingConfig.LogTo = LoggingOptions.Console;) and I noticed the following messages (emphasis on the second line):

2023-02-08T12:45:05.362Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    dbug    Start DI injection
2023-02-08T12:45:05.484Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    UserCrypto 1|2023-02-08T12:45:05.483Z|INFO|UserCrypto is not supported.  This may be due to use of a non-Windows operating system or Windows Nano Server, or the current user account may not have its profile loaded. Unable to load shared library 'Crypt32.dll' or one of its dependencies. In order to help diagnose loading problems, consider using a tool like strace. If you're using glibc, consider setting the LD_DEBUG environment variable: 
Crypt32.dll.so: cannot open shared object file: No such file or directory
libCrypt32.dll.so: cannot open shared object file: No such file or directory
Crypt32.dll: cannot open shared object file: No such file or directory
libCrypt32.dll: cannot open shared object file: No such file or directory
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    EnvironmentVariablesAWSCredentials 2|2023-02-08T12:45:05.503Z|INFO|Credentials found using environment variables.
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    EnvironmentVariableAWSRegion 3|2023-02-08T12:45:05.504Z|INFO|Region found using environment variable.
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    EnvironmentVariableInternalConfiguration 4|2023-02-08T12:45:05.504Z|INFO|The environment variable AWS_ENABLE_ENDPOINT_DISCOVERY was not set with a value.
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    EnvironmentVariableInternalConfiguration 5|2023-02-08T12:45:05.504Z|INFO|The environment variable AWS_MAX_ATTEMPTS was not set with a value.
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    EnvironmentVariableInternalConfiguration 6|2023-02-08T12:45:05.504Z|INFO|The environment variable AWS_RETRY_MODE was not set with a value.
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    EnvironmentVariableInternalConfiguration 7|2023-02-08T12:45:05.504Z|INFO|The environment variable AWS_EC2_METADATA_SERVICE_ENDPOINT was not set with a value.
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    EnvironmentVariableInternalConfiguration 8|2023-02-08T12:45:05.504Z|INFO|The environment variable AWS_EC2_METADATA_SERVICE_ENDPOINT_MODE was not set with a value.
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    EnvironmentVariableInternalConfiguration 9|2023-02-08T12:45:05.504Z|INFO|The environment variable AWS_USE_DUALSTACK_ENDPOINT was not set with a value.
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    EnvironmentVariableInternalConfiguration 10|2023-02-08T12:45:05.504Z|INFO|The environment variable AWS_USE_FIPS_ENDPOINT was not set with a value.
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    ProfileInternalConfiguration 11|2023-02-08T12:45:05.504Z|INFO|Unable to find a profile named 'default' in store Amazon.Runtime.CredentialManagement.CredentialProfileStoreChain
2023-02-08T12:45:05.504Z    d3d66c80-d0f0-4a14-9ccc-4fd03e6567c8    info    DefaultConfigurationProvider 12|2023-02-08T12:45:05.504Z|INFO|Resolved DefaultConfigurationMode for RegionEndpoint [eu-west-1] to [Legacy].

That UserCrypto message is generated at IsUserCryptAvailable. That method is called whenever downstream it needs to try to get profile information if possible in CredentialProfileStoreChain. The get profile method is called indirectly whenever it needs to find credentials or profile configuration such as this or even this.

I want to note that all of this is not specific to the AmazonLambdaClient so I suppose it would affect all AWSSDK service clients.

Normally creating the Config and the Client without doing anything spacial takes between 150 and 200ms. If I call UserCrypto.IsUserCryptoAvailable in my code it takes around ~140 ms. If I create Config and Client after that it takes a few tens of milliseconds to do the initialisation. Therefore I can conclude that the single call to UserCrypto is taking the majority of the time even considering all the time to obtain credentials following the chain of folders, envvars,...

The test was done on .NET 7 and AOT. I can only assume it would take much more on earlier versions and/or non-AOT mode. The lambda was set with 256MB.

Expected Behavior

The AWS SDK notices it is running on a non-Windows enviroment and just return false to "UserCrypto.IsUserCryptAvailable" doesn't even try to call UserCrypto API and fail when the dll are not there.

Current Behavior

"UserCrypto.IsUserCryptAvailable" calls "Decrypt" that calls "CryptProtectData" that is a Windows only API but it tries anyway for the presence of the Crypt32.dll.so file and then fails.

Reproduction Steps

AWSConfigs.LoggingConfig.LogTo = LoggingOptions.Console; var clientConfig = new AmazonLambdaConfig(); var client = new AmazonLambdaClient(clientConfig);

Possible Solution

Checks for current operating system in "UserCrypto.IsUserCryptAvailable". Return false if not Windows.

Additional Information/Context

The only workaround ATM to avoid any possible call to IsUserCryptoAvailable is to set a lot of stuff ahead of time before creating the ClientConfig and the ServiceConfig:

// Avoid checking the profile for CSM
Environment.SetEnvironmentVariable("AWS_CSM_ENABLED", "false");

// Avoid checking the profile first for credentials
var access_key = Environment.GetEnvironmentVariable("AWS_ACCESS_KEY_ID");
var secret_key = Environment.GetEnvironmentVariable("AWS_SECRET_ACCESS_KEY");
var session_token = Environment.GetEnvironmentVariable("AWS_SESSION_TOKEN");

var credentials = new SessionAWSCredentials(access_key, secret_key, session_token);

// Avoid checking the profile for region
var region = RegionEndpoint.GetBySystemName(Environment.GetEnvironmentVariable("AWS_REGION"));

var clientConfig = new AmazonLambdaConfig
{
    DefaultConfigurationMode = DefaultConfigurationMode.InRegion, // skips certain profile checks
    RegionEndpoint = region,
    DisableLogging= false,

    // The following are the default values, also set to avoid fallback searches into the profile
    RetryMode = RequestRetryMode.Standard, // copied from InRegion                    
    MaxErrorRetry = 2,
    UseDualstackEndpoint = false,
    UseFIPSEndpoint = false,
    EndpointDiscoveryEnabled = false,
};
var client = new AmazonLambdaClient(credentials, clientConfig);

AWS .NET SDK and/or Package version used

AWSSDK.Lambda 3.7.104.7

Targeted .NET Platform

.NET 7 (AOT)

Operating System and version

Lambda (Amazon Linux 2)

ashishdhingra commented 1 year ago

@unchartedxx Good morning. I understand that the call to UserCrypto.IsUserCryptAvailable might be calling few milliseconds of delay (140ms), could you please share the statistics when disabling use of UserCrypto as assumed in the code? Is it substantially less than 200ms. Kindly note that there is an existing issue https://github.com/aws/aws-sdk-net/issues/1931 to address it on non-Windows environment, but there is no specific timeline yet to address it.

Thanks, Ashish

unchartedxx commented 1 year ago

@ashishdhingra thanks for the quick answer. I did some more tests hoping that it would clarify your doubts.

I redid the test using the official AWS .NET lambda blueprints from here. I only added code to measure the impact of the UserCrypto.IsUserCryptAvailable by calling it before the client initialization.

I tested with .NET 6 using the runtime provided by the lambda service and also using the custom runtime using .NET 7 in AOT mode. The Lambda function was assigned with 256MB. Each situation was tested three times, each time forcing a cold start. All times in milliseconds.

.NET 6 Only Client Initialization

Client Init
322
358
341

.NET 6 Calling UserCrypto.IsUserCryptAvailable and then Client Initialization

IsUserCryptAvailable Client Init Total
80 279 359
80 279 359
98 243 341

.NET 7 AOT Only Client Initialization

Client Init

180 181 219

.NET 7 AOT Calling UserCrypto.IsUserCryptAvailable and then Client Initialization

IsUserCryptAvailable Client Init Total
160 20 180
158 22 180
159 20 179

It is possible to see that if the UserCrypto.IsUserCryptAvailable was a no-op it would exactly remove a chunk from the whole time dedicated to the initialization, otherwise the total time would not the same as the client init wihout "prewarming" the crypto. In case of .NET 6 it could save about 25% of the whole time, while on .NET 7 it is about 88%, so the large majority.

Here the whole code I used for reference:

public static string FunctionHandler(string input, ILambdaContext context)
{
    AWSConfigs.LoggingConfig.LogTo = LoggingOptions.Console;
    Stopwatch timer = new();
    timer.Start();

    if (Environment.GetEnvironmentVariable("USERCRYPTO_WARMUP") != null)
    {
        var value = UserCrypto.IsUserCryptAvailable;
        Console.WriteLine($"IsUserCryptAvailable took {timer.ElapsedMilliseconds} to get {value}");
        timer.Restart();
    }

    var client = new AmazonLambdaClient();

    Console.WriteLine($"AmazonLambdaClient took {timer.ElapsedMilliseconds} to get {client}");

    return input.ToUpper();
}
dscpinheiro commented 5 months ago

We just released an update (included in Core 3.7.303.23) where the SDK won't attempt to load Crypt32.dll when not running on Windows.

It does require targeting .NET 8 (or greater), but should help with the latency you were seeing.

github-actions[bot] commented 5 months ago

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.