Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License

[BUG] Azure Keyvault GetSecret API timeouts #46370

Closed ncoussemacq closed 2 weeks ago

ncoussemacq commented 3 weeks ago

Library name and version

Azure.Security.KeyVault.Secrets 4.6.0

Describe the bug

We are seeing two different scenarios in which GetSecret API calls randomly fail after a significant amount of time. This mostly happens when traffic is quite high.

Case 1: The GetSecret API call is cancelled after ~100s, and the automated retry succeeds right after, in 23ms.


(operation Id 9b3dcb3caafc90843ef1b7612e240807)

Case 2: The GetSecret API call takes ~20s to return a 401 error code, and the automated retry succeeds right after. I understand that the initial 401 is expected because of the authentication flow, but I'm surprised it takes ~20s to respond.


(operation Id 0ac0c16170c7c1c6a7dc6a9a77425753)

This issue seems quite similar to #37420, which is closed.

Expected behavior

GetSecret calls do not fail after a long timeout.

Actual behavior

Calls to GetSecret randomly fail after many seconds.

Reproduction Steps

Here is the source code of the class calling the GetSecret API.

public class KeyvaultSecretClient : IKeyvaultSecretClient
{
private static readonly ActivitySource ActivitySource = new ActivitySource(typeof(KeyvaultSecretClient).FullName!, "1.0.0");
private const string GetSecretActivityName = $"{nameof(KeyvaultSecretClient)}:{nameof(GetSecretAsync)}";

private static readonly Regex KeyvaultSecretUnauthorizedCharaters = new Regex("[^a-zA-Z0-9]");
private static readonly SecretClientOptions KeyvaultSecretClientOptions = new SecretClientOptions()
{
    Retry =
        {
            Delay= TimeSpan.FromSeconds(2),
            MaxDelay = TimeSpan.FromSeconds(16),
            MaxRetries = 5,
            Mode = RetryMode.Exponential
        }
};

private const string KeyvaultUrlPattern = "https://{0}.vault.azure.net";
private readonly DefaultAzureCredential _azureCredentials;
private readonly ILogger<KeyvaultSecretClient> _logger;

public KeyvaultSecretClient(
    DefaultAzureCredential azureCredentials,
    ILogger<KeyvaultSecretClient> logger)
{

    ArgumentNullException.ThrowIfNull(azureCredentials, nameof(azureCredentials));
    ArgumentNullException.ThrowIfNull(logger, nameof(logger));

    _azureCredentials = azureCredentials;
    _logger = logger;
}

public async Task<string> GetSecretAsync(string keyvaultName, string secretName)
{
    ArgumentNullException.ThrowIfNullOrEmpty(keyvaultName, nameof(keyvaultName));
    ArgumentNullException.ThrowIfNullOrEmpty(secretName, nameof(secretName));

    using var activity = ActivitySource.StartActivity(GetSecretActivityName);
    activity?.AddTag(nameof(keyvaultName), keyvaultName);
    activity?.AddTag(nameof(secretName), secretName);

    var keyvaultClient = GetKeyvaultClient(keyvaultName);

    var cleanedSecretName = CleanSecretName(secretName);
    activity?.AddTag(nameof(cleanedSecretName), cleanedSecretName);

    try
    { 
        var response = await keyvaultClient.GetSecretAsync(cleanedSecretName);

        if (response.Value == null)
        {
            activity?.SetStatus(ActivityStatusCode.Error);
            _logger.LogError($"Secret {secretName} is empty {keyvaultName}");

            throw new KeyvaultSecretNotFoundException($"Secret {cleanedSecretName} not found in keyvault {keyvaultName}");
        }

        activity?.AddTag("secretSize", response.Value.Value.Length);

        return response.Value.Value;
    }
    catch(RequestFailedException ex)
    {
        activity?.SetStatus(ActivityStatusCode.Error);
        _logger.LogError(ex, $"Failed to get secret {cleanedSecretName} from keyvault {keyvaultName}");

        throw new KeyvaultSecretNotFoundException($"Failed to get secret {cleanedSecretName} from keyvault {keyvaultName}", ex);
    }
}

private SecretClient GetKeyvaultClient(string keyvaultName)
{
    var keyvaultUri = string.Format(KeyvaultUrlPattern, keyvaultName);

    var keyvaultClient = new SecretClient(
        new Uri(keyvaultUri), 
        _azureCredentials,
        KeyvaultSecretClientOptions);

    return keyvaultClient;
}

private string CleanSecretName(string secretName)
{
    return KeyvaultSecretUnauthorizedCharaters.Replace(secretName, "-");
}

}

Environment

.NET 8.0

github-actions[bot] commented 3 weeks ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

jsquire commented 3 weeks ago

Hi @ncoussemacq. Thanks for reaching out and we regret that you're experiencing difficulties. Based on the description and symptoms, the behavior that you're seeing is most likely related to your application or host environment. It sounds very much like you're either seeing continuations for async calls unable to be scheduled in a timely manner or seeing some form of network congestion. It is also possible that the service calls themselves are taking longer than expected.
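
To check whether continuations are being delayed by thread-pool pressure, one option is to log thread-pool counters around the slow calls. The following is a minimal sketch, not an SDK feature: `ThreadPoolSnapshot` and `Capture` are hypothetical names chosen for illustration, and only the `System.Threading.ThreadPool` members are real .NET APIs.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of the Azure SDK): formats the current
// thread-pool counters so they can be logged next to a slow call.
public static class ThreadPoolSnapshot
{
    public static string Capture()
    {
        ThreadPool.GetAvailableThreads(out int availableWorkers, out int availableIo);
        ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);
        ThreadPool.GetMinThreads(out int minWorkers, out int minIo);

        // busy = max - available, for both worker and I/O completion threads.
        return $"threads={ThreadPool.ThreadCount} " +
               $"pendingWorkItems={ThreadPool.PendingWorkItemCount} " +
               $"busyWorkers={maxWorkers - availableWorkers} minWorkers={minWorkers} " +
               $"busyIo={maxIo - availableIo} minIo={minIo}";
    }
}
```

Logging `ThreadPoolSnapshot.Capture()` just before and after `GetSecretAsync` during a high-traffic period would show whether `pendingWorkItems` grows while the workers are saturated, which would point toward the application or host rather than the service. This is only an indicator; a tool such as `dotnet-counters` gives a fuller picture.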

Unfortunately, this is not something that the maintainers of the Azure SDK can assist with. We would suggest investigating the application patterns for async and the host resources as the first step. If you believe the service calls are potentially the cause, then your best path forward would be to open an Azure support request and ask the service team to analyze service logs for that time period. If you would prefer not to open a support ticket, you may want to inquire on the Microsoft Q&A site as the service team also monitors that.
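
One application pattern that may be worth reviewing is client lifetime: the class in the report constructs a new `SecretClient` on every call, whereas Azure SDK clients are thread-safe and intended to be reused. The following is a minimal sketch of caching one client per vault and bounding each call with a cancellation token. `CachedSecretClientFactory`, `GetClient`, and `GetSecretValueAsync` are hypothetical names, and whether this changes the reported timings is an assumption that would need to be measured.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Azure;
using Azure.Core;
using Azure.Security.KeyVault.Secrets;

// Hypothetical helper (not part of the Azure SDK): keeps one SecretClient
// per vault name instead of constructing a new client for every call.
public sealed class CachedSecretClientFactory
{
    private readonly ConcurrentDictionary<string, SecretClient> _clients =
        new(StringComparer.OrdinalIgnoreCase);
    private readonly TokenCredential _credential;
    private readonly SecretClientOptions _options;

    public CachedSecretClientFactory(TokenCredential credential, SecretClientOptions options)
    {
        _credential = credential ?? throw new ArgumentNullException(nameof(credential));
        _options = options ?? throw new ArgumentNullException(nameof(options));
    }

    // GetOrAdd may run the factory more than once under contention,
    // but only one client per vault name is stored and returned afterwards.
    public SecretClient GetClient(string keyvaultName) =>
        _clients.GetOrAdd(
            keyvaultName,
            name => new SecretClient(new Uri($"https://{name}.vault.azure.net"), _credential, _options));

    // Bounds the whole operation (including SDK retries) with a caller-chosen timeout.
    public async Task<string> GetSecretValueAsync(
        string keyvaultName,
        string secretName,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(timeout);

        Response<KeyVaultSecret> response = await GetClient(keyvaultName)
            .GetSecretAsync(secretName, version: null, cancellationToken: cts.Token)
            .ConfigureAwait(false);

        return response.Value.Value;
    }
}
```

A per-attempt limit can also be set through `Retry.NetworkTimeout` on the client options. Its default in Azure.Core is 100 seconds, which may be related to the ~100s cancellation in Case 1, though that link is not confirmed in this thread.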

github-actions[bot] commented 3 weeks ago

Hi @ncoussemacq. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

kevinharing commented 3 weeks ago

Possibly related to #44817? What version of Azure.Core are you using?

github-actions[bot] commented 2 weeks ago

Hi @ncoussemacq, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.