HangfireIO / Hangfire

An easy way to perform background job processing in .NET and .NET Core applications. No Windows Service or separate process required.
https://www.hangfire.io

Azure SQL secured with Managed Identity credentials #1946

Status: open. Opened by engenb 3 years ago.

engenb commented 3 years ago

Hello! I am evaluating Hangfire for a project I am working on. I will be hosting my Hangfire server in an Azure environment with an Azure SQL database.

One of my requirements is that my server's connection to the database must be secured with Azure Managed Identity. I saw a few other issues logged related to managed identity security, so I'm hoping someone can point me in the right direction. I've included relevant code from my POC below:

config
    .SetDataCompatibilityLevel(CompatibilityLevel.Version_170)
    .UseSimpleAssemblyNameTypeSerializer()
    .UseRecommendedSerializerSettings()
    .UseSqlServerStorage(
        () => GetHangfireDbConnection(services),
        new SqlServerStorageOptions
        {
            // these are recommended defaults that will become standard in 2.0
            // https://docs.hangfire.io/en/latest/configuration/using-sql-server.html#configuration
            CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
            SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
            QueuePollInterval = TimeSpan.Zero,
            UseRecommendedIsolationLevel = true,
            DisableGlobalLocks = true
        });
...
private static readonly TokenRequestContext AzureSqlTokenRequestContext = new(new[] { "https://database.windows.net/.default" });

private static DbConnection GetHangfireDbConnection(IServiceProvider services)
{
    var environment = services.GetRequiredService<IHostEnvironment>();
    var configuration = services.GetRequiredService<IConfiguration>();

    var connection = new SqlConnection(configuration.GetConnectionString("Default"));

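    // Outside Development, authenticate with an Azure AD access token.
    // GetCredential() constructs a new DefaultAzureCredential on every call,
    // so every connection Hangfire opens requests a fresh token.
    if (!environment.IsDevelopment())
    {
        var tokenCredentialFactory = services.GetRequiredService<ITokenCredentialFactory>();
        var token = tokenCredentialFactory.GetCredential().GetToken(AzureSqlTokenRequestContext, default);
        connection.AccessToken = token.Token;
    }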

    return connection;
}
...
public interface ITokenCredentialFactory
{
    Azure.Core.TokenCredential GetCredential();
}

public class TokenCredentialFactory : ITokenCredentialFactory
{
    private IHostEnvironment Environment { get; }

    private readonly IOptionsMonitor<AzureOptions> _azureOptions;
    private AzureOptions AzureOptions => _azureOptions.CurrentValue;

    public TokenCredentialFactory(IHostEnvironment environment, IOptionsMonitor<AzureOptions> azureOptions)
    {
        Environment = environment;
        _azureOptions = azureOptions;
    }

    public Azure.Core.TokenCredential GetCredential() => GetCredential(Environment, AzureOptions.AD.TenantId);

    public static Azure.Core.TokenCredential GetCredential(IHostEnvironment environment, string tenantId) =>
        environment.IsDevelopment()
            ? new DefaultAzureCredential(new DefaultAzureCredentialOptions
            {
                VisualStudioTenantId = tenantId,
                SharedTokenCacheTenantId = tenantId
            })
            : new DefaultAzureCredential();
}

I use a very similar approach in an EntityFrameworkCore DbConnectionInterceptor throughout my other projects, so I have some confidence that the approach at least works in that scenario.

When I apply this approach to my Hangfire server as shown above, it "sort of" works: the server starts, initializes the database by creating the tables it needs, and schedules and runs jobs, and the logs show plenty of reads from and updates to various tables in the schema.

However, after a short time the server starts failing to get tokens from Azure. When I log which threads succeed or fail to get a token, most worker threads (e.g. Worker #1) keep getting one, but eventually a worker thread fails to get a token and the server halts.

I'm not saying there's anything wrong with Hangfire, but something is clearly keeping Hangfire and the Azure.Identity TokenCredential system from playing nicely together. I'm hoping someone out there has run into (and solved) this before.

My prime suspects/guesses:

  1. GetToken(...) is async under the hood, and invoking async code from a synchronous context can lead to odd behavior.
    • This is especially likely if Hangfire manages its own threads.
  2. Whatever threading Hangfire does may lose some context that the Azure.Identity TokenCredential needs to get my developer managed identity credentials through Visual Studio.
  3. The TokenCredential system has its own internal token cache, and I seem to recall seeing AsyncLocals in there. I wonder whether that conflicts with Hangfire's threading.
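Related to suspects 1 and 3: in the code above, the factory constructs a new DefaultAzureCredential on every call, so each connection Hangfire opens makes its own token request from whichever worker thread needs it. One way to reduce that pressure is to share a single cached token across all workers and refresh it only shortly before it expires. This is a hypothetical sketch using only the base class library: the class name, the fetch delegate, the refresh margin and the injectable clock are assumptions, not part of Hangfire or Azure.Identity, and it has not been shown to fix the failures described here.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical helper: caches one access token and refreshes it only when it is
// within refreshMargin of expiry. The fetch delegate stands in for a wrapper
// around TokenCredential.GetToken (an assumption of this sketch).
public sealed class CachedAccessTokenProvider
{
    private readonly Func<CancellationToken, (string Token, DateTimeOffset ExpiresOn)> _fetch;
    private readonly Func<DateTimeOffset> _clock;
    private readonly TimeSpan _refreshMargin;
    private readonly object _gate = new object();
    private (string Token, DateTimeOffset ExpiresOn)? _cached;

    public CachedAccessTokenProvider(
        Func<CancellationToken, (string Token, DateTimeOffset ExpiresOn)> fetch,
        TimeSpan refreshMargin,
        Func<DateTimeOffset>? clock = null)
    {
        if (refreshMargin < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(refreshMargin));

        _fetch = fetch ?? throw new ArgumentNullException(nameof(fetch));
        _refreshMargin = refreshMargin;
        // The clock is injectable mainly so the expiry logic can be tested.
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    public string GetToken(CancellationToken cancellationToken = default)
    {
        lock (_gate)
        {
            // Reuse the cached token while it is still comfortably valid.
            if (_cached.HasValue && _cached.Value.ExpiresOn - _refreshMargin > _clock())
            {
                return _cached.Value.Token;
            }

            // Only one thread refreshes at a time; the others wait on the lock
            // and then reuse the token it stored.
            var fresh = _fetch(cancellationToken);
            _cached = fresh;
            return fresh.Token;
        }
    }
}
```

In the connection factory, the delegate could wrap the existing call, e.g. `ct => { var t = credential.GetToken(AzureSqlTokenRequestContext, ct); return (t.Token, t.ExpiresOn); }`, with a single provider instance registered as a singleton so every worker shares it. Holding the lock during the fetch serializes refreshes, trading a little latency for the guarantee that only one thread talks to the identity endpoint at a time.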

Again, these are all just theories. I'm hoping someone has some insight into this and will rescue me from digging through the code of this project and Azure.Identity!

Thanks for taking a look.

orjan commented 2 years ago

I'm not sure whether this answers your question, but I've managed to use user-assigned managed identities with the following settings.

dotnet add package Microsoft.Data.SqlClient

services.AddHangfire(config =>
{
    config.UseSerilogLogProvider();
    /* We'll need to use Microsoft.Data.SqlClient, which supports managed identities in Azure
     * https://stackoverflow.com/a/68833415/191975
     */
    config.UseSqlServerStorage(
        () => new Microsoft.Data.SqlClient.SqlConnection(conf.HangfireConnectionString),
        new Hangfire.SqlServer.SqlServerStorageOptions
        {
            // Add your custom settings here
        }
    );
});

Connection string:

Server=tcp:my-db.database.windows.net,1433;
Initial Catalog=hangfire;
Persist Security Info=False;
MultipleActiveResultSets=False;
Encrypt=True;
TrustServerCertificate=False;
Authentication="Active Directory Managed Identity";
User Id={{ objectIdForYourManagedIdentity }};

As for stability, I don't have many metrics yet, since I only rolled it out to our staging environment this Friday. That said, I haven't had any logged exceptions over the weekend, but it remains to be seen whether it's ready for production.

odinserj commented 2 years ago

@engenb your suspects sound reasonable, and you also mentioned that workers start failing after some time. Do you have an example of the exception the workers throw in this case? It would be very useful for understanding what's going on. For example, if it's a timeout error, the problem may be related to a busy thread pool for one reason or another, and because GetToken is implemented asynchronously under the hood and needs thread pool threads, it can be delayed.
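To collect the details asked for here, one option is to wrap each token request so that a failure is recorded together with how long the call took, which thread made it, and how much work was queued on the thread pool when it started. A failure after a long wait with a large queue would point towards thread pool starvation; an immediate failure would point elsewhere. This is a hypothetical helper (the type and method names are illustrative); records need C# 9, and `ThreadPool.PendingWorkItemCount` needs .NET Core 3.0 or later.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical result type: the details worth pasting into a bug report.
public sealed record TokenFetchResult(
    int ThreadId,
    long PendingWorkItemsAtStart,
    TimeSpan Elapsed,
    string? Token,
    Exception? Error);

public static class TokenFetchDiagnostics
{
    // Runs the supplied token request (for example a wrapper around
    // TokenCredential.GetToken) and records timing and failure details
    // instead of letting the exception escape.
    public static TokenFetchResult Measure(Func<string> fetchToken)
    {
        if (fetchToken is null) throw new ArgumentNullException(nameof(fetchToken));

        int threadId = Environment.CurrentManagedThreadId;
        long pending = ThreadPool.PendingWorkItemCount;
        var stopwatch = Stopwatch.StartNew();
        try
        {
            string token = fetchToken();
            return new TokenFetchResult(threadId, pending, stopwatch.Elapsed, token, null);
        }
        catch (Exception ex)
        {
            // Keep the whole exception (type, message, inner exceptions) for the report.
            return new TokenFetchResult(threadId, pending, stopwatch.Elapsed, null, ex);
        }
    }
}
```

In practice the caller would log the result (for example `result.Error?.ToString()`, which includes inner exceptions) and then rethrow, so Hangfire still sees the failure.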

jbennink commented 1 year ago

@orjan Thanks, I ran into this issue using system-assigned managed identities. I knew I needed the Microsoft.Data.SqlClient package, but this little gem was the last piece of the puzzle to get Hangfire working. Much appreciated.

thdotnet commented 1 year ago

@jbennink are you using .NET 6? I've tried following this thread, but it's not working for me, and I've tried multiple different connection strings. Can you provide more details on what worked for you?

Is this what the connection string should look like? Server=test.database.windows.net; Authentication=Active Directory Managed Identity; Database=db

orjan commented 1 year ago

@thdotnet depending on whether you’re using a system-assigned or user-assigned managed identity, you’ll need to change the connection string.

I’m using a user-assigned managed identity in my example in order to decouple the identity and the database configuration from the app service. Also note that you need to switch the SQL client library to Microsoft.Data.SqlClient.

https://github.com/HangfireIO/Hangfire/issues/1946#issuecomment-955650927

But it boils down to what kind of identity you’re using when authenticating.
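For the connection string, the difference between the two identity types comes down to the `User Id` keyword: a system-assigned identity needs no `User Id`, while a user-assigned identity needs one to say which identity to use. The Microsoft.Data.SqlClient documentation says that value should be the identity's client ID with version 3.0 or newer and its object ID with version 2.1, so the `objectIdForYourManagedIdentity` placeholder in the earlier example may need to be a client ID depending on the package version; worth checking against the docs for the version in use. The helper is a hypothetical sketch that only builds the string; the server, database and ID values are placeholders.

```csharp
#nullable enable
using System;

// Hypothetical helper: builds an Azure SQL connection string for
// "Active Directory Managed Identity" authentication (Microsoft.Data.SqlClient).
public static class ManagedIdentityConnectionStrings
{
    public static string Build(string server, string database, string? userAssignedIdentityId = null)
    {
        if (string.IsNullOrWhiteSpace(server))
            throw new ArgumentException("Server is required.", nameof(server));
        if (string.IsNullOrWhiteSpace(database))
            throw new ArgumentException("Database is required.", nameof(database));

        var connectionString =
            $"Server=tcp:{server},1433;Initial Catalog={database};" +
            "Encrypt=True;TrustServerCertificate=False;" +
            "Authentication=Active Directory Managed Identity;";

        // System-assigned identity: leave User Id out.
        // User-assigned identity: User Id selects which identity to use.
        if (!string.IsNullOrWhiteSpace(userAssignedIdentityId))
        {
            connectionString += $"User Id={userAssignedIdentityId};";
        }

        return connectionString;
    }
}
```

A system-assigned setup (as in the question above) would call `Build("test.database.windows.net", "db")`, while a user-assigned one passes the identity's ID as the third argument.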

thdotnet commented 1 year ago

@orjan I am using a system assigned managed Identity.