OData / WebApi

OData Web API: A server library built upon ODataLib and WebApi
https://docs.microsoft.com/odata
Other
855 stars 473 forks source link

IHttpContextAccessor.HttpContext.User is not always authenticated (when it should) when invoked by IModelCacheKeyFactory #2833

Closed NinjaCross closed 9 months ago

NinjaCross commented 9 months ago

The problem I'm facing seems to be very similar to https://github.com/OData/WebApi/issues/2294, but happens in generic API (non-OData) controllers that uses HttpContext-aware DbContext instances.

The problem is very odd, because:

.... with exactly same DB, same source code, same .NET version, same SDK.

The only factor that changes among the test environments I'm using, is the hardware. Some machines are radically slower than others, and it seems that the problems happen much more frequently on slower ones. This raises the suspicion that there is some kind of race-condition scenario happening under the hood in the initialization of HttpContext.User

AFAIK, I'm following all the known best practices related to the usage of DbContext in those scenarios that needs a "contextualization" of the queries for the current ClaimsPrincipal. The DbContext needs to know the "current" user, so that the Global Filters (injected into it) can use the user-id to contextualize the executed DB queries. In order to do so, I'm:

Code-wise, my scenario is (very similar to) this:

  1. A scoped EF DbContext, with a IHttpContextAccessor injected into the constructor. The following sample shows a direct injection for simplicity, but the same problem arises even when "wrapping" the IHttpContextAccessor into another "current user resolver" service (scoped, transient or singleton... doesn't matter)
public abstract partial class MyDbContext: DbContext
{
    private readonly IHttpContextAccessor _httpContextAccessor;

    // it doesn't matter if I directly inject IHttpContextAccessor like this, 
    // or I wrap it into another service, the result is the same
    public MyDbContext([NotNull] IHttpContextAccessor httpContextAccessor)
    {
        _httpContextAccessor = httpContextAccessor ??
            throw new ArgumentNullException(nameof(httpContextAccessor));
    }

    private long ? _userId;

    public long UserId
    {
        get
        {
            var uid = _userId;
            if (uid == null)
            {
                // lazy initialize the "current user" by reading the claims of the HttpContext.User
                var claims = _httpContextAccessor.HttpContext.User?.Claims;
                var useriIdClaim = claims?.FirstOrDefault(c => c.Type == ClaimTypes.Sid);
                if (useriIdClaim != null && long.TryParse(useriIdClaim.Value, out  var userId) && userId > 0)
                {
                    _userId = userId; // user authenticated, return the User ID
                    uid = userId;
                }
                else
                {
                    uid = 0; // user is NOT authenticated, return ZERO
                    _userId = uid;
                }
            }
            return (long) uid;
        }
    }
}
  1. The EF DbContext is injected and initialized as per standard best practices (to my knowledge), and its model is discriminated by OneModelPerUserModelCacheKeyFactory, an implementor of `IModelCacheKeyFactory'
services.AddDbContext<MyDbContext>(
         (serviceProvider, options) =>
         {
               var connString = DbContextUtils.GetApplicationDbContextConnectionString(configuration);
               options.UseSqlServer(connString, x => x.MigrationsAssembly("My.Migrations"));
               options.ReplaceService<IModelCacheKeyFactory, OneModelPerUserModelCacheKeyFactory>();
         });

I suspect the problem is here. It seems that the IModelCacheKeyFactory is invoked in a moment when HttpContext.User is not "ready" yet.

public class OneModelPerUserModelCacheKeyFactory : IModelCacheKeyFactory
{
    public object Create(DbContext context) => Create(context, false);

    public object Create(DbContext context, bool designTime)
    {
        if (context == null) throw new ArgumentNullException(nameof(context));

        if (context is MyDbContext hasUserId)
        {
           // Discriminate the datamodel by User Id
           // The problems seems to be here.
          // In this specific point, the HttpContext.User seems to be not always "ready" for usage, and so
          // the MyDbContext.UserId is ZERO instead of the expected value.
          // Due to this, the cache doesn't correctly discriminate which model to use, and return the one
          // used for non-authenticated users (the one with UserId=0)
            return (context.GetType(), hasUserId.UserId, designTime); 
        }

        return (object)context.GetType();
    }
}
  1. The EF DbContext is injected using [FromServices] into the action method of an API Controller

    [Route("api/My")]
    [ApiController]
    [Authorize]
    public class MyController : ControllerBase
    {  
    [HttpGet]
    [Route("MyTestAction")]    
    public async Task<ActionResult> MyTestAction([FromServices] MyDbContext dbContext)
    {
          // This is NOT NULL, when a user is correctly authenticated.
          // It's also correctly initialized, and the User Id is correctly present into the Claims
      var user = User;
    
         // This is zero at random, because it was lazy-initialized by the IModelCacheKeyFactory
         // in a moment when (I suspect) HttpContext.User was not "ready" (as illustrated into the previous code chunk)
      var userid = dbContext.UserId;
    }   
    }

Assemblies affected

It's unclear to me were the problem could be, but here below are the most meaningfull versions of the components I'm using

   [net6.0]: 
   Top-level Package                                            Requested        Resolved      
   > Microsoft.AspNetCore.Authentication.JwtBearer              6.*              6.0.26        
   > Microsoft.AspNetCore.Identity                              2.2.0            2.2.0         
   > Microsoft.AspNetCore.Mvc.NewtonsoftJson                    6.*              6.0.26        
   > Microsoft.AspNetCore.OData                                 7.5.*            7.5.18        
   > Microsoft.AspNetCore.SignalR.Protocols.NewtonsoftJson      6.*              6.0.26        
   > Microsoft.IdentityModel.Abstractions                       6.35.*           6.35.0        
   > Microsoft.IdentityModel.JsonWebTokens                      6.35.*           6.35.0        
   > Microsoft.IdentityModel.Logging                            6.35.*           6.35.0        
   > Microsoft.IdentityModel.Protocols                          6.35.*           6.35.0        
   > Microsoft.IdentityModel.Protocols.OpenIdConnect            6.35.*           6.35.0        
   > Microsoft.IdentityModel.Tokens                             6.35.*           6.35.0        

   Transitive Package                                                                   Resolved               
   > Microsoft.AspNetCore.Authentication                                                2.2.0                  
   > Microsoft.AspNetCore.Authentication.Abstractions                                   2.2.0                  
   > Microsoft.AspNetCore.Authentication.Cookies                                        2.2.0                  
   > Microsoft.AspNetCore.Authentication.Core                                           2.2.0                  
   > Microsoft.AspNetCore.Authorization                                                 2.2.0                  
   > Microsoft.AspNetCore.Authorization.Policy                                          2.1.1                  
   > Microsoft.AspNetCore.Connections.Abstractions                                      6.0.26                 
   > Microsoft.AspNetCore.Cryptography.Internal                                         2.2.0                  
   > Microsoft.AspNetCore.Cryptography.KeyDerivation                                    2.2.0                  
   > Microsoft.AspNetCore.DataProtection                                                2.2.0                  
   > Microsoft.AspNetCore.DataProtection.Abstractions                                   2.2.0                  
   > Microsoft.AspNetCore.Hosting.Abstractions                                          2.2.0                  
   > Microsoft.AspNetCore.Hosting.Server.Abstractions                                   2.2.0                  
   > Microsoft.AspNetCore.Http                                                          2.2.0                  
   > Microsoft.AspNetCore.Http.Abstractions                                             2.2.0                  
   > Microsoft.AspNetCore.Http.Connections                                              1.0.15                 
   > Microsoft.AspNetCore.Http.Connections.Client                                       6.0.26                 
   > Microsoft.AspNetCore.Http.Connections.Common                                       6.0.26                 
   > Microsoft.AspNetCore.Http.Extensions                                               2.2.0                  
   > Microsoft.AspNetCore.Http.Features                                                 5.0.17                 
   > Microsoft.AspNetCore.HttpsPolicy                                                   2.2.0                  
   > Microsoft.AspNetCore.JsonPatch                                                     6.0.26                 
   > Microsoft.AspNetCore.Routing                                                       2.1.1                  
   > Microsoft.AspNetCore.Routing.Abstractions                                          2.1.1                  
   > Microsoft.AspNetCore.SignalR.Client                                                6.0.26                 
   > Microsoft.AspNetCore.SignalR.Client.Core                                           6.0.26                 
   > Microsoft.AspNetCore.SignalR.Common                                                6.0.26                 
   > Microsoft.AspNetCore.SignalR.Core                                                  1.1.0                  
   > Microsoft.AspNetCore.SignalR.Protocols.Json                                        6.0.26                 
   > Microsoft.AspNetCore.WebSockets                                                    2.1.7                  
   > Microsoft.AspNetCore.WebUtilities                                                  2.2.0                  
   > Microsoft.EntityFrameworkCore                                                      7.0.15                 
   > Microsoft.EntityFrameworkCore.Abstractions                                         7.0.15                 
   > Microsoft.EntityFrameworkCore.Analyzers                                            7.0.15                 
   > Microsoft.EntityFrameworkCore.Design                                               6.0.7                  
   > Microsoft.EntityFrameworkCore.Relational                                           7.0.15                 
   > Microsoft.EntityFrameworkCore.Relational.Design                                    2.0.0-preview1-final   
   > Microsoft.EntityFrameworkCore.Sqlite                                               6.0.7                  
   > Microsoft.EntityFrameworkCore.Sqlite.Core                                          6.0.7                  
   > Microsoft.EntityFrameworkCore.Sqlite.Design                                        2.0.0-preview1-final   
   > Microsoft.EntityFrameworkCore.SqlServer                                            7.0.15  
   > Microsoft.Identity.Client                                                          4.56.0                 
   > Microsoft.Identity.Client.Extensions.Msal                                          4.56.0                 

Reproduce steps

Since the problems happens at random, It's not easy to reproduce. On some machines it always happens. On other machines it never happens. The only pattern I can see, is that it happens more frequently in slower machines, and almost never happen in faster machines.

Expected result

HttpContext.User should be ready for usage from any injected service after the authentication process has been executed.

Actual result

HttpContext.User seems to be "not always ready" when it's injected in other services.

robertmclaws commented 9 months ago

We've seen this happen in Restier (a downstream library to OData) and determined that it happens because OData disposes of the HttpContext at random times, especially for batch endpoints. In the thread you mentioned, we tracked down the core problem to the HttpContextFactory disposing of the HttpContext too early in the overall cycle. At some point OData disposes of the Factory, which makes it unavailable to later processing.

We solved this by taking the original suggested solution on StackOverflow and making it Middleware instead.

https://github.com/OData/RESTier/blob/main/src/Microsoft.Restier.AspNetCore/Middleware/ODataBatchHttpContextFixerMiddleware.cs

We also optionally have a ClaimsPrincipal middleware that does the same thing, but then sets the ClaimsPrincipalSelector to IHttpContextAccessor.HttpContext.User. We make this available because some business logic needs to be able to operate outside of the context of HTTP, so taking HTTP dependencies for those libraries is bad design.

You can check it out here: https://github.com/OData/RESTier/blob/main/src/Microsoft.Restier.AspNetCore/Middleware/RestierClaimsPrincipalMiddleware.cs

We use Middleware instead of other techniques because Middleware uses the HttpContext directly. We can use Method Injection to inject the IHttpContextAccessor and fix the IHttpContextAccessor.HttpContext property and fix it if it's been prematurely disposed.

On its face, it would seem that one of these two techniques might mitigate your problem.

HOWEVER...

In looking at how you have structured everything, I would say that the entire implementation is flawed, and the fact that it ever worked but only on slow machines is the actual symptom of the REAL and very different problem.

There is one key detail I am missing before definitively pointing to the problem. When you say this is a "multi-tenant" solution, are all the tenants in the same database, or are you connecting to a different distinct database instance per tenant?

NinjaCross commented 9 months ago

@robertmclaws thanks for your contribution.

We've seen this happen in Restier (a downstream library to OData) and determined that it happens because OData disposes of the HttpContext at random times, especially for batch endpoints. In the thread you mentioned, we tracked down the core problem to the HttpContextFactory disposing of the HttpContext too early in the overall cycle. At some point OData disposes of the Factory, which makes it unavailable to later processing.

As I said I'm working with normal API Controllers, not with OData controllers. I only mentioned https://github.com/OData/WebApi/issues/2294 because there were similarities.
I'm not sure your observations are contextually applicable, but nonetheless I appreciate them, I'll give them a try. In my specific scenario, it doesn't seems to be a problem in eager or unexpected disposing of the HttpContext (not as far as I can see debugging the code), quite the contrary: it seems it's made available by the ASP.NET pipeline before it's completely initalized, but I'll investigate further.

In looking at how you have structured everything, I would say that the entire implementation is flawed,

AFAIK that's exactly how it's always been done everywhere. There are many, many, many sources on the Internet (both old and recent) suggesting this approach (injecting IHttpContextFactory wherever I need it in a HTTP-scoped execution flow). If this implementation is flawed, then thousands of other applications are flawed too around the world... and I find difficult to think that so many applications are flawed and nobody (or just a few) noticed it before :)

Anyway, I'm not discarding any option, I'll take into consideration your observations.

and the fact that it ever worked but only on slow machines is the actual symptom of the REAL and very different problem.

That's not what I meant; maybe I haven't been precise enough, sorry, English is not my first language. We have many thousands integration tests covering all APIs and various services of the application, verifying all kind of scenarios and conditions. There hasn't been any problems of any sort for years on both fast and slow machines. Around the middle of December 2023, we've been signaled about this strange and pseudo-randomic behaviour. Only recently, after many other tests, we discovered a correlation between the occurrence of the problem and slow machines.

I'm not sure if it's meaningfull or not, but we upgraded from 6.0.25 to 6.0.26 in the same period: maybe I'll try to revert to 6.0.25, even if frankly it seems a bit far-fetched.

We use Middleware instead of other techniques because Middleware uses the HttpContext directly. We can use Method Injection to inject the IHttpContextAccessor and fix the IHttpContextAccessor.HttpContext property and fix it if it's been prematurely disposed.

  1. I didn't find other sources about solving the problem using middlewares, but it seems an interesting approach, I'll give it a try, l thanks

There is one key detail I am missing before definitively pointing to the problem. When you say this is a "multi-tenant" solution, are all the tenants in the same database, or are you connecting to a different distinct database instance per tenant?

Sorry, I was a bit misleading with that sentence, my bad. I changed the description to avoid further confusion. What I meant is that every DbContext instance is "contextualized" (using a combination of Global Query Filters and IModelCacheKeyFactory) to the logged-in user performing the HTTP-request into which the DbContext itself is "scoped". In this context the "tenant" is in reality the user itself. Please ignore the "multi-tenant" definition I used, it's not appropriate.

robertmclaws commented 9 months ago

Ok, thanks for the context of the situation. Then I was correct in my assessment, you are creating complexity that is not actually necessary in the application, regardless of how many people say that this is how you do it.

In ASP.NET Core, the following three things are SCOPED by default, meaning there is a single instance per request, and the same instance is injected throughout the request lifecycle:

That means that the ModelCacheKeyFactory is completely unnecessary, and in this case is causing a problem by attempting to access the UserId too early in the request execution.

Those types of situations are for when you are in a TRUE multi-tenant environment, and each tenant may have a different EDM model or connection string. That is not the case here.

One thing you need to consider is that, no matter how many THOUSANDS of tests you have verifying this situation, an incorrect pattern may not manifest its problems until much later. I believe what happened here is that Microsoft fixed a bug with a runtime that is almost out of support and exposed the flaw in your setup.

You can do this whole thing without tying HttpContext objects to the DbContext at all (which should never be done anyways since doing so means they can't be used in non-HTTP contexts, like unit tests).

In addition, there are so many unnecessary memory allocations and commands in the UserId processing that you're slowing everything down on every request.

Your code should look like this:


    // RWM: Put this in your services startup.
    services.AddDbContext<MyDbContext>(
         (serviceProvider, options) =>
         {
               var connString = DbContextUtils.GetApplicationDbContextConnectionString(configuration);
               options.UseSqlServer(connString, x => x.MigrationsAssembly("My.Migrations"));
         });

///<summary>
/// Dramatically simplified context that doesn't need to worry about getting polluted by 
/// a lack of separation of concerns.
///</summary>
public abstract partial class MyDbContext: DbContext
{

    ///<summary>
    /// The ID for the currently logged-in user. Is 0 for unauthenticated users, and null when the instance 
    /// is first created or the claim is not found.
    ///</summary>
    public long? UserId { get; set; }

}

public static class MyClaimsExtensions
{

    ///<summary>
    /// The safe way to get the UserId from the currently executing request.
    ///</summary>
    public static long? GetUserId(this HttpContext httpContext)
    {
        if (httpContext.User?.Identity?.IsAuthenticated != true) return 0;
        long.TryParse(httpContext.User?.Claims?.FirstOrDefault(c => c.Type == ClaimTypes.Sid)?.Value, out var userId);
        // RWM: If it's still null after you call this method then there is a problem somewhere else on the stack.
        return userId is not null ? (long)userId : null;
    }

}

[Route("api/My")]
[ApiController]
[Authorize]
public class MyController : ControllerBase
{  

    ///<summary>
    /// The <see cref="MyDbContext" /> instance for this request.
    ///</summary>
    public MyDbContext DbContext { get; private set; }

    ///<summary>
    /// The <see cref="MyDbContext" /> instance for this request.
    ///</summary>
    public MyController(MyDbContext dbContext)
    {
        DbContext = dbContext;
    }

    [HttpGet]
    [Route("MyTestAction")]    
    public async Task<ActionResult> MyTestAction()
    {
        // RWM: Follow a clean separation of concerns by bridging the User Claim to the DbContext before using it.
        dbContext.UserId = HttpContext.GetUserId();
        /// TODO: RWM: Do some other stuff.

    }   
}

This has the benefits of:

I hope this structure helps.

If you REALLY can't implement it this way for whatever reason (and you really should try to do it the way I suggested), then still get rid of the ModelCache and just call my GetUserId() extension method in the MyDbContext contructor that injects the HttpContextAccessor. But PLEASE only use that as a last resort... and it STILL may not work.

NinjaCross commented 9 months ago

@robertmclaws

That means that the ModelCacheKeyFactory is completely unnecessary, and in this case is causing a problem by attempting to access the UserId too early in the request execution.

AFAIK, that's not how EF works :) https://learn.microsoft.com/en-us/ef/core/modeling/dynamic-model In my understanding, IModelCacheKeyFactory is a singleton service, and the internal EF entity model cache is singleton too. So without a discriminator, multiple "scopes" would access the same "global" entity model (the one created in OnModelCreating, injected with the Global Query Filters I need), shared among all of them. The IModelCacheKeyFactory allows to correctly select the entity model which has been contextualized for a specific user. Without that, the DbContext would use a entity model contextualized for another user.

Your code should look like this:

That's exactly how I solved temporarily the problem a few days ago. I personally like the separation of concerns, but I also don't like that explicit assignment of UserId. The services in a "scope" should be contextualized and ready for usage without explicit intervention by the developer, otherwise human errors could occur in the process, introducing fragilities and vulnerabilities.

Anyway, I just realized that in the attempt to simplify my explanation, I trivialized a bit too much the context, omitted important aspects, and I'm somehow misleading you. I'm closing the issue at the moment, I'll open a new one with a more precise explanation of the context.

robertmclaws commented 9 months ago

What I explained is exactly how EF works, and I proved it with a link to the source code.

I understand that you think EFCore needs to work this way, but your entire set of assumptions is based off of needing Dynamic Modeling. If your team misunderstood how that works, and it turned out you DON'T need that, then every downstream assumption is also incorrect.

The IModelCacheKeyFactory allows to correctly select the entity model which has been contextualized for a specific user. Without that, the DbContext would use a entity model contextualized for another user.

That last statement is not correct. As I previously explained, Dependency Injection handles getting an instance of the DbContext per request, so the DbContext CANNOT be "contextualized" to another user by default. If you are seeing that behavior, it's because you've made a mistake somewhere else in the architecture.

If you are not dynamically changing the EDM data model itself (by adjusting types at RUNTIME) and are just using the DbContext as created by the EF migrations, then what you are doing with the IModelCacheKeyFactory is completely and totally unnecessary.

Again, the implementation of the IModelCacheKeyFactory and UserId together seem to be the problem. It's happening because authentication connects to external systems to process authentication cookies or JWTs, which takes time.

On slower machines it may not have completed by the time the DbContext is prematurely instantiated to set the UserId, because your implementation creates the DbContext instance much earlier in the pipeline than necessary.

The fix I described solves the problem because by the point in the process where the UserId is assigned to the DbContext, the authentication is guaranteed to be fully processed. Getting cute with the design because you "don't like" setting the UserId this way will only lead to downstream problems.

Alternatively, you could just use my ClaimsPrincipal implementation I mentioned in my first post to abstract away the UserId from the DbContext and then your interceptors can just use ClaimsPrincipal.CurrentPrincipal.GetUserId() instead, which is how all of the systems I build function. I didn't suggest it before because it would require too many code changes.

At any rate, this is not the correct place to post the issue because this is a problem with your chosen architecture and not OData.