Closed NinjaCross closed 9 months ago
We've seen this happen in Restier (a downstream library to OData) and determined that it happens because OData disposes of the HttpContext
at random times, especially for batch endpoints. In the thread you mentioned, we tracked down the core problem to the HttpContextFactory
disposing of the HttpContext
too early in the overall cycle. At some point OData disposes of the Factory, which makes it unavailable to later processing.
We solved this by taking the original suggested solution on StackOverflow and making it Middleware instead.
We also optionally have a ClaimsPrincipal
middleware that does the same thing, but then sets the ClaimsPrincipalSelector
to IHttpContextAccessor.HttpContext.User
. We make this available because some business logic needs to be able to operate outside of the context of HTTP, so taking HTTP dependencies for those libraries is bad design.
You can check it out here: https://github.com/OData/RESTier/blob/main/src/Microsoft.Restier.AspNetCore/Middleware/RestierClaimsPrincipalMiddleware.cs
We use Middleware instead of other techniques because Middleware uses the HttpContext
directly. We can use Method Injection to inject the IHttpContextAccessor
and fix the IHttpContextAccessor.HttpContext
property and fix it if it's been prematurely disposed.
On its face, it would seem that one of these two techniques might mitigate your problem.
In looking at how you have structured everything, I would say that the entire implementation is flawed, and the fact that it ever worked but only on slow machines is the actual symptom of the REAL and very different problem.
There is one key detail I am missing before definitively pointing to the problem. When you say this is a "multi-tenant" solution, are all the tenants in the same database, or are you connecting to a different distinct database instance per tenant?
@robertmclaws thanks for your contribution.
We've seen this happen in Restier (a downstream library to OData) and determined that it happens because OData disposes of the HttpContext at random times, especially for batch endpoints. In the thread you mentioned, we tracked down the core problem to the HttpContextFactory disposing of the HttpContext too early in the overall cycle. At some point OData disposes of the Factory, which makes it unavailable to later processing.
As I said I'm working with normal API Controllers, not with OData controllers.
I only mentioned https://github.com/OData/WebApi/issues/2294 because there were similarities.
I'm not sure your observations are contextually applicable, but nonetheless I appreciate them, I'll give them a try.
In my specific scenario, it doesn't seems to be a problem in eager or unexpected disposing of the HttpContext (not as far as I can see debugging the code), quite the contrary: it seems it's made available by the ASP.NET pipeline before it's completely initalized, but I'll investigate further.
In looking at how you have structured everything, I would say that the entire implementation is flawed,
AFAIK that's exactly how it's always been done everywhere. There are many, many, many sources on the Internet (both old and recent) suggesting this approach (injecting IHttpContextFactory wherever I need it in a HTTP-scoped execution flow). If this implementation is flawed, then thousands of other applications are flawed too around the world... and I find difficult to think that so many applications are flawed and nobody (or just a few) noticed it before :)
Anyway, I'm not discarding any option, I'll take into consideration your observations.
and the fact that it ever worked but only on slow machines is the actual symptom of the REAL and very different problem.
That's not what I meant; maybe I haven't been precise enough, sorry, English is not my first language. We have many thousands integration tests covering all APIs and various services of the application, verifying all kind of scenarios and conditions. There hasn't been any problems of any sort for years on both fast and slow machines. Around the middle of December 2023, we've been signaled about this strange and pseudo-randomic behaviour. Only recently, after many other tests, we discovered a correlation between the occurrence of the problem and slow machines.
I'm not sure if it's meaningfull or not, but we upgraded from 6.0.25 to 6.0.26 in the same period: maybe I'll try to revert to 6.0.25, even if frankly it seems a bit far-fetched.
We use Middleware instead of other techniques because Middleware uses the HttpContext directly. We can use Method Injection to inject the IHttpContextAccessor and fix the IHttpContextAccessor.HttpContext property and fix it if it's been prematurely disposed.
There is one key detail I am missing before definitively pointing to the problem. When you say this is a "multi-tenant" solution, are all the tenants in the same database, or are you connecting to a different distinct database instance per tenant?
Sorry, I was a bit misleading with that sentence, my bad.
I changed the description to avoid further confusion.
What I meant is that every DbContext instance is "contextualized" (using a combination of Global Query Filters and IModelCacheKeyFactory
) to the logged-in user performing the HTTP-request into which the DbContext itself is "scoped".
In this context the "tenant" is in reality the user itself. Please ignore the "multi-tenant" definition I used, it's not appropriate.
Ok, thanks for the context of the situation. Then I was correct in my assessment, you are creating complexity that is not actually necessary in the application, regardless of how many people say that this is how you do it.
In ASP.NET Core, the following three things are SCOPED by default, meaning there is a single instance per request, and the same instance is injected throughout the request lifecycle:
.AddDbContext<YourDbContextType>()
see hereIHttpContextAccessor.HttpContext
That means that the ModelCacheKeyFactory
is completely unnecessary, and in this case is causing a problem by attempting to access the UserId too early in the request execution.
Those types of situations are for when you are in a TRUE multi-tenant environment, and each tenant may have a different EDM model or connection string. That is not the case here.
One thing you need to consider is that, no matter how many THOUSANDS of tests you have verifying this situation, an incorrect pattern may not manifest its problems until much later. I believe what happened here is that Microsoft fixed a bug with a runtime that is almost out of support and exposed the flaw in your setup.
You can do this whole thing without tying HttpContext
objects to the DbContext
at all (which should never be done anyways since doing so means they can't be used in non-HTTP contexts, like unit tests).
In addition, there are so many unnecessary memory allocations and commands in the UserId processing that you're slowing everything down on every request.
Your code should look like this:
// RWM: Put this in your services startup.
services.AddDbContext<MyDbContext>(
(serviceProvider, options) =>
{
var connString = DbContextUtils.GetApplicationDbContextConnectionString(configuration);
options.UseSqlServer(connString, x => x.MigrationsAssembly("My.Migrations"));
});
///<summary>
/// Dramatically simplified context that doesn't need to worry about getting polluted by
/// a lack of separation of concerns.
///</summary>
public abstract partial class MyDbContext: DbContext
{
///<summary>
/// The ID for the currently logged-in user. Is 0 for unauthenticated users, and null when the instance
/// is first created or the claim is not found.
///</summary>
public long? UserId { get; set; }
}
public static class MyClaimsExtensions
{
///<summary>
/// The safe way to get the UserId from the currently executing request.
///</summary>
public static long? GetUserId(this HttpContext httpContext)
{
if (httpContext.User?.Identity?.IsAuthenticated != true) return 0;
long.TryParse(httpContext.User?.Claims?.FirstOrDefault(c => c.Type == ClaimTypes.Sid)?.Value, out var userId);
// RWM: If it's still null after you call this method then there is a problem somewhere else on the stack.
return userId is not null ? (long)userId : null;
}
}
[Route("api/My")]
[ApiController]
[Authorize]
public class MyController : ControllerBase
{
///<summary>
/// The <see cref="MyDbContext" /> instance for this request.
///</summary>
public MyDbContext DbContext { get; private set; }
///<summary>
/// The <see cref="MyDbContext" /> instance for this request.
///</summary>
public MyController(MyDbContext dbContext)
{
DbContext = dbContext;
}
[HttpGet]
[Route("MyTestAction")]
public async Task<ActionResult> MyTestAction()
{
// RWM: Follow a clean separation of concerns by bridging the User Claim to the DbContext before using it.
dbContext.UserId = HttpContext.GetUserId();
/// TODO: RWM: Do some other stuff.
}
}
This has the benefits of:
I hope this structure helps.
If you REALLY can't implement it this way for whatever reason (and you really should try to do it the way I suggested), then still get rid of the ModelCache and just call my GetUserId()
extension method in the MyDbContext
contructor that injects the HttpContextAccessor
. But PLEASE only use that as a last resort... and it STILL may not work.
@robertmclaws
That means that the ModelCacheKeyFactory is completely unnecessary, and in this case is causing a problem by attempting to access the UserId too early in the request execution.
AFAIK, that's not how EF works :)
https://learn.microsoft.com/en-us/ef/core/modeling/dynamic-model
In my understanding, IModelCacheKeyFactory
is a singleton service, and the internal EF entity model cache is singleton too.
So without a discriminator, multiple "scopes" would access the same "global" entity model (the one created in OnModelCreating, injected with the Global Query Filters I need), shared among all of them.
The IModelCacheKeyFactory
allows to correctly select the entity model which has been contextualized for a specific user.
Without that, the DbContext would use a entity model contextualized for another user.
Your code should look like this:
That's exactly how I solved temporarily the problem a few days ago. I personally like the separation of concerns, but I also don't like that explicit assignment of UserId. The services in a "scope" should be contextualized and ready for usage without explicit intervention by the developer, otherwise human errors could occur in the process, introducing fragilities and vulnerabilities.
Anyway, I just realized that in the attempt to simplify my explanation, I trivialized a bit too much the context, omitted important aspects, and I'm somehow misleading you. I'm closing the issue at the moment, I'll open a new one with a more precise explanation of the context.
What I explained is exactly how EF works, and I proved it with a link to the source code.
I understand that you think EFCore needs to work this way, but your entire set of assumptions is based off of needing Dynamic Modeling. If your team misunderstood how that works, and it turned out you DON'T need that, then every downstream assumption is also incorrect.
The IModelCacheKeyFactory allows to correctly select the entity model which has been contextualized for a specific user. Without that, the DbContext would use a entity model contextualized for another user.
That last statement is not correct. As I previously explained, Dependency Injection handles getting an instance of the DbContext
per request, so the DbContext
CANNOT be "contextualized" to another user by default. If you are seeing that behavior, it's because you've made a mistake somewhere else in the architecture.
If you are not dynamically changing the EDM data model itself (by adjusting types at RUNTIME) and are just using the DbContext
as created by the EF migrations, then what you are doing with the IModelCacheKeyFactory
is completely and totally unnecessary.
Again, the implementation of the IModelCacheKeyFactory
and UserId together seem to be the problem. It's happening because authentication connects to external systems to process authentication cookies or JWTs, which takes time.
On slower machines it may not have completed by the time the DbContext
is prematurely instantiated to set the UserId, because your implementation creates the DbContext
instance much earlier in the pipeline than necessary.
The fix I described solves the problem because by the point in the process where the UserId is assigned to the DbContext
, the authentication is guaranteed to be fully processed. Getting cute with the design because you "don't like" setting the UserId this way will only lead to downstream problems.
Alternatively, you could just use my ClaimsPrincipal
implementation I mentioned in my first post to abstract away the UserId from the DbContext
and then your interceptors can just use ClaimsPrincipal.CurrentPrincipal.GetUserId()
instead, which is how all of the systems I build function. I didn't suggest it before because it would require too many code changes.
At any rate, this is not the correct place to post the issue because this is a problem with your chosen architecture and not OData.
The problem I'm facing seems to be very similar to https://github.com/OData/WebApi/issues/2294, but happens in generic API (non-OData) controllers that uses HttpContext-aware
DbContext
instances.The problem is very odd, because:
.... with exactly same DB, same source code, same .NET version, same SDK.
The only factor that changes among the test environments I'm using, is the hardware. Some machines are radically slower than others, and it seems that the problems happen much more frequently on slower ones. This raises the suspicion that there is some kind of race-condition scenario happening under the hood in the initialization of
HttpContext.User
AFAIK, I'm following all the known best practices related to the usage of
DbContext
in those scenarios that needs a "contextualization" of the queries for the current ClaimsPrincipal. TheDbContext
needs to know the "current" user, so that the Global Filters (injected into it) can use the user-id to contextualize the executed DB queries. In order to do so, I'm:IHttpContextAccessor
IModelCacheKeyFactory
[FromServices]
Code-wise, my scenario is (very similar to) this:
DbContext
, with aIHttpContextAccessor
injected into the constructor. The following sample shows a direct injection for simplicity, but the same problem arises even when "wrapping" theIHttpContextAccessor
into another "current user resolver" service (scoped, transient or singleton... doesn't matter)OneModelPerUserModelCacheKeyFactory
, an implementor of `IModelCacheKeyFactory'I suspect the problem is here. It seems that the
IModelCacheKeyFactory
is invoked in a moment whenHttpContext.User
is not "ready" yet.The EF DbContext is injected using
[FromServices]
into the action method of an API ControllerAssemblies affected
It's unclear to me were the problem could be, but here below are the most meaningfull versions of the components I'm using
Reproduce steps
Since the problems happens at random, It's not easy to reproduce. On some machines it always happens. On other machines it never happens. The only pattern I can see, is that it happens more frequently in slower machines, and almost never happen in faster machines.
Expected result
HttpContext.User
should be ready for usage from any injected service after the authentication process has been executed.Actual result
HttpContext.User
seems to be "not always ready" when it's injected in other services.