dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.77k stars 3.19k forks source link

Query: Provide a way to indicate that part of a query should translate to server or throw #24509

Open AndreyYurashevich opened 3 years ago

AndreyYurashevich commented 3 years ago

I enjoy using EF Core 5.0 with Microsoft.EntityFrameworkCore.SqlServer on .NET 5.0, but it feels like Client evaluation in the top-level projection does not really fit into my workflow and I would like to be able to disable (either globally or on a per-query basis) falling back to client evaluation completely. By disabling I really mean any way to track it: throwing exceptions, issuing warnings, etc. I couldn't find any references to this possibility so any information would help.

nehio commented 3 years ago

Short answer no, just juse an AsEnumerable before your projection since you are done querying the DB anyway and the rest of the code is on the client side.

ajcvickers commented 3 years ago

@AndreyYurashevich Can you explain a bit more how having client evaluation in the final projection is causing you problems?

AndreyYurashevich commented 3 years ago

@ajcvickers let's consider an entity:

class User
{
    public int UserId { get; set; }
    public string FirstName { get; set; }
    public string MiddleName { get; set; }
    public string LastName { get; set; }

    public string Bio { get; set; } // Huge blob of text
}

In reality, Bio can be a number of columns, it doesn't really matter. Suppose I want a light-weight projection that doesn't touch Bio property:

dbContext.Users
    .Select(u => new UserName
    {
        Id = u.Id,
        Name = u.FirstName + " " + u.MiddleName + " " + u.LastName
    }
    .ToList())

As a result, I get a pretty much light query that potentially may benefit from the DB index that includes *Name properties but not Bio. Then I introduce new projection:

dbContext.Users
    .Select(u => new UserSummary
    {
        Id = u.Id,
        Name = u.FirstName + " " + u.MiddleName + " " + u.LastName,
        HasBio = u.Bio != null
    }
    .ToList())

So far so good, but some team member notices a way to refactor by extracting a method:

dbContext.Users
    .Select(u => new UserName
    {
        Id = u.Id,
        Name = UserNameFormatter.FormatName(u)
    }
    .ToList())

dbContext.Users
    .Select(u => new UserSummary
    {
        Id = u.Id,
        Name = UserNameFormatter.FormatName(u),
        HasBio = u.Bio != null
    }
    .ToList())

No visible behavior changes and no test can spot the difference, but after the refactoring, both queries always load and materialize the whole User object which hurts performance a lot.

To make things worse the query doesn't look that simple in my real-world scenario as I'm projecting with AutoMapper. So it's easy to miss the change during code review.

Ideally, there is some switch that I can turn on, and instead of falling back to any client evaluation, I get an exception when this happens.

ajcvickers commented 3 years ago

We discussed this in triage, and can see that it may have value. Backlogging for now to gather additional feedback.

lucas-foreflight commented 2 years ago

Second this. We perform a lot of heavy queries and we have a big development team. Sometimes its difficult to tell without logging the queries in debug mode, whether the query will execute serverside or clientside. We would like the opportunity to disable clientside evaluation by throwing, which would let us catch performance issues during development and testing without having to sift through debug logs.

irunjic commented 2 years ago

Shouldn't this be the default behavior, a not just an opt-in? EF team already made the decision to align client/server evaluation logic with user expectation by requiring ToList() and similar for everything except Select. Why make that distinction?

Personally, I've seen a lot of cases where team mates accidentally added client evaluation to a server side query without even realizing it. There are numerous examples of people being confused in this repo as well, for example, when they don't realize they are mixing client and server evaluated Queryables with Concat/Union https://github.com/dotnet/efcore/issues/16243

@ajcvickers are there any downsides to this proposal that were discussed in the triage? Is there any specific feedback EF team is looking for, other than experiencing this issue?

I think most people would be satisfied with an opt-in, although opt-out seems like a better default to me. This breaking change can be managed in the same manner as the one you did for client queries in Net Core 3.

janseris commented 1 year ago

I suggest something like DbContextOptions.AllowClientSideProjections() which would enable this behavior to run without a runtime exception or without some kind of warning displayed in the IDE (so the default would be "client-side projections not permitted")

roji commented 1 year ago

@janseris changing the default to not allow client evaluation would be a massive breaking change for a huge number of existing applications. There's very little chance we'd do that.

DrMueller commented 1 year ago

@roji It's a massive pain in .NET6, as a developer has no indication if something is evaluated locally except by manually checking the generated SQL. Unfortunately, there is also no Roslyn Code Analyzer to help, so we're stuck in being veeeeeery careful in PR reviews. Just out of interest, what was the reasoning behind not just making it configurable:

As you've changed this behavior several times now, wouldn't it have been easier to make this configurable instead of having implicit logic around, which nobody can catch automatically now?

roji commented 1 year ago

@DrMueller as far as I can remember, there was only one change here in the history of EF Core: the switch from allowing client evaluation everywhere before EF Core 3.0 to allowing it only in the top-most projection (and in parts of the query which don't reference the database at all). In the last 4.5 years since EF Core 3.0 there haven't been any changes around this, and we don't plan to make any.

So I'm curious - can you provide more context on why it's important for you to be able to e.g. disable client evaluation altogether? Client evaluation in the top-level projection isn't problematic, in that it doesn't cause more data to be transferred from the server to the client.

Specifically, we definitely don't intend to allow client-side evaluation everywhere again, like it was before EF Core 3.0. That has various drawbacks and risks to applications, not to mention the significant complexity of maintaining that mode in parallel to the normal one.

janseris commented 1 year ago

@DrMueller as far as I can remember, there was only one change here in the history of EF Core: the switch from allowing client evaluation everywhere before EF Core 3.0 to allowing it only in the top-most projection (and in parts of the query which don't reference the database at all). In the last 4.5 years since EF Core 3.0 there haven't been any changes around this, and we don't plan to make any.

So I'm curious - can you provide more context on why it's important for you to be able to e.g. disable client evaluation altogether? Client evaluation in the top-level projection isn't problematic, in that it doesn't cause more data to be transferred from the server to the client.

Specifically, we definitely don't intend to allow client-side evaluation everywhere again, like it was before EF Core 3.0. That has various drawbacks and risks to applications, not to mention the significant complexity of maintaining that mode in parallel to the normal one.

The problem with client-side projection in other than top-level projection is that:

  1. it silently happens when you refactor the projection into a function (undocumented) (this is what @DrMueller talks about - having to check generated SQL)

Why it's important to be able to disable client-side evaluation:

roji commented 1 year ago

@janseris I'm not disagreeing that the option to fully disabling client evaluation could be useful (we've accepted this into the backlog etc.), but I'm still trying to make sure I understand what the negative impact of this is from your perspective. In other words, right - refactoring projections into a separate functions can accidentally cause client evaluation if that's done improperly, but what exactly negative consequence does that have for you?

roji commented 1 year ago

For a bit more context - I do fully understand that the above causes the entire entity to be loaded, since EF cannot translate your function; this can have a negative perf impact as you're fetching more database columns than you absolutely need to. I'm trying to understand if there's any other problem here in addition to that - this may help prioritze this higher.

irunjic commented 1 year ago

@roji A minor change that causes EF to switch to client side evaluation is hard to notice in PRs and can cause many issues. Just on top of my head (all of these things are not obvious from code):

Having client eval disabled by default (or by option) would align more with user expectation, and would prevent code with unexpected behavior from being merged.

roji commented 1 year ago

Entities start being tracked even if you have written a custom select with non entities

Can you provide a minimal code sample for this? I'm not aware of this.

Composing client evaluated and non client evaluated queries can throw errors later in code

Unless I'm missing something, disabling client evaluation would simply cause a different error to be thrown (since client evaluation is not possible). If that's the case, is that useful?

Having client eval disabled by default (or by option) would align more with user expectation, and would prevent code with unexpected behavior from being merged.

To set expectations, there's very little chance we'd disable client eval by default, because of the massive breakage this would cause for anyone relying on it. Yes, we did do a big break disabling most client eval in EF 3.0, but that's because universal client eval caused severe issues (e.g. inadvertently fetching an entire table because a Where clause got client-evaluated). I don't think there's anything even close to to that here in terms of importance.

DrMueller commented 1 year ago

@roji Regarding your statement:

So I'm curious - can you provide more context on why it's important for you to be able to e.g. disable client evaluation altogether? Client evaluation in the top-level projection isn't problematic, in that it doesn't cause more data to be transferred from the server to the client.

Probably I misunderstand the sentence, but isn't that exactly the case? If the content projection can't be translated to SQL, the logic fetches ALL fields from a table and silently applies the content projection locally. This isn't a problem for single entities, but in the current projects, there are a lot of huge tables. That's quite a bit performance hit, especially for stuff like lists, which, instead of fetching 2-3 columns, now fetch all columns, which goes over the wire to the memory.

Why is this kinda problematic? Because a developer should NEVER intend to use this mechanic anyway; if he wants to use something in memory, he could use content projection with an anonymous type and then explicitly work with the in-memory list further. Therefore, all instances of the client-side evaluation are purely (technically invisible) developer mistakes.

irunjic commented 1 year ago

Can you provide a minimal code sample for this? I'm not aware of this.

I've encountered this in code already, but you can try in a simple example:

var item = dbContext.Items.Select(i => new ItemDto().BuildDto(i)).FirstOrDefault();
var trackedEntities = dbContext.ChangeTracker.Entries().ToList(); 

trackedEntities will not be empty, which is unexpected.

janseris commented 1 year ago

To set expectations, there's very little chance we'd disable client eval by default, because of the massive breakage this would cause for anyone relying on it.

This is right and because of this, I am changing my "feature request" or a proposed solution to something like:

DbContextOptions.ThrowOnClientSideProjections()

which would cause a runtime exception or some kind of warning displayed in the IDE (maybe analyzer) about "accidental client-side projections not permitted"

roji commented 1 year ago

@DrMueller

Probably I misunderstand the sentence, but isn't that exactly the case? If the content projection can't be translated to SQL, the logic fetches ALL fields from a table and silently applies the content projection locally.

There's some truth to that; but fetching all columns of a row is generally far, far less of a problem compared to fetching all rows of a table, which is what universal client evaluation caused in pre-3.0 EF. It's very reasonable in many scenarios to pass through a function and fetch all columns, whereas it was almost never reasonable to trigger client evaluation in e.g. a Where clause in pre-3.0 EF.

Because a developer should NEVER intend to use this mechanic anyway

That's a very strong, sweeping statement that doesn't correspond to what we're seeing with many users; the convenience of just using a Select typically outweighs the overhead of fetching a few extra columns.

@irunjic

Entities start being tracked even if you have written a custom select with non entities

Can you please open a separate issue with this with a minimal, runnable code sample? I'm not sure we're supposed to be doing that. /cc @ajcvickers

AndreyYurashevich commented 1 year ago

Apart from performance other problem may occur. Compare:

context.Blobs.Select(b => new BlogDto { Name = b.Name, OwnerEmail = b.Owner.Email }).ToList();

vs

context.Blobs.Select(b => ToDto(b)).ToList();
BlogDto ToDto(Blog blog) =>  new BlogDto { Name = b.Name, OwnerEmail = b.Owner.Email };

In addition from loading whole Blog entity nothing will let EF Core know to load Owner so we get NRE in runtime or even worse we can get incorrect results if related data is optional.

janseris commented 1 year ago

Apart from performance other problem may occur. Compare:

context.Blobs.Select(b => new BlogDto { Name = b.Name, OwnerEmail = b.Owner.Email }).ToList();

vs

context.Blobs.Select(b => ToDto(b)).ToList();
BlogDto ToDto(Blog blog) =>  new BlogDto { Name = b.Name, OwnerEmail = b.Owner.Email };

In addition from loading whole Blog entity nothing will let EF Core know to load Owner so we get NRE in runtime or even worse we can get incorrect results if related data is optional.

Without dbContext.Blobs.Include(b => b.Owner), accessing .Owner will throw NullReferenceException in both cases, won't it?

stevendarby commented 1 year ago

@roji @irunjic

Tracking entities in this scenario is explicitly documented as a feature here: https://learn.microsoft.com/en-us/ef/core/querying/tracking#tracking-and-custom-projections

"EF Core supports doing client evaluation in the top-level projection. If EF Core materializes an entity instance for client evaluation, it will be tracked."

irunjic commented 1 year ago

@stevendarby @roji you can see in this example that an entity is being tracked even if the entity itself is not projected. https://dotnetfiddle.net/IXGhrg

I don't know if this is the correct behavior according to EF docs, but I am certain that the user doesn't wish to have it tracked in this scenario. @roji is that correct, can I open an issue for this?

roji commented 1 year ago

Tracking entities in this scenario is explicitly documented as a feature here: learn.microsoft.com/en-us/ef/core/querying/tracking#tracking-and-custom-projections

Thanks @stevendarby - you learn something new every day!

It's not clear to me that this behavior is desirable, but it's also not clear to me that it isn't... As it's the current behavior and documented, we'd have to have a pretty good reason to change it and break people...

irunjic commented 1 year ago

I don't wish to change it either. But this is another point for having an option to disable client eval. Without it users can accidentally fetch more data and track more entities than they expect.

roji commented 1 year ago

Sure, though I'm not sure that's a very strong argument - whether users typically expect such an instance to be tracked or not isn't clear (I admit I didn't expect that, but that doesn't mean much). In other words, whether your function happens to be client-evaluated or not seems orthogonal to whether the entity should end up being tracked in this scenario.

irunjic commented 1 year ago

whether users typically expect such an instance to be tracked or not isn't clear

I disagree. Users might expect entities to be tracked if the object is part of the projected output in any field (example from the doc). Users definitely don't expect tracking when EF decides to switch to client eval and execute Select on the client and start tracking the entity even if it's not part of the select return value in any field (example from my fiddle link). That issue doesn't seem orthogonal to me.

roji commented 1 year ago

That's fine, but it doesn't seem at all clear to me. The entity type does end up getting projected out by EF - I can certainly imagine a user complaining about why it isn't tracked; the fact that it happens to pass via client evaluation may not be relevant. It's one of those things where both ways can be legitimate.

irunjic commented 1 year ago

I think I need to clarify:

context.Blogs.Select(b => mapToDto(b)).FirstOrDefault();

If mapToDto is server evaluated then nothing is tracked. If mapToDto is client evaluated then EF is tracking unnecessarily even if the result of Select is a DTO.

Just changing the fact that mapToDto is server or client translatable changes the way EF does tracking and user does not expect this. How is that not relevant to client evaluation?

roji commented 1 year ago

What exactly is mapToDto, can you show some code?

irunjic commented 1 year ago

Similar to the example I posted above https://dotnetfiddle.net/yHhWP9.

roji commented 1 year ago

I'm not seeing a server-evaluated mapToDto there. The server-evaluated projection projects to BlogId, which obviously means there's no tracked entity:

context.Blogs.Select(b => b.BlogId).FirstOrDefault();

Whereas your client-evaled MapToDto receives the entire entity, which is very different:

context.Blogs.Select(b => Extensions.MapToDto(b)).FirstOrDefault();
janseris commented 1 year ago

I'm not seeing a server-evaluated mapToDto there. The server-evaluated projection projects to BlogId, which obviously means there's no tracked entity:

context.Blogs.Select(b => b.BlogId).FirstOrDefault();

Whereas your client-evaled MapToDto receives the entire entity, which is very different:

context.Blogs.Select(b => Extensions.MapToDto(b)).FirstOrDefault();

The point probably is that MapToDto is supposed to generate SQL which selects only required columns (which is reffered to as "server-side evaluation", probably named to be an opposite to "client-side evaluation") and thus the entity shouldn't be tracked.

Posting code for the case that the issue isn't closed later for missing info if the link dies:

public static BlogDto MapToDto(Blog blog)
{
    return new BlogDto() { BlogId = blog.BlogId };
}
irunjic commented 1 year ago

I think @roji you are missing the main point. It's possible to create a server evaluated MapToDto. For example using https://github.com/ClaveConsulting/Expressionify. I just didn't bother for this code example, because the main point is that we need a way to throw an error on client evaluation, because, in practice, when people accidently introduces client evaluation they are also fetching and tracking more data than they expect.

newclaus commented 1 year ago

God, stop mocking the developers. You are trying to prove to the whole world that your bug is a feature. It's a bug. If I want to do client-side logic, I'll call AsEnumerable() or ToArray() or any other asynchronous materialization method explicitly.

var blogs = context.Blogs
    .AsEnumerable()
    .Select(b => Extensions.MapToDto(b))
    .ToArray();

And it will work exactly as I expect, as I programmed it. Perhaps this is not the best example, perhaps the bug is not reproduced here, but the point is clear. But when I compose an IQueryable<T>, I extend it with a new Expression each time, and expect my entire query to be translated to SQL and executed in SQL Server. Instead, a part of the query is being executed in the database and another part of the query in memory in the most unexpected way. Just like magic.

If you want to leave this feature - it's your wish. But, please, give us the opportunity to turn it off after all.

stevendarby commented 1 year ago

This may be taken as a given but from a design point of view I would just like to suggest that this works similarly to tracking and query splitting options, in that you can configure a global behaviour and also use extension methods to opt both in and out at the query level.

Disabling client-side evaluation globally could be a useful tool to find potentially problematic queries; you may then want to re-enable it for individual queries where it doesn't actually matter or there isn't an immediate solution.

roji commented 1 year ago

Everyone, I think the points here are clear and that further discussion isn't necessary. I think there's agreement that disabling client evaluation can be useful, as an opt in.

You are trying to prove to the whole world that your bug is a feature. It's a bug.

For some context here (we had a discussion on this in the team)... LINQ to SQL had some forms of client evaluation in the top projection. The old non-core EF did not, and the team got lots of complaints from users that this causes needless pain, and requested client evaluation in the top projection would be allowed, just like in LINQ to SQL. This was a main reason why it's allowed EF Core.

Once again, we agree that disabling client evaluation entirely can make sense for some people/situations - but keep in mind that for many others, client evaluation is useful behavior that makes writing queries easier, and with very little actual, non-theoretical drawbacks.

Beyond that, implementing this is a matter for priorization, and this issue currently has only 20 votes, so it will probably take us a while to get around to it.

janseris commented 1 year ago

Everyone, I think the points here are clear and that further discussion isn't necessary. I think there's agreement that disabling client evaluation can be useful, as an opt in.

Beyond that, implementing this is a matter for priorization, and this issue currently has only 20 votes, so it will probably take us a while to get around to it.

Thanks.

for many others, client evaluation is useful behavior that makes writing queries easier

Could you give an example? I can't see how EF silently selecting unwanted (all) columns and then throwing them away (both done in background) is useful for any developer.

irunjic commented 1 year ago

for many others, client evaluation is useful behavior that makes writing queries easier

I also don't agree it's useful behavior. In my opinion, people who are aware that EF is doing client side evaluation will want to be explicit and add ToList/AsEnumerable. On the other hand, people who don't realize this are the same ones that should be warned by EF.

roji commented 1 year ago

@irunjic it's very simple; in most scenarios, tables don't have many columns, and those columns aren't huge. In that typical scenario, fetching a few extra column makes very little difference in terms of perf. On the other hand, the fact that users don't have to constantly think about putting AsEnumerable before invoking a client function is helpful, especially for new users - we got ample signal from our userbase about that.

In any case, it's really not worth debating further, since we're already agreed that an opt-out is a good idea, and that we're not going to change the default behavior because of the breaking change.

stevendarby commented 1 year ago

Sorry, I understand that you'd like to move on, and I'd like to turn this around to a design discussion in a sec, but I wanted to give an example of why I agree that client evaluation really is useful behaviour at times. Disabling it completely could actually lead to the very scenario some want to avoid. This isn't arguing against the feature requested in this issue, which I would like to see too, I just think it's good to understand this from another angle.

A lot of discussion seems to conflate client-eval with loading the full entity. This isn't always the case.

// Currently we can do this. Although client evaluation is involved for the Select, EF Core is smart enough
// to only select the 3 columns needed for the client evaluation. The full entities are not selected.
var withClientEval = context.Blogs
    .Where(x => x.Id < 5)
    .Select(x => new BlogDto { MysteriousValue = context.DoMysteriousThing(x.Url, x.Title, x.Author) })
    .ToList();

// If client evaluation was completely disabled, a developer might stick in an AsEnumerable like so, but this
// _would_ select the full entities - perhaps including a very large column. It would start tracking the entities etc.
var naiveWithoutClientEval = context.Blogs
    .Where(x => x.Id < 5)
    .AsEnumerable()
    .Select(x => new BlogDto { MysteriousValue = context.DoMysteriousThing(x.Url, x.Title, x.Author) })
    .ToList();

// To generate the same efficient query but without client-eval, a developer would need to select an intermediary
// type with only the columns they need first. This is clunky and would get worse if more columns are needed
var betterWithoutClientEval = context.Blogs
    .Where(x => x.Id < 5)
    .Select(x => new
    {
        x.Url,
        x.Title,
        x.Author
    })
    .AsEnumerable()
    .Select(x => new BlogDto { MysteriousValue = context.DoMysteriousThing(x.Url, x.Title, x.Author) })
    .ToList();

Moving on! 😀

Does this suggestion sound reasonable? https://github.com/dotnet/efcore/issues/24509#issuecomment-1518552069 No promises but I may be interested in looking into this and so any comments on design would be appreciated.

irunjic commented 1 year ago

@stevendarby @roji how about instead of disabling client evaluation, we add an option to disable client evaluation with entity capture (or whatever it's called when you pass the whole entity and EF has to fetch and track everything). That's the scenario I think everyone is having issues with in this thread. That would satisfy both camps?

stevendarby commented 1 year ago

@irunjic @roji if targeting only entity capture (the term works for me!) then perhaps the simpler design is to raise a warning by default if this can be detected. This can then be configured to be ignored or to throw instead using the existing ConfigureWarnings pattern.

Although it will raise a new warning for existing code, this isn't breaking, and steering people to use more targeted "column capture" (if not completely eliminating client-eval) seems like a good thing to do for everyone anyway.

irunjic commented 1 year ago

Yeah something like that works also. Having a warning (by default) for client evaluation with entity capture is good idea to resolve this whole problem. It's non breaking, it informs everybody of probable misuse, and it doesn't affect people that use client eval where EF can correctly optimize.

roji commented 1 year ago

@stevendarby thanks for making the point about entity capture vs. other forms of client evaluation - I agree that's important (and we briefly brought this up in some internal discussions).

I'll bring this up with the team - a warning seems reasonable to me. Implementing this, though, may not be trivial.

janseris commented 1 year ago

For anyone who is looking for a workaround for this bug (the part with refactoring mapping code into a separate method), you can use DelegateDecompiler NuGet package and register it as EF Core query interceptor in startup. It will inline the method body back into the query (simplified). https://stackoverflow.com/a/62138200

What DelegateDecompiler fails to do (as far as I have tried):

DrMueller commented 1 year ago

Because a developer should NEVER intend to use this mechanic anyway

That's a very strong, sweeping statement that doesn't correspond to what we're seeing with many users; the convenience of just using a Select typically outweighs the overhead of fetching a few extra columns.

I meant in the context I'm working, a developer should never use the mechanic; not speaking on behalf of all developers of the world 😃 . When giving out guidelines, my main goal is to make the code as explicit as possible, making the code cleaner and better understandable for other developers and for analyzers. This feature is one of the implicit ones, which doesn't fit this bill of being explicitly clear, about what a developer tries to accomplish. Therefore, for us, it's just a hassle, but I think the topic was discussed enough, thank you for the feedbacks.

AndreyYurashevich commented 1 year ago

In addition from loading whole Blog entity nothing will let EF Core know to load Owner so we get NRE in runtime or even worse we can get incorrect results if related data is optional.

Without dbContext.Blobs.Include(b => b.Owner), accessing .Owner will throw NullReferenceException in both cases, won't it?

No, when projecting with servers-side evaluation explicit Include is not needed, but it is needed when switching to the client evaluation. That was actually the main my concern initially as reduced performance is not that bad as code that starts throwing cryptic NREs.

roji commented 1 year ago

@stevendarby and others, we discussed this and we think it's reasonable to have a warning when the top-most client evaluation causes entity capture.

janseris commented 1 year ago

Hi, will this make it into EF Core 8? If not, then maybe some estimated arrival of this bugfix? Is there a consideration for a compiler warning or IDE underline for when the opt-in "disable accidental client-side projections" call is not used?