Closed 0xdeafcafe closed 4 years ago
Historically this was not supported by EF, However this would be a nice feature to have. We had to implement something similar ourselves in the our code base using Linq tree rewriting. It hope the EF team considers this.
Yeah. I noticed discussion of this in the .Include()
design meeting docs. It looked as if it was road-mapped and going to be implemented. Although that could all just be a dreadful fantasy and I'm completely wrong.
Moving to back log as a new feature (we do want to support filtered include but it is a lower priority than some other features that were available in EF6).
Opened new issue for better exception message https://github.com/aspnet/EntityFramework/issues/1883
Don't underestimate the value of filtered include. It was a top request in ef 6. I came across this requirement on my first ef 7 project. I'd take that over lazy loading any day.
Any non-trivial development using the Entity Framework will encounter this limitation.
It also has consequences on other technologies likes the OData protocol, because there is really no point to adding filtered $expands to the protocol if EF does not support them.
For those who are not (yet) familiar with the internals of EF Core, could you point us to the parts of the Entity Framework that would be impacted in order support this ?
I too would love this feature.
@rowanmiller so what is the current way of using left join with a where clause \ on clause? I currently query for the entire relationship and filter on the client. Is there a better way of doing it?
var story = await _context.Stories
.Include(x => x.Paragraphs)
.Include(x => x.User)
.Include(x => x.Tips)
.Include(x => x.Images)
.Include(x => x.Comments)
.ThenInclude(x => x.User)
.SingleOrDefaultAsync(x => x.Id == storyId);
if (story == null)
return null;
story.Comments = story.Comments.Where(x => !x.IsDeleted).ToList();
story.Images = story.Images.Where(x => !x.IsDeleted).ToList();
story.Paragraphs = story.Paragraphs.Where(x => !x.IsDeleted).ToList();
story.Tips = story.Tips.Where(x => !x.IsDeleted).ToList();
return story;
It's a mess... and bad performance-wise.
Will Filtered Loading be supported at configuration level, in order to have one to many navigation property filtered upstream? Can this feature be someway useful to avoid bad configurations with circular reference like this?
@lucacestola I'm not sure why you think it's related to conditional querying of a navigation property.
@gdoron because it could be also applyed to one to many relationships and would be implicit.
Actually, with EF 6, I was not yet able to find a good solution for the relation type like the one at the posted link, without using a circular FK and, at the meantime, have the possibility to use an expression like this: parent.MainChild.ChildProperty
.
I know that this kind of relations depends on a very custom logic so there is no simple way to address such a need, and I was hoping that, configuring the way relationships are loaded could almost partially address the issue.
I just have the exactly same need as @gdoron (and not surprising, with IsDeleted
fields as well)
is there any way to left join w/ where at db level?
I'm also in the @gdoron scenario, but I return a collection instead of a single record so I'm doing a foreach on the collection and something like
story.Comments = story.Comments.Where(x => !x.IsDeleted).ToList();
on each record.
I've tried this http://stackoverflow.com/questions/4720573/linq-to-entities-how-to-filter-on-child-entities/4724297#4724297 but it doesn't seem to work on EF Core. Is there a better way to do this?
You can query stories without including comments, then query separately comments and Explicit loading them: https://docs.efproject.net/en/latest/querying/related-data.html
Looks interesting, but I need to filter the first query to only return a record if there are any results in the specific navigation property, so I need to have the include or change the result of the first query when the explicit loading doesn't return anything which I think it's worse.
Yikes, don't know why this is sitting idle in a backlog. Should be considered for any upcoming version as per what the comments from @mikes-gh and @rendmath imply. Easy to overlook this one.
@atrauzzi well, this feature was and still is idle for years in EF, so I'm afraid ... 😢 Not sure why it's not prioritized.
Yes, this feature request has been around a while. I found it posted on UserVoice since 2010. It's crucial for allowing me to properly load my complex data model, so I hope it's soon given higher priority.
@rowanmiller @divega Can you please share with us when if ever this is going to be implemented? We designed our data structure in a way that is best practice regarding DB design, but heavily rely on this feature.
Consider this simplified scheme:
public class Post
{
public int Id { get; set; }
public string Text { get; set; }
public List<PostImage> Images { get; set; }
}
public class PostImage
{
public int Id { get; set; }
public bool IsDeleted { get; set; }
public Post Post { get; set; }
public int PostId { get; set; }
public string Url { get; set; }
public string CdnUrl { get; set; }
public ImageType Type { get; set; }
}
public enum ImageType
{
Normal = 0,
Primary = 1,
Header = 2,
Profile = 3
}
Now lets say I want to query 10 posts for my blog homepage with their single Primary image. Currently, there is no way of doing it. I will have to query for 10 posts with ALL of their images (even the deleted ones!) and only on the client filter out the useless data.
As our application is getting more sophisticated and gets more traffic, this is becoming a real and significant pain and we need a solution for this.
Is it going to have the same luck as the feature request on EF which was and still is idle for 6 years?
We really need an answer, there are solutions like denormalize our data model but that's rarely a good idea.
Thanks!
@gdoron we do want to do this, but it's not at the top of the list. Our Roadmap will give you an idea of the things that are going to be worked on next. You will see that this is listed under "High Priority", but just not the "Critical O/RM" list.
@rowanmiller I'm sure everyone has different points of view and needs but here are my two cents: Most of the things that are missing have somewhat reasonable workarounds.
e.g.
Lazy load - simply Include
upfront all your data.
View mapping- Manually create a migration.
SP Mapping - Use some technique as with View.
etc.
But Filtered Include has 0 reasonable workarounds. The only workaround is writing raw SQL but in many cases you need it for almost all of your queries, so that's not an option or else why use an ORM to begin with.
So reiterating what @mikes-gh wrote months ago:
Don't underestimate the value of filtered include. It was a top request in ef 6. I came across this requirement on my first ef 7 project. I'd take that over lazy loading any day.
I had already 3 projects on EF Core, and it was a requirement and a pain in ALL of them.
Just to be clear, the items on the roadmap are the very top of the 100s of feature requests we have sitting on the backlog. So it's inclusion on the list means that it is one of our top priorities. The split between "Critical" and "High Priority" is always subjective. The current split is based on our imperfect aggregation of the feedback from all our customers.
It's not as clean as true filtered loading support, but you can do something like this to do filtered loading. It will run two queries, but that is what EF Core would do under the covers anyway, when you load a collection.
var blogs = db.Blogs
.Where(b => b.Rating> 3)
.ToList();
var blogIds = blogs.Select(b => b.BlogId).ToList();
db.Posts.Where(p => blogsIds.Contains(p.BlogId))
.Where(p => p.Rating > 3)
.Load();
@rowanmiller
It will run two queries, but that is what EF Core would do under the covers anyway, when you load a collection.
I thought it's a temporary limitation that you're working on fixing. Are you telling me it is by design? Because it's not just two queries, it's two round trips to the DB.
Anyway, it might be acceptable for one collection loading, but when you have 10 included entities (we already have 6-7 in some places,), that means 11 round trips or querying the DB with 10 connections concurrently.
I might got you wrong, but if I didn't... Houston we have a problem.
cc @anpete and @divega who can provide some more context. But in general, the EF Core approach out performs the EF6.x approach pretty consistently.
I find it very hard to believe EF Core is or will be faster because of this approach. You might have fixed some internal slow parts that caused slow execution?
I believe several roundtrips will never be as fast as a single well written query. That just doesn't make any sense to me. I can ask my ex-CTO which is a SQL-Server MVP for his input if you like @rowanmiller.
@gdoron We never execute n+1 queries for Include. For your collection example, we would execute two queries, one for the parents and one for all of the children of those parents.
EF6 would execute one single left outer join query, which results in the dreaded cartesian product result set - The parent data is duplicated for each child row in the results.
@anpete that's a relief... 👍 😄 Thanks for clarifying!
This is probably obvious, but that's also what the workaround code I listed does. It's two queries, not n+1.
In the example you provided,
var blogs = db.Blogs
.Where(b => b.Rating> 3)
.ToList();
var blogIds = blogs.Select(b => b.BlogId).ToList();
db.Posts.Where(p => blogsIds.Contains(p.BlogId))
.Where(p => p.Rating > 3)
.Load();
how should I also include Blog.Images
? Do I need a 3rd query?
db.Images.Where(i => blogsIds.Contains(i.BlogId))
.Where(i => !i.IsDeleted)
.Load();
For your collection example, we would execute two queries, one for the parents and one for all of the children of those parents.
I don't see 1 query for parents and 1 for ALL childrens, instead I see one query PER children.
Am i missing something?
Not to mention about other type of filtering like Take
or Skip
... that makes the whole deal a lot uglier and complicated with GroupBy
afaik. https://github.com/aspnet/EntityFramework/issues/7033
I really think the most on point comment is:
Don't underestimate the value of filtered include.
@rowanmiller @anpete actually I'm not sure we understood each other correctly. I wasn't talking about N+1 where N = number of rows, but where N=number of included properties.
So in case I have a query like:
context.Post
.Include(x=> x.Images)
.Include(x=> x.Paragraphs)
.Include(x=> x.As)
.Include(x=> x.Bs)
.Include(x=> x.Cs)
.Include(x=> x.Ds)
.Where(x => x.Id == 1)
.ToList()
We'll have 1 query for the post, and another 6 queries for the navigation properties, am I right? If I am, that's bad even if it's not 6 round trips.
@anpete @rowanmiller Just had a phone call with a SQL-Server MVP, a summary of what he said:
EOF.
@gdoron i don't think it's that black or white, and pretty sure @rowanmiller (and the ef team) understands very much the internals of SQL and ORMs. I wouldn't even dare to question that.
I think the item being in high priority should be enough, they know it's important and they will add it. But for some it's critical and for others is high priority. This thread focused on make it a more priority item, not disusing the design whatsoever.
On my scenarios, even I would love to have filtered includes and I would use it on every project. It's not critically blocking me. High priority sounds reasonable, but I don't mind if it make to critical either. And actually, I'm more interested on the api consumption than on query performance.
pretty sure @rowanmiller (and the ef team) understands very much the internals of SQL and ORMs. I wouldn't even dare to question that.
I never did... and I apologize to the team if I was understood otherwise! It never hurts to hear another expert though.
@Bartmax
how should I also include Blog.Images ? Do I need a 3rd query?
Yes, you would need a third query. I'm definitely not saying that it's as good as native filtered loading, it does get pretty messy, it's just one possible way to tackle the problem while we do not have native support.
I don't see 1 query for parents and 1 for ALL childrens, instead I see one query PER children.
The second query starts by filtering to posts that belong to the blogs returned in the first query (.Where(p => blogsIds.Contains(p.BlogId))
). It then applies additional filters over that set of posts - in a single query.
@gdoron There are trade-offs here for sure. I would encourage you to do some measuring of your specific scenarios using both the EF6 and EF Core approaches. We would definitely be open to further improvements in this area if you uncover something compelling.
BTW, another aspect of this that I would love to see feedback on is what level of granularity is more interesting to prioritize:
IsDeleted
property. I don't think we have an issue for that, but the closest is #3932@divega I really like your implicit filtering idea! I use something similar with InsertDate:
private static void ConfigDefaultInsertDate(ModelBuilder builder)
{
builder.Model
.GetEntityTypes()
.SelectMany(e => e.GetProperties())
.Where(x => x.Name == "InsertDate" && x.ClrType == typeof(DateTime))
.ToList().ForEach(x =>
{
x.SqlServer().DefaultValueSql = "GetUtcDate()";
x.ValueGenerated = ValueGenerated.OnAdd;
});
}
And it saves us a lot of time with the mapping, and your idea would save a lot of time with querying and more importantly it can us from bugs where we forgot to filter the soft deleted rows which is the top concern on Stackoverflow about soft delete.
But had I had to choose only one feature, I would have chosen the explicit as it gives us more choices. I would rather having to write more but being able to filter all kind of stuff, like filter by the image Type property I mentioned earlier or by the the first row ordered by [Order]
e.g.:
context.Posts.Where(x=> x.Id == 1)
.Include(x => x.Images.Where(i =>!i.IsDeleted && i.Type == ImageType.Primary)
.Include(x => x.Paragraphs.Where(p =>!p.IsDeleted).OrderBy(p => p.Order).First())
.ToList()
Again, I really really like the implicit idea, but it doesn't let me do this.
Edit: Worth mentioning, Join in SQL based on first row isn't a trivial task, so I'm not sure it was a good example... filter by type is a better one.
I know there are a lot opposing opinions on this topic, so I want to add mine.
To understand where I'm coming from, a little bit of background: I've been doing SQL Server and EF training and consultancy for around 10 years (among others) and I've been involved in performance critical projects (like processing millions of rows a minute, tens of thousand of transactions a second, thousands of users concurrency, and lots more), also I've been using EF since EF 1 beta and all subsequent versions ever since, in major enterprise and commercial consumer applications. I even have a EF6 course "The pitfalls of EF and extracting the most performance out of it" (The course started during EF 4.1 time frame) that was very popular.
In EF6 an lower one of the things I've seen completely destroy the performance of application (over and over again) is Dreaded Cartesian join effect of Include when including multiple related collections
In a specific case where a list of products (thousands) was loaded and included Product Facets (thousands again) and included Categories and category facets (thousands), and Geographic locations , and location facets, etc (a graph of tens of thousands of items total) the Cartesian join produced by the include generated a result set of over 40 million rows that was processed, and execution time was almost a half an hour for that query only. A simple change of loading all the data with one query per level made the data load in a couple of seconds (under 3s).
In this particular case 20 or so queries with includes only for related entities (N:1) ad NOT related collections (1:N, M:N) were allot faster than a giant one (11K lines query and 40M+ result set) by 3 orders of magnitude..
Over the years, I've rewritten so many EF Linq queries and split them into multiple queries as performance work that I've lost count (and always with considerable effects).
Finally some years ago I've built a Load Graph API on top of EF6 that could be used to do a load a graph with one query per level (basically what EF Core does now), withouth the dreaded Cartesian join created by including collections and it would look something like:
var customers = GraphLoader.WithRoot<Customers>(dbContext)
.Where(c => c.Country == "US")
.LoadRelated(
c => c.Orders.Where(o => o.ShipViaShipper.CompanyName == "United Package"),
orders => orders.Include(o => o.Employee)
)
.LoadRelated(
c => c.CustomerCustomerDemos
).Load();
Because this was graph loading API it exposed it's own Linq methods, disallowing projections.
The best thing is that this API can be implemented very easily in a few hundred lines of code:
public static class GraphLoader
{
public static Graph<T> WithRoot<T>(DbContext context)
where T : class
{
return new Graph<T>(context.Set<T>());
}
public static Graph<T> LoadRelated<T, TU>(this Graph<T> graph, Expression<Func<T, TU>> related)
where T : class
where TU : class
{
graph.additionalQueries.Add(graph.root.Select(related));
return graph;
}
public static Graph<T> Include<T, TU>(this Graph<T> graph, Expression<Func<T, TU>> related)
where T : class
where TU : class
{
graph.root.Include(related);
return graph;
}
public static Graph<T> LoadRelated<T, TU, TV>(this Graph<T> graph, Expression<Func<T, IEnumerable<TU>>> related, Func<Graph<TU>, Graph<TV>> loadRelated)
where T : class
where TU : class
where TV : class
{
var newQuery = graph.root.SelectMany(related);
var newGraph = new Graph<TU>(newQuery);
loadRelated(newGraph);
graph.additionalGraphs.Add(newGraph);
return graph;
}
public static Graph<T> Where<T>(this Graph<T> graph, Expression<Func<T, bool>> filter)
where T : class
{
graph.root = graph.root.Where(filter);
return graph;
}
public class Graph<T> : Graph
where T : class
{
internal IQueryable<T> root;
public Graph(IQueryable<T> source)
{
this.root = source;
}
public virtual List<T> Load()
{
LoadAdditional();
return root.ToList();
}
public override void LoadRoot()
{
root.Load();
}
}
public class Graph
{
internal List<IQueryable> additionalQueries = new List<IQueryable>();
internal List<Graph> additionalGraphs = new List<Graph>();
public virtual void LoadRoot()
{
}
public void LoadAdditional()
{
foreach (var additionalQuery in additionalQueries)
{
additionalQuery.Load();
}
foreach (var graph in additionalGraphs)
{
graph.LoadRoot();
graph.LoadAdditional();
}
}
}
}
This is an incomplete version, but if someone shows interest I could make it a library.
@popcatalin81 great feedback, my concern is the API to do the filtered include. Performance wise, I expect the EF team to come up with what's best. 1 superbig query or many small ones, i really don't care, that's exactly why I use an ORM, to abstract that from code. And IF there's a performance implication, I can dig deeper.
@popcatalin81 GREAT feedback indeed, and thanks for sharing.
Curious to know, when you wrote "slow execution", was it on the DB side or on the network side or on the IIS server side?
Two notes though:
Since you clearly have much more XP and knowledge than me, do you you agree that on less returned rows (more common) queries EF 6 approach was faster?
Thanks again!
@gdoron, the slow execution was also on the DB side due to "table spools" (buffering data in tempdb) and on the network side (gigabytes of data transferred for the giant query)
Note: One scenario where table spools are created is when not using Read Commited Snaposhot and the number of read locks that needs to be taken for the data set size far exceeds the max, which was this case.
@popcatalin81 yeah, that makes sense with huge queries. What about "normal" much smaller queries?
Anyway, I wish EF Core could provide an API where the developer can send a boolean flag that tells EF which strategy to use.
Something like:
public List<T> ToList(bool splitQuery = false)
Currently I have no way of combing the queries into one query unless I manually craft a SQL query and execute it with FromSql
which is unrealistic.
@gdoron
Currently I have no way of combing the queries into one query unless I manually craft a SQL query and execute it with FromSql which is unrealistic.
You can always use do something like this:
from c in context.Customers
join o in context.Orders on c.CustomerID equals o.CustomerID into orders
select new { c, orders };
which produces this:
SELECT [c].[CustomerID], [c].[Address], [c].[City], [c].[CompanyName], [c].[ContactName], [c].[ContactTitle], [c].[Country],
[c].[Fax], [c].[Phone], [c].[PostalCode], [c].[Region], [o].[OrderID], [o].[CustomerID], [o].[EmployeeID], [o].[OrderDate]
FROM [Customers] AS [c]
LEFT JOIN [Orders] AS [o] ON [c].[CustomerID] = [o].[CustomerID]
ORDER BY [c].[CustomerID]
Of course, this requires using a tracking query so that we get fix-up. We can make it work for no-tracking with a small hack:
var q = (from c in context.Customers
join o in context.Orders on c.CustomerID equals o.CustomerID into orders
select new { c, _ = SetOrders(c, orders) }).AsNoTracking();
private static object SetOrders(Customer c, IEnumerable<Order> orders) => c.Orders = orders.ToList();
Just to make it clear, that also for IEnumerables (1:N) so as for N:1 Includes a possibility to filter (extend ON clause on T-SQL level via Linq) is a must.
Chiming in as an ordinary C# dev who has had the exact same issues as @popcatalin81 and even made use of a custom API similar to his GraphLoader
. Great to see that others understand this.
Another dev who is also holding out hope for filtered includes in a future release.
I came across DynamicFilters, but unfortunately it doesn't support EF Core :(
I also found EntityFramework-Plus, but again no support for EF Core :(
Another hope was for command interceptors where I could manipulate the SQL before it went out for execution, but that's not available yet and discussed in issue https://github.com/aspnet/EntityFramework/issues/626.
Although the Roadmap includes the following under CRUD heading:
There's no guidance on whether they'll be included in EF Core 1.2 release. If the EF team has higher priorities is there a possibility they can somehow lend support to one of the projects mentioned above?
@mguinness command interception is an another concept. filtering on includes can be "replaced" if you are willing to give up comfort by using automapper on the original iqueryable
@hidegh I understand command interception is another concept, but it allows for workarounds when you don't have filtered include. I see three 3 needs for filtered includes, but I'm sure there are others:
Using command interception you could implement these by altering the SQL before it gets executed (not pretty but effective). Leaving SELECT untouched would mean no impact to EF processing, rather you would modify the JOIN constraint:
LEFT JOIN Post AS p ON b.BlogId = p.BlogId
would change to
LEFT JOIN Post AS p ON b.BlogId = p.BlogId AND 0 = p.Deleted
Maybe I'm missing something obvious, but I don't see how you can do this when .Include() is used.
OK, so I ran into this today, and also need this feature. Is there a latest on when this might be included in EF?
+1 for Where condition on Include!! Similar to MGuinness, i need to be able to perform security filtering at the user level; I have a RiskRegister object with FK link to ApplicationUser, & want to be able to return all RiskRegisters for logged in user.
We keep this issue to track specifying filters inline when you specify eager loading in a query. However there are many scenarios that can be satisfied with global query filters:
Please learn about global query filters
We are seeing that a large portion of the scenarios customers want this feature for can already be better addressed using global query filters. Please try to understand that feature before you add your vote to this issue.
We are keeping this issue open only to represent the ability to specify filters on Include on a per-query basis, which we understand can be convenient on some cases.
Original issue:
Doing a
.Where()
inside a.Include()
is allowed (by the syntax, it fails in execution). However, if you then follow that up with a.ThenInclude()
, like so:You get the following error:
Is this something you're aware of and going to allow in a future version, or am I just doing something wrong and It's already supported?