Open aradalvand opened 1 year ago
Since the two distinct subqueries are present as-is in your original LINQ query (l.Course.Ratings.Average(r => r.Stars)
), you're effectively asking EF to deduplicate them, i.e. to optimize your LINQ query.
Have you tried first selecting out to an intermediate anonymous object which performs the Average, and then to select out again for the DTOs? That would deduplicate the Average operation at the LINQ level.
Have you tried first selecting out to an intermediate anonymous object which performs the Average, and then to select out again for the DTOs? That would deduplicate the Average operation at the LINQ level.
@roji Presumably that's possible but it would make the type-checking of the Lesson
awkward and difficult, which is in a way the whole point of this query.
It's also an attempt to work around something that I believe should be solved at a lower level.
EF already de-duplicates simpler expressions like l.Title
(as I mentioned earlier), so why not extend that to all expressions? Makes sense I think.
Presumably that's possible but it would make the type-checking of the Lesson awkward and difficult, which is in a way the whole point of this query.
How would it make it more difficult? Lesson is the abstract base class here, and Course is defined on it:
db.Lessons
.Select(l => new {
Lession = l,
CourseRating = l.Course.Ratings.Average(r => r.Stars)
})
.Select(l =>
l.Lession is ArticleLesson
? new ArticleLessonDto
{
Title = l.Title,
CourseRating = l.CourseRating
Text = ((ArticleLesson)l).Text,
}
: l is VideoLesson
? new VideoLessonDto
{
Title = l.Title,
CourseRating = l.CourseRating
Url = ((VideoLesson)l).Url,
}
: null)
In fact, regardless of EF, I'd have advised rewriting your original LINQ query above to avoid repeating the CourseRating expression twice (duplication is always a bad idea). Note that I haven't tested whether this solves the problem on the SQL side, but I suspect it does.
More generally, when LINQ queries are written in a less condensed/efficient form, EF can't always optimize out things and de-duplicate everything for you; that requires both more complexity in the query pipeilne code and possibly affecting query compilation performance as well. We sometimes do this when it's easy/cheap, but we don't generally aim to do so in all cases; this may be what's happening with l.TItle vs. the more complex query you're trying to execute.
How would it make it more difficult? Lesson is the abstract base class here, and Course is defined on it:
@roji I'm writing a GraphQL-to-LINQ transformer and something like that would make the implementation far more complex, I also have to account for navigation properties and whatnot, which I've found don't always play nicely with intermediate projections like that. I know that's a me problem but I'm just saying that it would've been more reasonable if the standard query just yielded the optimal SQL in the first place.
In fact, regardless of EF, I'd have advised rewriting your original LINQ query above to avoid repeating the CourseRating expression twice (duplication is always a bad idea). Note that I haven't tested whether this solves the problem on the SQL side, but I suspect it does.
That would be true if I were writing those queries by hand but they're being auto-generated. Also, someone could be using a tool like EntityFrameworkCore.Projectables or LINQKit where you're not actually repeating expressions in your source code, but it gets repeated once it's all fed to EF.
More generally, when LINQ queries are written in a less condensed/efficient form, EF can't always optimize out things and de-duplicate everything for you; that requires both more complexity in the query pipeilne code and possibly affecting query compilation performance as well. We sometimes do this when it's easy/cheap, but we don't generally aim to do so in all cases; this may be what's happening with l.TItle vs. the more complex query you're trying to execute.
Understandable. But I would argue EF should actually aim to do so generally. I'm not sure the query compilation performance would be affected to any significant degree by something like this, also given that queries are cached, and usually compiled only once, I would say the benefits of deduplication outweigh its costs, especially as things like the libraries I mentioned above and similar solutions for expression/query "composability", if you will, become more widely adopted.
I'm writing a GraphQL-to-LINQ transformer and something like that would make the implementation far more complex [...]
I understand and sympathize, but at the end of the day you're asking a lower layer (EF) to take a sub-optimal input which you give it, and make it optimal. I'm not sure exactly how you're generating the LINQ tree, but I'm not sure EF is in a better position to optimize your tree after it's produced, compared to you. In general, it's always better to produce something optimized at the source, rather than producing something less optimized and expecting later components to somehow piece it together and make it better again.
Also, someone could be using a tool like EntityFrameworkCore.Projectables or LINQKit where you're not actually repeating expressions in your source code, but it gets repeated once it's all fed to EF
If those tools produce sub-optimal trees, I'd argue that they need to be improved (just like whatever you're doing to transform GraphQL to LINQ), rather than trying to make EF compensate after the fact.
This is a bit similar to writing bad C# code, and expecting the compiler/JIT to always emit optimized binary code. Although in some simple cases the compiler can see through bad code and yield optimal results, it's ultimately unfeasible/non-scalable to expect it to always do that.
Given this model:
If you were to run a query like this, where you're checking the type of the lesson, and then projecting into the corresponding DTO type
This would yield the following SQL:
The problem, as you might be able to immediately tell, is that the sub-query for the
CourseRating
is repeated twice.It may be worth noting that if you selected the
Title
property on theCourse
instead, this wouldn't happen:There's going to be only a single occurrence of
c."Title"
in the query, as expected:EF Core version: 7.0.0 Database provider: Npgsql Target framework: .NET 7.0 Operating system: Ubuntu 22.04 IDE: VS Code
I initially created an issue in the Npgsql repo but was told that this is related to EF Core itself.