dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.66k stars 3.16k forks source link

Tracking Down SQL Cache Plan Issue #28387

Closed Mike-E-angelo closed 2 years ago

Mike-E-angelo commented 2 years ago

Ask a question

I am currently tracking down an issue that I am observing via EFCore logging. I am unclear at this point if it is related to a control vendor (Syncfusion) or if it's an EFCore issue, or a combination of the two via integration by myself. I am opening up a question here for better understanding to get me oriented/situated accordingly to take the next steps.

If I understand correctly, each different SQL query emitted is a different compilation, execution, and storage on the SQL side. So this query...

SELECT [b].[Id], [b].[Author], [b].[Available], [b].[Name], [b].[Price], [b].[Quantity]
FROM [Book] AS [b]
ORDER BY [b].[Id]
OFFSET @__p_0 ROWS FETCH NEXT @__p_0 ROWS ONLY

... is different from the one below, and is executed/stored/cached as such (note the difference between NEXT @__p_0 ROWS and NEXT @__p_1 ROWS):

SELECT [b].[Id], [b].[Author], [b].[Available], [b].[Name], [b].[Price], [b].[Quantity]
FROM [Book] AS [b]
ORDER BY [b].[Id]
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY

Essentially, as I understand it, the goal is to ensure generated queries are as exact as possible. Any difference results in a new compilation and executed plan, which takes up resources on the SQL Server side, which is to be avoided.

To better explain my understanding, the hash of the query is what is used as the key of the query to pull for subsequent calls to the same query. Any difference will result in a cache miss and the subsequent generation of a new query.

The problem I am encountering here is that for the same "query" attached to a grid control, I am seeing three different SQL statements when I page through the control. As such, I am trying to track down exactly where this is happening and how to consolidate everything into one query so that it is only generated once on the SQL Server side.

These are the queries that I am seeing being emitted via EFCore logging:

SELECT TOP(@__p_0) [b].[Id], [b].[Author], [b].[Available], [b].[Name], [b].[Price], [b].[Quantity]
FROM [Book] AS [b]
ORDER BY [b].[Id]
SELECT [b].[Id], [b].[Author], [b].[Available], [b].[Name], [b].[Price], [b].[Quantity]
FROM [Book] AS [b]
ORDER BY [b].[Id]
OFFSET @__p_0 ROWS FETCH NEXT @__p_0 ROWS ONLY
SELECT [b].[Id], [b].[Author], [b].[Available], [b].[Name], [b].[Price], [b].[Quantity]
FROM [Book] AS [b]
ORDER BY [b].[Id]
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY

Include your code

Use this solution: https://github.com/Mike-E-angelo/Stash/tree/master/Syncfusion.LibraryManagement

  1. Use the /index page to seed the database with 5 entries (the grid in the next step is PageSize=2 so that is 3 pages).
  2. Use the /query page to page through the pages created in the previous step and generate the SQL statements in the debug/console window

Note that for the /query page, the items will not display, but the paging elements will. I could not figure out why the items would not display but since the paging elements are what is necessary/important here, I considered this good enough for reproduction purposes.

After the initial load on /query, paging to page 1, and paging to page 2, you should see the 3 different queries emitted out in the Output (debug) console window as seen above.

Please let me know if you encounter any issues with the above and I will do my best to further assist you.

Again the goal here (as I understand it) is to consolidate everything into one query e.g. the below:

SELECT [b].[Id], [b].[Author], [b].[Available], [b].[Name], [b].[Price], [b].[Quantity]
FROM [Book] AS [b]
ORDER BY [b].[Id]
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY

Additionally, if I have something fundamentally misunderstood here, that would be valuable to know as well.

Finally, for additional context, the discussion can be found here on Syncfusion's side: https://www.syncfusion.com/forums/175921/sample-efcore-sfgrid-net6-0-sln

Thank you for any assistance you can provide and for all your great work out there. 🙏

Include stack traces

NA

Include verbose output

NA

Include provider and version information

EF Core version: 6.0.5 Database provider: (e.g. Microsoft.EntityFrameworkCore.SqlServer) Microsoft.EntityFrameworkCore.SqlServer Target framework: (e.g. .NET 5.0) net6.0 Operating system: Windows 10 IDE: (e.g. Visual Studio 2019 16.3) Visual Studio 17.3.0 Preview 2

roji commented 2 years ago

Your understanding is correct - query plans are generally cached based on SQLs, so any change in the SQL would in principle mean a different query plan.

Now, EF's input is a LINQ expression tree, which eventually determines which SQL gets generated. In other words, if you're seeing different SQLs, that means that someone (SyncFusion in your example) is handing EF different queries. To know exactly why that is, you're going to have to look into SyncFusion itself, assuming it's the one generating your LINQ query tree.

However, stepping back a bit... Having multiple query plans in itself is not necessarily a problem, and trying to reduce the number of different SQLs by all means necessary is probably a case of unnecessary premature optimization. In other words, the above three SQLs seem perfectly reasonable to me, and reducing them from 3 to 1 is unlikely to have much impact on your overall performance. In theory, if your application generates a huge number of different SQLs, then reducing a bit here and there may help, but I definitely would not look into this without clear proof on the SQL Server side that you have too many SQLs.

Query plan explosion is indeed a real problem in some other cases. For example, when users dynamically build their own expression trees, they sometimes make the mistake of introducing constants into the SQL:

FROM [Book] AS [b]
ORDER BY [b].[Id]
OFFSET 3 ROWS FETCH NEXT 4 ROWS ONLY

This means that a new SQL (and therefore plan) is generated for each value set, which is indeed a big problem; this is quite different from your case, where 3 SQLs cover all the possible value sets (because of proper parameterization).

To summarize, the situation you're describing above seems very normal to me for most applications, and I wouldn't advise trying to reduce the 3 SQLs into 1. If you think you're an exceptional case, I'd first prove that by inspecting SQL Server plan statistics before actually doing anything, to avoid optimizing something which does not need to be optimized.

Mike-E-angelo commented 2 years ago

Thank you very much for the quick, informative, and valuable reply @roji. 🙏 To be certain, this is three generated queries per-grid-control. Doing a quick search of my solution, I have 25 of these controls in my application, so a total of 75 of these generated queries are possible once the application is finally stood up and utilized.

I am fully OK with relegating this to premature optimization and attend to this when absolutely necessary, but wanted to point the above fact to be 100% certain. I am a total newb when it comes to SQL performance -- especially in Azure where I basically followed the provided wizard to create my DB and eyeballed the requirements. If the 75 seems like a nominal count then this sounds alright with me. Otherwise, please let me know and we can consider next steps, if necessary.

As it sounds like this question has been answered, I will be closing it for now. Thank you again for your time and knowledge.

Mike-E-angelo commented 2 years ago

Looks like the limit is 160K+ entries? If so, 75 is a drop in the bucket. https://www.sqlskills.com/blogs/erin/sql-server-plan-cache-limits/

By default, the plan cache is limited to 160,036 total entries

roji commented 2 years ago

I wouldn't necessarily assume the 160K+ number applies to Azure SQL (note that this would also be dependent on e.g. memory resources on your instance etc.). But I'm pretty sure 75 is completely reasonable and that you shouldn't worry about it.

ErikEJ commented 2 years ago

I believe the max number is fixed for any SQL Server edition, including Azure.