ErikEJ / erikej.github.io

ErikEJ blog
3 stars 1 forks source link

Avoiding SQL Server plan cache pollution with EF Core 3 and Enumerable.Contains() #2

Open ErikEJ opened 4 years ago

roji commented 4 years ago

Any reason to generate so many ORs rather than to just put the parameter placeholders in the IN parentheses?

ErikEJ commented 4 years ago

Just because I would not know how to do that! Do you think it makes any difference performance wise?

joelweiss commented 4 years ago

Do you think it makes any difference performance wise?

this says no

ksks88 commented 4 years ago

Hey @ErikEJ,

Do you think it would be safe to remove the condition for checking argument length for Postgresql (Npgsql)?

Good job BTW!

ErikEJ commented 4 years ago

Yes, looks like some source say short.MaxValue is the limit, but not sure about the ADO.NET provider - @roji ?

roji commented 4 years ago

Sorry, am missing some context... What exactly are we talking about here? :)

ksks88 commented 4 years ago

https://erikej.github.io/efcore/sqlserver/2020/03/30/ef-core-cache-pollution.html

roji commented 4 years ago

Oh, OK. Yeah, PostgreSQL supports up to ushort.MaxValue (65535) parameters for each statement - but a batched command can have more than 65535 parameters (as long as no single statement does). What current argument length check do you have?

ErikEJ commented 4 years ago

Current max for SQL Server is 2048, so it can be much higher for postgress

roji commented 4 years ago

Yep, absolutely.

ransagy commented 2 years ago

Sorry for raising the dead here a bit, But im trying to put it into context and this seemed a better location than the EF Core issue for it (if you think otherwise, let me know and ill mirror there):

You chose to update only the "bucketed" solution from divega's proposals - Is that solely because its more globally supported by the various DB vendors? Strictly in SQL Server (and Azure SQL), do you think the string_split or TVP solutions are preferable in the context of plan cache pollution?

I'm specifically more worried about using FromSql/FromSqlInterpolated if i can avoid it entirely.

ErikEJ commented 2 years ago

I am actually writing a blog post about using string_split just now.

This solution is provider agnostic, string_split SQL Server only.

And when I wrote the code, TVF mapping was not available in EF Core, it is now, so no FromSql needed!

ransagy commented 2 years ago

Then i'll be looking forward for that article and will take another look at TVFs (which i thought not relevant in this context for some reason). Thanks!

ErikEJ commented 2 years ago

I just blogged: "Avoiding SQL Server plan cache pollution due to unparameterized Contains queries generated by EF Core" https://erikej.github.io/efcore/sqlserver/2021/11/22/efcore-sqlserver-cache-pollution.html

joelmandell commented 2 years ago

I had to use the IQueryableExtensions.In(), to support older SQL Server versions. And had an issue when using a library that parses the table name from generated query (I am using multiple Interceptors). The issue is when IEnumerable<TKey> values is empty.

It has the check if(!values.Any()) and returns queryable.Take(0). When that call is used EF Core generates an subquery, and the library that I use (EFCoreSecondLevelCacheInterceptor) fails trying to fetch table name from query.

I resorted to return queryable.Where(x => true) instead. And that seems to work after that. Posting this in case someone else has this problem.

Updated code of that extension method, according to that change:

public static IQueryable<TQuery> In<TKey, TQuery>(
            this IQueryable<TQuery> queryable,
            IEnumerable<TKey> values,
            Expression<Func<TQuery, TKey>> keySelector)
        {
            if (values == null)
            {
                throw new ArgumentNullException(nameof(values));
            }

            if (keySelector == null)
            {
                throw new ArgumentNullException(nameof(keySelector));
            }

            if (!values.Any())
            {
                //.Where instead of .Take(0), cause that seem to produce "funky" SQL.
                return queryable.Where(x => true);
            }

            var distinctValues = Bucketize(values);

            if (distinctValues.Length > 2048)
            {
                throw new ArgumentException("Too many parameters for SQL Server, reduce the number of parameters", nameof(keySelector));
            }

            var predicates = distinctValues
                .Select(v =>
                {
                    // Create an expression that captures the variable so EF can turn this into a parameterized SQL query
                    Expression<Func<TKey>> valueAsExpression = () => v;
                    return Expression.Equal(keySelector.Body, valueAsExpression.Body);
                })
                .ToList();

            while (predicates.Count > 1)
            {
                predicates = PairWise(predicates).Select(p => Expression.OrElse(p.Item1, p.Item2)).ToList();
            }

            var body = predicates.Single();

            var clause = Expression.Lambda<Func<TQuery, bool>>(body, keySelector.Parameters);

            return queryable.Where(clause);
        }
Henery309 commented 2 years ago

I really like the solution proposed but can't seem to get the query below working. I also tried using String_Split solution but that defaults to nvarchar(4000) converting to varchar kills the db :) as db I am working on is using varchars

var childItemFilter = List{"childItem1","childItem2"}; itemQuery = itemQuery.Where(m => m.ChildItems.Any(s => childItemFilter.Contains(s.childItemReference)));

ErikEJ commented 2 years ago

@Henery309 thanks for the kind words.

I do not see any attempt to use the In extension method in the code Shared. Please share a full repro and I will have a look.

Also, what does not working mean?

Henery309 commented 2 years ago

Thanks for helping Erik appreciate it a lot.

I have the following query which retrieves all projects that are assigned to staff members Because I am using a contains in the linq query sql is generating 1 sql plan for every staff member.

var dbcontet = new ProjectDbContext(); var query = dbcontext.Projects.Where(m=> m.StaffMembers.Any(s=> staffReferences.Contains(s.StaffReference)));

I tried using the extension method IN: query = query.In(staffReferences, s => s.StaffMembers.SelectMany(s => s.StaffReference));

But this fails with the following error message: System.InvalidOperationException: 'The LINQ expression 's => s.StaffReference' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'. See https://go.microsoft.com/fwlink/?linkid=2101038 for more information.'

ErikEJ commented 2 years ago

@Henery309 I need a complete runnable repo in order to help, not code fragments

Henery309 commented 2 years ago

@ErikEJ have prepared a running example: https://github.com/Henery309/EfCoreTest The runnable code uses sqllite as the database and outputs the generated sql to the console.

Using the following code: var projs = db.Projects.In(staffIds, m => m.StaffMembers.Single().Id).ToList();

The output I get is:

SELECT [p].[Id], [p].[Name] FROM [dbo].[Project] AS [p] WHERE (( SELECT TOP(1) [s].[Id] FROM [dbo].[StaffMember] AS [s] WHERE [p].[Id] = [s].[ProjectId]) = @__v_0) OR (( SELECT TOP(1) [s0].[Id] FROM [dbo].[StaffMember] AS [s0] WHERE [p].[Id] = [s0].[ProjectId]) = @__v_1)

multiple selects are added to where clause

I am trying to figure out how to get:

SELECT [p].[Id], [p].[Name] FROM [dbo].[Project] AS [p] WHERE EXISTS ( SELECT 1 FROM [dbo].[StaffMember] AS [s] WHERE ([p].[Id] = [s].[ProjectId]) AND ([s].[Id] = @param1 OR [s].Id == @param2))

ErikEJ commented 2 years ago

@Henery309 Hmmm... This is made to solve a SQL Server specific issue with hardcoded paramters, and I see no hardcoded values in the SQL your are sharing. If you use SQLite, this may not be needed at all.

I think SQL Server does not have issus with the multiple SELECTs. I am not sure which implementation if the In method you are using, but there are some suggesions for collapsing ORs in the comments in the gist.