dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.75k stars 3.18k forks source link

Dynamic OR predicate on multiple columns #32092

Closed clement911 closed 1 year ago

clement911 commented 1 year ago

In EF core, is it possible to have WHERE with a multi-column OR predicate that is based on a dynamic list?

Let me explain with some simple pseudo code.

Assume I need to fetch a list of orders, based of a list of order ids.

I can do as follows:

public class Order
{
    [Key]
    public long Id {get;set;}

    //other properties...
}

long[] orderIds = //array of ids of the orders that need to be fetched
var orders = await ctx.Orders.Where(o => orderIds.Contains(o.Id)).ToArrayAsync();

That works great.

Now, imagine that the order class has a composite key made of multiple columns.

I would like to write something like this:

public class Order
{
     [Key]
     public long TenantId {get;set;}

    [Key]
     public long Id {get;set;}

    //other properties...
}

(long tenantId, long orderId)[] orderIds = //array of (long, long) tuples of the composite ids of the orders that need to be fetched
var orders = await ctx.Orders.Where(o => orderIds.Contains((o.TenantId, o.Id)).ToArrayAsync(); //this doesn't work

However, this doesn't work. Is there an alternative?

As of now, we generate sql manually and use FromSqlRaw. That's not ideal though.

ilmax commented 1 year ago

I was also running into this the other day, I think it's a missing piece in Linq. It would be nice to have some sort of composable Any so we could so something like the following pseudo code:

var query = something... foreach (var item in list) { query = query.Any(x => x.Id = item.Id && x.TenantId = item.TenantId) }

As of today we're forced to take some workarounds to build such queries like for example manually constructing expression trees.

The code above doesn't work today because Any doesn't return an IQuerybale<T> but a bool, that's why to me it looks like something missing in Linq itself.

Maybe I'm missing something obvious but I'm not aware of how we can express a list of OR conditions in Linq today.

clement911 commented 1 year ago

I've seen people mention this PredicateBuilder but it's not clear if it's supported. I'd prefer to see a native EF solution. It seems like a pretty basic scenario.

roji commented 1 year ago

Duplicate of #11799

roji commented 1 year ago

We've made various progress around Contains that goes in this direction, but full support for tuples (which the above requires) isn't yet supported.

As of now, we generate sql manually and use FromSqlRaw.

@clement911 what SQL are you currently using for the above?

@ilmax I'm not sure what's missing in LINQ here. First, the above - Contains with tuple - is already the right LINQ expression for the operation; the problem is that EF doesn't yet support translating it (because support is lacking for tuples in general).

When we implement support for this, the array of tuples parameter ((long tenantId, long orderId)[]) would likely be encoded as a JSON string and send to the database, to be unpacked into a relational table using e.g. SQL Server OPENJSON. At that point it can be treated like a regular table, using the regular SQL means to translate the operation.

ilmax commented 1 year ago

I'm not sure what's missing in LINQ here

@roji I think I described my wish in the comment above. I'd like the ability to construct multiple conditions and then OR- them together, same thing we can do with the Where but using OR instead of AND

The contains with tuple to me just doesn't feels good. It's difficult to read compared with my pseudo proposal above (but this is my personal opinion) and it's also more limited than what I was proposing above.

roji commented 1 year ago

@ilmax it sounds like you're asking for #15670.

If so, that approach generates different SQLs for arrays of different size (since there are more ORs); this is bad, since it defeats query plan caching at various levels and hurts performance considerably (see https://github.com/dotnet/efcore/issues/15670#issuecomment-1560749092 for more on this).

For that reason, I definitely think that's not the right solution here. Packing the parameter as a JSON parameter and expanding it to a table in the database is the better way to do this.

The contains with tuple to me just doesn't feels good. It's difficult to read compared with my pseudo proposal above (but this is my personal opinion) and it's also more limited than what I was proposing above.

Well, "doesn't feel good" isn't a very strong argument... The contains with tuple works exactly the same way as the contains without a tuple, so I'm not sure why anyone would find it difficult to read. More importantly, your proposal doesn't show what the query would actually look like, in code. If you're asking for us to add some new, unknown LINQ construct for this, I definitely don't see us doing that when a perfectly suitable LINQ construct already exists.

ilmax commented 1 year ago

isn't a very strong argument

I agree, that's why I said it's just my personal opinion, the second point about limitation is more important and it should have been added as first.

Having said that and ignoring the perf side of thing for now, I was saying that to me looks like a Linq limitation because I feel (but I may be wrong) that's not easy with Linq today to compose several query with OR because Any has no deferred execution semantics so we can't compose query with it.

So what I feel like is missing is the ability to compose multiple conditions together with an OR. this is possible today in Linq to object but not always properly translated by various providers (not talking only about EF core here and that's my point about exposing a simpler method in Linq and risking to go O.T. here).

Take for example this trivial linq 2 object code that's just for illustration purposes:

var customers = new List<Customer>();
for (int i = 0; i < 20; i++)
{
    customers.Add(new Customer(i, i % 5, i*2));
}

var customersToFind = new List<(int Id, int TenantId)>
{
    {(8, 4)},
    {(9, 4)},
    {(14, 4)}
};

var found = customers.Where(i => customersToFind.Any(x => i.TenantId == x.TenantId && (x.Id == i.Id || x.Id == i.AlternateId))).ToArray();

public record Customer(int Id, int TenantId, int AlternateId);

If we want to translate the same query to EF or other Linq provider (looking at you, Cosmos SDK) out there, we're mostly out of luck.

EF today (tested with the latest 8.0-RC2 version available, see gist here) can't translate a similar query. It would be nice to be able to do something like the following and have the same semantics of the linq to object query expressed above (name Or is just the first one that came to my mind):

var customerQuery = customers.AsQueryable();
foreach (var custId in customersToFind)
{
    customerQuery = customerQuery.Or(c => c.TenantId == custId.TenantId && (c.Id == custId.Id || c.AlternateId == custId.Id));
}

In order to express such functionality today and not pulling in 3rd parties libraries (unless I'm missing something) we have to take a stroll into manually creating expression trees and that's not trivial and, if memory serves me right, it was discouraged by the EF team itself.

We can argue that's probably not the best way to query perf wise and that you should have a better db design to avoid such queries and everything else but, every now and then, those query came up and today I always end up with manually created expression trees.

I hope I expressed myself in a clearer way now :)

roji commented 1 year ago

So again, I think you're really conflating two things:

  1. The ability to express something with LINQ.
  2. The ability of a specific LINQ provider to translate that LINQ.

I'm seeing nothing in this issue that can't be expressed with LINQ - via Contains and tuples; this works perfectly with LINQ to Objects, and IMHO is also the ideal and compact way to express the operation. Then there's the question of specific LINQ providers (and/or EF providers) actually translating that construct; that's really a separate question. Note that there are various other unrelated LINQ constructs that EF can't translate to SQL.

I've already written above why generating dynamic SQL is a bad idea (bad perf because of how SQL plan caching works). I suggest fully understanding that; it should make clear why your proposal wouldn't be a good way forward.

Finally, if constructing dynamic ORs is really what you're looking for, nobody's stopping you - LINQ does allow you to dynamically construct expression trees as you you like. It may not be as pretty or easy as writing C# code, but there's nothing inherently wrong about it. It's also trivial to create an extension or 3rd-party tool which does this (like PredicateBuilder), so there really isn't any reason for EF to include it (not everything has to be part of the product).

ilmax commented 1 year ago

So again, I think you're really conflating two things:

  1. The ability to express something with LINQ.
  2. The ability of a specific LINQ provider to translate that LINQ.

I'm well aware of how Linq and EF works, that was why I was thinking about exposing it in Linq itself (even though I'm aware that the Linq library has a very high bar to accept any changes so I knew my idea would go nowhere, I just wanted to see if there's a need of such functionality)

I'm seeing nothing in this issue that can't be expressed with LINQ - via Contains and tuples; this works perfectly with LINQ to Objects, and IMHO is also the ideal and compact way to express the operation. Then there's the question of specific LINQ providers (and/or EF providers) actually translating that construct; that's really a separate question. Note that there are various other unrelated LINQ constructs that EF can't translate to SQL.

I'm not sure I agree on the ideal part. What I propose has far superior readability (subjective point) and can be composed easily, I also think that's probably easier for a linq provider to translate since it doesn't have to understand the reference a collection and translating the Any()

I've already written above why generating dynamic SQL is a bad idea (bad perf because of how SQL plan caching works). I suggest fully understanding that; it should make clear why your proposal wouldn't be a good way forward.

I've read the comments and understand the drawbacks, but we don't live in a perfect world and sometimes those kind of trade-offs have to be made

Finally, if constructing dynamic ORs is really what you're looking for, nobody's stopping you - LINQ does allow you to dynamically construct expression trees as you you like. It may not be as pretty or easy as writing C# code, but there's nothing inherently wrong about it. It's also trivial to create an extension or 3rd-party tool which does this (like PredicateBuilder), so there really isn't any reason for EF to include it (not everything has to be part of the product).

I agree, everything can be done today but it's impractical. What I was hoping was to simplify what to me seems like a common enough case to deserve to be treated as a first class citizen.

Given that I've failed to prove my case, I will live with my manually constructed expression trees :)

clement911 commented 1 year ago

@roji We use the following SQL work around. (I simplified the code but you should get the idea).

(long tenantId, long orderId)[] orderIds = //array of (long, long) tuples of the composite ids of the orders that need to be fetched
 string orPredicate = orderIds.Select(i => $"({nameof(Order.TenantId)} = {i.tenantId} AND {nameof(Order.Id)} = {i.orderId})")
                          .Join(" OR ");
 var orders = await ctx.Set<Order>()
                                   .FromSqlRaw($"""
                                                     SELECT *
                                                     FROM [{SCHEMA_PUBLIC}].[{nameof(Order)}]
                                                     WHERE ({orPredicate})
                                                     """)
                                  .ToArrayAsync();

But of course we don't like because it's brittle, verbose, and we lose many of the benefits of querying with EF. Hence why we would prefer if EF had a built-in feature to support this properly.

Using a linq query like below would be ideal.

var orders = await ctx.Orders.Where(o => orderIds.Contains((o.TenantId, o.Id)).ToArrayAsync();

My understanding is that Postgres natively supports tuple comparison. It supports not only (in)equality operators but also other comparison operators such as greater than, lower than, etc... For example, it is valid to write a postgres query like this:

select * from books WHERE 
(title, id) > ('KVFNdl5F', 994364)

Unfortunately, we use sql server and sql server doesn't have native support for tuples. That's why we have to create a query with an OR clause.

You mentioned that EF would likely encode the array of IDs as a JSON string and send to the database, to be unpacked with OPENJSON? That seems a lot more complicated to me that generating a plain OR clause. I agree that this means that the query will not benefit from query plan caching since the sql will be different each time. However, this is no different that the regular single-value contains, right? i.e if today I write await ctx.Orders.Where(o => orderIds.Contains(o.Id)), EF will generate an IN statement with all the values, which means that the sql will be different each time. So it would make sense to me that the tuple version uses an OR, which is the closest equivalent to an IN.

But to take a step back, we don't care too much about the actual generated SQL, as long as it performs well.

It hadn't occur to me to use OPENJSON. I do think that @ilmax direction is more flexible though. Indeed, Contains is good for equality and !Contains for inequality but sometimes we need more complicated logic.

For example, imagine that the user provides a list of prefixes and the query needs to return all orders for which the product name starts with any of the supplied prefixes. Contains will not help but it would be great to be able to write something like this.

var orPredicate = EF.Or<Order>();
foreach (string prefix in prefixes)
  orPredicate = orPredicate.Or(o => o.ProductName.StartsWith(prefix);
var orders = ctx.Orders.Where(o => o.TenantId = tenantId && orPredicate).ToArrayAsync();

I know that we can build expression trees by hand but who does? It's very tricky to do. If EF provided a built-in utility such as PredicateBuilder, it would add a ton of flexibility to build arbitrary complex tree of AND/OR condition trees that can operate on multiple columns using the full power of EF.

If I understand correctly, the EF sql generation pipeline wouldn't even need to change, since it wouldn't care that the expression tree was built using a utility rather than with a 'normal' query. So it seems like it would be a quick win.

While I agree that Contains on tuples is a different feature to an OR/AND predicate builder utility, I do think that the latter would allow us to handle the former, and more.

roji commented 1 year ago

You mentioned that EF would likely encode the array of IDs as a JSON string and send to the database, to be unpacked with OPENJSON? That seems a lot more complicated to me that generating a plain OR clause. I agree that this means that the query will not benefit from query plan caching since the sql will be different each time. However, this is no different that the regular single-value contains, right? i.e if today I write await ctx.Orders.Where(o => orderIds.Contains(o.Id)), EF will generate an IN statement with all the values, which means that the sql will be different each time. So it would make sense to me that the tuple version uses an OR, which is the closest equivalent to an IN.

The Contains behavior is changing in 8.0, precisely because of the query plan issues inherent in varying SQL, see this blog post for more details. I think that in principle, it's best if EF never generate varying SQLs (like it does for Contains over a parameterized list prior to 8.0).

Regard whether it's more or less complicated, I don't think that should be something the user cares much about... As long as the query works correctly and has good perf, the underlying mechanisms used to translate the LINQ query are implementation details.

For example, imagine that the user provides a list of prefixes and the query needs to return all orders for which the product name starts with any of the supplied prefixes. Contains will not help but it would be great to be able to write something like this.

There's no need for dynamic ORs here either; in fact, EF Core 8.0 supports this kind of query using the same OPENJSON technique used for Contains. The following LINQ query:

var prefixes = new[] { "foo", "bar" };
_ = await ctx.Blogs.Where(b => prefixes.Any(p => b.Name.StartsWith(p))).ToListAsync();

Translates in 8.0 as follows:

SELECT [b].[Id], [b].[Flag], [b].[Name]
FROM [Blogs] AS [b]
WHERE EXISTS (
    SELECT 1
    FROM OPENJSON(@__prefixes_0) WITH ([value] nvarchar(max) '$') AS [p]
    WHERE [p].[value] IS NOT NULL AND LEFT([b].[Name], LEN([p].[value])) = [p].[value])

In other words, OPENJSON (or the equivalent in other databases) is used to unpack a JSON list into a table, at which point the regular relational tools can be used. This is extremely powerful/flexible and allows translating a wide array of LINQ constructs which weren't translatable prior to 8.0.

If EF provided a built-in utility such as PredicateBuilder, it would add a ton of flexibility to build arbitrary complex tree of AND/OR condition trees that can operate on multiple columns using the full power of EF.

If PredicateBuilder works and satisfies your needs, then why does it need to be a built-in utility in EF Core? We very much don't believe that all possible functionality needs to be in-the-box inside EF, and are always happy to see 3rd-party utilities integrate with EF. It's simply not practical to expect the EF team to do everything.

clement911 commented 1 year ago

Thanks @roji Assuming OPENJSON performs similarly to IN/OR then I think it's fantastic that EF is trying to avoid varying SQL and ensure query plans are cached and reused.

There's no need for dynamic ORs here either; in fact, EF Core 8.0 supports this kind of query using the same OPENJSON technique used for Contains.

That's really cool. So if I understand, you're saying that this way of formulating the query will work single column predicates but not with multi-column/tuple predicates? e.g.

Works:

long[] orderIds= ...;
await ctx.Orders.Where(o => orderIds.Any(id => o.Id == id).ToListAsync();

Doesn't work:

(long tenantId, long orderId)[] orderIds = ...;
await ctx.Orders.Where(o => orderIds.Any(id => o.TenantId == id.tenantId && o.Id == id.orerId).ToListAsync();

Right?

Also I'm curious if using OPENJSON works properly when using various collations? We have a lot of columns with non-default collations. My understanding is that n(var)char parameters always have the default DB collation. If you compare to a column that has a different collation, sql server will reject the query unless a COLLATE clause is added.

I understand that EF cannot realistically implement and support every possible feature. I will review existing third party solutions to build OR predicates.

roji commented 1 year ago

Assuming OPENJSON performs similarly to IN/OR [...]

Take a look at this comment and at that issue in general. I haven't benchmarked dynamic ORs specifically, but I have compared with the previous Contains behavior (i.e. embed constants in the SQL). The bottom line is that if you concentrate solely on the specific query performance, we've seen the OPENJSON-based technique perform slightly worse than constant embedding; however, once you consider the overall perf impact on other queries (i.e. other query plans getting evicted because of varying SQLs), paying that relatively small constant perf hit should make sense.

So if I understand, you're saying that this way of formulating the query will work single column predicates but not with multi-column/tuple predicates?

Yes - arrays of tuples aren't yet supported (as opposed to arrays of e.g. ints). This is something I hope we'll be able to improve in the near future.

If you compare to a column that has a different collation, sql server will reject the query unless a COLLATE clause is added.

AFAIK that's not how it works - have you tested this? Parameters - just like nvarchar constants - do not have a collation (by default); when you compare them to a column, the column's collation is used for the comparison operation. Similarly, the OPENJSON-generated table has a column with no collation, so OPENJSON shouldn't make any difference compared to sending a simple string parameter. If you're seeing another behavior please open a new issue with some code so we can investigate.

roji commented 1 year ago

I think that the discussion here has run its course. To summarize my point of view:

Given the above I'll go ahead and close this issue, but of course feel free to post back here with any other thoughts and continue the conversation.

clement911 commented 1 year ago

@roji I appreciate your fast and clear comments. Your points all make sense.

However, I did some quick testing and I think that the collation thing will be an issue that needs to be addressed otherwise it'll break queries with a .Contains that operate on char columns with a non default collation.

roji commented 1 year ago

@clement911 can you post your quick testing so we can investigate?

clement911 commented 1 year ago

@roji Yes, I will. Do I have to install .net 8 preview and vs 2022 preview to use ef8?

Meligy commented 1 year ago

@clement911 you need the preview SDK. Then you can use VS Code to write the code or get the right VS version.

clement911 commented 1 year ago

I raised a bug issue here: https://github.com/dotnet/efcore/issues/32147

n0099 commented 3 months ago

For anyone who have to do https://github.com/dotnet/efcore/issues/32092#issuecomment-1770837841

Finally, if constructing dynamic ORs is really what you're looking for, nobody's stopping you - LINQ does allow you to dynamically construct expression trees as you you like. It may not be as pretty or easy as writing C# code, but there's nothing inherently wrong about it. It's also trivial to create an extension or 3rd-party tool which does this (like PredicateBuilder), so there really isn't any reason for EF to include it (not everything has to be part of the product).

and copy-pasted so many PredicateBuilders then want to reuse them, here's an extension method:

/// <see>https://github.com/dotnet/efcore/issues/32092#issuecomment-2221633692</see>
/// <see>https://stackoverflow.com/questions/70744232/query-where-multiple-columns-have-to-match-a-value-set-simultaneously-in-ef-core/78732959#78732959</see>
public static IQueryable<TEntity> WhereOrContainsValues<TEntity, TToCompare>(
    this IQueryable<TEntity> queryable,
    IEnumerable<TToCompare> valuesToCompare,
    IEnumerable<Func<TToCompare, Expression<Func<TEntity, bool>>>> comparatorExpressionFactories) =>
    queryable.Where(valuesToCompare.Aggregate(
        LinqKit.PredicateBuilder.New<TEntity>(),
        (outerPredicate, valueToCompare) => outerPredicate.Or(
            comparatorExpressionFactories.Aggregate(
                LinqKit.PredicateBuilder.New<TEntity>(),
                (innerPredicate, expressionFactory) =>
                    innerPredicate.And(expressionFactory(valueToCompare))))));

Taking the example from https://github.com/dotnet/efcore/issues/32092#issue-1951139044

public class Order
{
     [Key]
     public long TenantId {get;set;}

    [Key]
     public long Id {get;set;}

    //other properties...
}

(long tenantId, long orderId)[] orderIds = //array of (long, long) tuples of the composite ids of the orders that need to be fetched
var orders = await ctx.Orders.Where(o => orderIds.Contains((o.TenantId, o.Id)).ToArrayAsync(); //this doesn't work

it now would be:

var orders = await ctx.Orders.WhereOrContainsValues(orderIds,
[
    orderId => order => orderId.tenantId == order.TenantId,
    orderId => order => orderId.orderId == order.Id 
]).ToArrayAsync();

and translated to SQL like:

SELECT fields FROM Orders
WHERE (TenantId = $1 AND Id = $2)
   OR (TenantId = $3 AND Id = $4)
-- keep go on for each item in orderIds
n0099 commented 3 months ago

Also try out a more generic one: https://stackoverflow.com/questions/67934374/ef-core-5-contains-with-several-columns/67935339#67935339 with some modifications:

/// <see>https://stackoverflow.com/questions/67666649/lambda-linq-with-contains-criteria-for-multiple-keywords/67666993#67666993</see>
public static IQueryable<T> FilterByItems<T, TItem>(
    this IQueryable<T> query,
    IEnumerable<TItem> items,
    Expression<Func<T, TItem, bool>> filterPattern,
    bool isOr = true)
{
    var predicate = items.Aggregate<TItem, Expression?>(null, (current, item) =>
    {
        var itemExpr = Expression.Constant(item);
        var itemCondition = FilterByItemsExpressionReplacer
            .Replace(filterPattern.Body, filterPattern.Parameters[1], itemExpr);

        return current == null
            ? itemCondition
            : Expression.MakeBinary(isOr ? ExpressionType.OrElse : ExpressionType.AndAlso,
                current,
                itemCondition);
    }) ?? Expression.Constant(false);
    var filterLambda = Expression.Lambda<Func<T, bool>>(predicate, filterPattern.Parameters[0]);

    return query.Where(filterLambda);
}

private sealed class FilterByItemsExpressionReplacer(IDictionary<Expression, Expression> replaceMap) : ExpressionVisitor
{
    private readonly IDictionary<Expression, Expression> _replaceMap
        = replaceMap ?? throw new ArgumentNullException(nameof(replaceMap));

    public static Expression Replace(Expression expr, Expression toReplace, Expression toExpr) =>
        new FilterByItemsExpressionReplacer(new Dictionary<Expression, Expression> {{toReplace, toExpr}})
            .Visit(expr);

    [return: NotNullIfNotNull(nameof(node))]
    public override Expression? Visit(Expression? node) =>
        node != null && _replaceMap.TryGetValue(node, out var replacement)
            ? replacement
            : base.Visit(node);
}
var orders = await ctx.Orders.FilterByItems(orderIds, (ids, o) => ids.Contains((o.TenantId, o.Id)).ToArrayAsync();