dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[API Proposal]: `WhereIn` and `WhereInBy` LINQ operators #105743

Open alrz opened 2 months ago

alrz commented 2 months ago

Background and motivation

These operators have a direct SQL translation and generally feel missing from the set of operators we have today.

The implementation is identical to Intersect and IntersectBy, except that it calls Contains on the set where those call Remove.
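As a rough illustration of that difference, here is a minimal sketch of what a WhereInBy iterator might look like. It is not the runtime's code: the class name WhereInSketch is hypothetical, and the method simply follows the signature proposed in this issue. The set built from values is only probed with Contains and never mutated, so every element of source whose key is present passes through, including repeats.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class for illustration; not part of System.Linq.
public static class WhereInSketch
{
    // Assumed shape, following the proposed WhereInBy signature.
    public static IEnumerable<TSource> WhereInBy<TSource, TKey>(
        this IEnumerable<TSource> source,
        IEnumerable<TKey> values,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey>? comparer = null)
    {
        // Build the lookup set once; a null comparer means the default comparer.
        var set = new HashSet<TKey>(values, comparer);

        foreach (TSource element in source)
        {
            // Contains leaves the set untouched, so later elements with the
            // same key are still yielded. An IntersectBy-style iterator would
            // call set.Remove(...) here and match each key at most once.
            if (set.Contains(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}
```

Under these assumptions, new[] { "a1", "a2", "b1" }.WhereInBy(new[] { 'a' }, s => s[0]) would yield both "a1" and "a2".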

API Proposal

public static partial class Enumerable
{
+  static IEnumerable<TSource> WhereIn<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> values);
+  static IEnumerable<TSource> WhereIn<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> values, IEqualityComparer<TSource>? comparer);

+  static IEnumerable<TSource> WhereInBy<TSource, TKey>(this IEnumerable<TSource> source, IEnumerable<TKey> values, Func<TSource, TKey> keySelector);
+  static IEnumerable<TSource> WhereInBy<TSource, TKey>(this IEnumerable<TSource> source, IEnumerable<TKey> values, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);

+  static IEnumerable<TSource> WhereNotIn<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> values);
+  static IEnumerable<TSource> WhereNotIn<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> values, IEqualityComparer<TSource>? comparer);

+  static IEnumerable<TSource> WhereNotInBy<TSource, TKey>(this IEnumerable<TSource> source, IEnumerable<TKey> values, Func<TSource, TKey> keySelector);
+  static IEnumerable<TSource> WhereNotInBy<TSource, TKey>(this IEnumerable<TSource> source, IEnumerable<TKey> values, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
}

public static class Queryable
{
+  static IQueryable<TSource> WhereIn<TSource>(this IQueryable<TSource> source, IEnumerable<TSource> values);
+  static IQueryable<TSource> WhereIn<TSource>(this IQueryable<TSource> source, IEnumerable<TSource> values, IEqualityComparer<TSource>? comparer);

+  static IQueryable<TSource> WhereInBy<TSource, TKey>(this IQueryable<TSource> source, IEnumerable<TKey> values, Expression<Func<TSource, TKey>> keySelector);
+  static IQueryable<TSource> WhereInBy<TSource, TKey>(this IQueryable<TSource> source, IEnumerable<TKey> values, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);

+  static IQueryable<TSource> WhereNotIn<TSource>(this IQueryable<TSource> source, IEnumerable<TSource> values);
+  static IQueryable<TSource> WhereNotIn<TSource>(this IQueryable<TSource> source, IEnumerable<TSource> values, IEqualityComparer<TSource>? comparer);

+  static IQueryable<TSource> WhereNotInBy<TSource, TKey>(this IQueryable<TSource> source, IEnumerable<TKey> values, Expression<Func<TSource, TKey>> keySelector);
+  static IQueryable<TSource> WhereNotInBy<TSource, TKey>(this IQueryable<TSource> source, IEnumerable<TKey> values, Expression<Func<TSource, TKey>> keySelector, IEqualityComparer<TKey>? comparer);
}

API Usage

var filtered = list.WhereIn(anotherList);

Alternative Designs

No response

Risks

No response

dotnet-policy-service[bot] commented 2 months ago

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

eiriktsarpalis commented 2 months ago

The implementation is identical to Intersect and IntersectBy except with a Contains in place of Remove.

I'm not sure I follow this distinction. Looking at SQL WHERE IN examples, it seems to be equivalent to this particular IntersectBy overload (the one taking an IEnumerable<TKey> and a Func<TSource, TKey>), so it isn't clear to me what the proposed methods are bringing to the table (other than naming parity, perhaps).

cc @dotnet/efteam

alrz commented 2 months ago

so it isn't clear to me what the proposed methods are bringing to the table

Intersect and IntersectBy remove duplicates. Note that the distinction is more visible with the latter, where you have a key selector: WhereInBy will keep items that are not exactly "duplicates" but are determined to be the same by virtue of the key selector.
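A small example may make the distinction concrete. The orders data below is made up for illustration; IntersectBy is the existing LINQ operator, and the Where plus HashSet filter stands in for the proposed WhereInBy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data: two distinct orders share CustomerId 1.
var orders = new[]
{
    (Id: 100, CustomerId: 1),
    (Id: 101, CustomerId: 1),
    (Id: 102, CustomerId: 2),
};
int[] customerIds = { 1 };

// Set semantics: once key 1 has matched, it is removed from the internal set,
// so order 101 is dropped even though it is a different order.
var viaIntersectBy = orders.IntersectBy(customerIds, o => o.CustomerId).ToList();

// Filter semantics (what the proposed WhereInBy would do): every order whose
// key is in the set is kept.
var wanted = new HashSet<int>(customerIds);
var viaFilter = orders.Where(o => wanted.Contains(o.CustomerId)).ToList();

Console.WriteLine(viaIntersectBy.Count); // 1
Console.WriteLine(viaFilter.Count);      // 2
```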

eiriktsarpalis commented 2 months ago

I see, so you want to avoid set semantics. Does this extend to the other set-like operators such as Except or Union?

alrz commented 2 months ago

Does this extend to the other set-like operators such as Except or Union?

I don't think this applies to Except and Union. Except returns every item that is not present in the input and the equivalent of Union without set semantics would be Concat.

alrz commented 2 months ago

Actually, Except would be equivalent to WhereNotIn, so this fills the gap where you don't want the Not. I added a note to the OP.

alrz commented 2 months ago

By that reasoning the implementation for Except and ExceptBy is incorrect?

For Except, !set.Contains and set.Add have the same effect only the first time a given key is seen, not on subsequent occurrences.

private static IEnumerable<TSource> ExceptByIterator<TSource, TKey>(IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer)
{
    var set = new HashSet<TKey>(second, comparer);

    foreach (TSource element in first)
    {
        if (set.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}

This actually removes "duplicates", but I don't think Except is supposed to do that.

For example, new[] {1,1,2}.Except(new[] {2}) should return [1, 1] but it returns [1].
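The behavior is easy to reproduce with a few lines. The snippet below compares the existing Except with a plain negated filter, which is roughly what a WhereNotIn would be expected to do; the variable names are only illustrative.

```csharp
using System;
using System.Linq;

int[] first = { 1, 1, 2 };
int[] second = { 2 };

// Except yields each distinct element of first that is not in second.
var viaExcept = first.Except(second).ToArray();

// A negated filter keeps every occurrence that is not in second.
var viaFilter = first.Where(x => !second.Contains(x)).ToArray();

Console.WriteLine(string.Join(", ", viaExcept)); // 1
Console.WriteLine(string.Join(", ", viaFilter)); // 1, 1
```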

Clockwork-Muse commented 2 months ago

This actually removes "duplicates", but I don't think Except is supposed to do that.

It is, that's the set semantics.

You can of course implement WhereIn via .Where(x => y.Contains(x)), or some form of Join with a result selector (which may perform better), and WhereNotIn as .Where(x => !y.Contains(x)).
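For reference, the alternatives mentioned above might look like the sketch below, using made-up integer data. One caveat worth noting with the Join form: if the values sequence itself contains duplicates, each matching source element is emitted once per matching value, so the values would need a Distinct() first to behave like a filter.

```csharp
using System;
using System.Linq;

int[] source = { 1, 1, 2, 3 };
int[] values = { 1, 3, 3 };

// WhereIn via Where + Contains: a linear scan of values per source element.
var viaContains = source.Where(x => values.Contains(x)).ToArray();

// Join builds a hash-based lookup over values, but 3 occurs twice in values,
// so the source element 3 is emitted twice.
var viaJoin = source.Join(values, x => x, v => v, (x, v) => x).ToArray();

// De-duplicating values first restores filter-like behavior.
var viaJoinDistinct = source.Join(values.Distinct(), x => x, v => v, (x, v) => x).ToArray();

// WhereNotIn via a negated Contains.
var viaNotContains = source.Where(x => !values.Contains(x)).ToArray();

Console.WriteLine(string.Join(", ", viaContains));     // 1, 1, 3
Console.WriteLine(string.Join(", ", viaJoin));         // 1, 1, 3, 3
Console.WriteLine(string.Join(", ", viaJoinDistinct)); // 1, 1, 3
Console.WriteLine(string.Join(", ", viaNotContains));  // 2
```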

alrz commented 2 months ago

It is

TIL :)

You can of course implement WhereIn

Yes, as long as you manually create the hash set (or if you're just generating SQL), this is fine; otherwise you're doing O(n^2) work. Of course, this doesn't enable any scenario that is impossible today, and adding a helper in source to do it is trivial; this would only be a convenience API for readability.

eiriktsarpalis commented 2 months ago

It seems like a useful addition that we could consider in a future version. Would it be possible to update your original proposal such that the following have been added?

  1. Missing parameter names and namespaces,
  2. Matching WhereNotIn methods and
  3. Matching IQueryable overloads.

Thanks!