dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

LINQ Usage Survey #76205

Closed AaronRobinsonMSFT closed 1 year ago

AaronRobinsonMSFT commented 1 year ago

The .NET team is trying to better understand the community's usage of LINQ in real world applications. Since LINQ was first introduced in .NET 3.5, there have been well-known performance issues—as evidenced by a simple web search. However, these performance issues are not all the same, since some applications put more weight on the expressiveness of LINQ relative to its performance. The community has also created solutions that create optimized code for certain LINQ expressions, for example LinqOptimizer.

The goal of this survey is simply to understand common LINQ usage patterns and the problems they solve. We are also keen to understand why people use LINQ. We are asking the community to help us focus our attention on where we can look to improve performance that matters to you.

Please comment on this issue answering the following questions. If there is already a comment that has an example that captures your scenario, "thumbs up" the comment instead. Try to limit yourself to one LINQ expression per comment (post multiple comments if you have multiple examples); this way the "thumbs up" mechanism is more effective for others to indicate their agreement. If you prefer, feel free to reach out directly via email to @AaronRobinsonMSFT or @elinor-fung; our email addresses can be found in our respective GitHub profiles.

We will be following this survey issue for the next two weeks after which point it will be closed. Thank you.

Questions:

1) Do you primarily use LINQ using the Query syntax or the Method syntax?

2) Please share or link to a representative example of your use of LINQ. Include:

3) If you have intentionally avoided LINQ due to performance issues, please share an example of:

LeaFrock commented 1 year ago

We will be following this survey issue for the next two weeks after which point it will be closed.

Please keep it open for at least one month (or 4 weeks). I appreciate this survey and hope to read more of what others share.

Do you primarily use LINQ using the Query syntax or the Method syntax?

I always use Method syntax, as its style is the same as FP's. I don't like Query syntax because it looks very different from normal C# code.

Please share or link to a representative example of your use of LINQ.

Recently I wrote a web API on ASP.NET Core 6.0. Part of the code is similar to the following:


            protected static void MergeRepeatedStudentRowItems(List<PaperStudentRowItem> rows)
            {
                if (rows.Count < 1)
                {
                    return;
                }

                var repeatedRowGroups = rows
                       .GroupBy(r => r.StudentNo)
                       .Select(g => new { Item = g, Count = g.Count() })
                       .Where(p => p.Count > 1) // The expected rate of repeated rows is below 2‰
                       .Select(p => p.Item.AsEnumerable())
                       .ToArray();
                foreach (var group in repeatedRowGroups)
                {
                    MergeRepeatedRows(group, rows);
                }

                static void MergeRepeatedRows(IEnumerable<PaperStudentRowItem> repeatedRows, List<PaperStudentRowItem> source)
                {
                    var first = repeatedRows.First();
                    foreach (var row in repeatedRows.Skip(1))
                    {
                        // The code that combines the data of this row with `first` is omitted here...
                        source.Remove(row);
                    }
                }
            }

I think this method needs no explanation. I'm sure the method (or API) is not on a hot path, so I use LINQ. And I can imagine how much more work I would need to do if the method were hot and performance were required.

If you have intentionally avoided LINQ due to performance issues, please share an example

I always avoid LINQ if there's a native API, such as List<T>/Array with FindAll/Exists/Find instead of Where/Any/First.

I always avoid LINQ if low memory allocation or GC pressure is required.

I often avoid LINQ in cases like this:

var a = xxx.Select(p => new Person());
DoAction1(a);
DoAction2(a);

because it is easy to misuse (the query is iterated again and runs the `new` operations again on each use), which results in a kind of "foolish bug".
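The pitfall above can be sketched as follows. This is a minimal illustration, assuming a hypothetical empty `Person` class and a `Consume` method standing in for `DoAction1`/`DoAction2`; these names are not from the original code. Because `Select` is lazy, every enumeration re-runs the projection and creates new objects, whereas materializing once with `ToList()` does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type used only for this illustration.
public class Person { }

public static class DeferredExecutionDemo
{
    // Counts how many Person objects the projection has created.
    public static int Created;

    // Hypothetical stand-in for DoAction1/DoAction2: it just enumerates the sequence.
    public static void Consume(IEnumerable<Person> people)
    {
        foreach (var person in people) { }
    }

    // Passes the lazy query to two consumers; the projection runs once per enumeration.
    public static int RunDeferred(int[] source)
    {
        Created = 0;
        IEnumerable<Person> lazy = source.Select(n => { Created++; return new Person(); });
        Consume(lazy);
        Consume(lazy);
        return Created;
    }

    // Materializes the query once; both consumers then share the same objects.
    public static int RunMaterialized(int[] source)
    {
        Created = 0;
        List<Person> list = source.Select(n => { Created++; return new Person(); }).ToList();
        Consume(list);
        Consume(list);
        return Created;
    }

    public static void Main()
    {
        var source = new[] { 1, 2, 3 };
        Console.WriteLine(RunDeferred(source));     // prints 6: three objects per enumeration
        Console.WriteLine(RunMaterialized(source)); // prints 3: created once by ToList()
    }
}
```

Besides the extra allocations, the two consumers in the deferred case see different `Person` instances, which is where the subtle bugs tend to come from.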

rellis-of-rhindleton commented 1 year ago

Method syntax, always. It’s intuitive, readable, and lends itself to custom extensions. Will never use the query syntax.

90% of our usage is the usual filter/refine/project, i.e. Where/Select/ToXxx, with the occasional GroupBy, though that tends to be more confusing (careful naming helps). We use it with regular IEnumerable and EF Core. Pretty much always .NET Core.

Have not run into performance issues. We usually try hard to make sure we’re working with reasonable and predictable collections though.

AaronRobinsonMSFT commented 1 year ago

@AaronRobinsonMSFT Please clarify whether you are referring only to queries and not to expression trees in general. I generate a lot of code using expression trees and no query is involved at all.

@raffaeler We are primarily interested in querying. Even though EF users consume expression trees implicitly, those scenarios are interesting too. However, explicit expression tree usage isn't the intent of this survey.

raffaeler commented 1 year ago

@raffaeler We are primarily interested in querying. Even though EF users consume expression trees implicitly, those scenarios are interesting too. However, explicit expression tree usage isn't the intent of this survey.

Thank you @AaronRobinsonMSFT, got it.

Part of the slowness derives from the need to use reflection inside the LINQ providers' implementation. If the provider side of the queries is interesting for your survey, please let us know.

Djoums commented 1 year ago

I mostly use .Net 6, and always method syntax for Linq. I try to avoid it except with simple non allocating queries, preferably out of loops/repeated calls.

foreach (var myObj in myList.OfType<IMyObject>()): No perf loss over manual code, quick and easy.

if (myEnumerable.Any()) or myEnumerable.FirstOrDefault(): Can't do better manually without knowing the actual enumerable type.

foreach (var item in myCollection.Where(static item => item.IsActive).OrderBy(static item => item.Id)): Static lambdas are huge here, I don't want any avoidable allocations with Linq because it can quickly get out of hand (please make the compiler declare those lambdas static automatically).

From what I've seen, Linq works mostly with high level interfaces, which :

Finally, a short example of why I don't use Linq much. With .Net 6, Numbers is an array of 1 million integers :

public int Manual()
{
    var copy = new int[Numbers.Length];
    Array.Copy(Numbers, copy, Numbers.Length);
    Array.Sort(copy);
    return copy[^1];
}

public int Linq()
{
    var copy = Numbers.OrderBy(static n => n).ToArray();
    return copy[^1];
}

Benchmark it, the Linq version takes 4x the time of the manual version. And it's a really simple test.
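A rough way to try this comparison is sketched below. It assumes `Numbers` is filled with pseudo-random values from a fixed seed (the comment does not say how the array is populated) and uses `Stopwatch` with a hypothetical `Time` helper rather than a benchmarking library, so the timings are only indicative and no particular ratio is claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class SortComparison
{
    // Assumption: one million pseudo-random integers generated from a fixed seed.
    public static readonly int[] Numbers = CreateNumbers(1_000_000, seed: 42);

    public static int[] CreateNumbers(int count, int seed)
    {
        var random = new Random(seed);
        var numbers = new int[count];
        for (var i = 0; i < numbers.Length; i++)
            numbers[i] = random.Next();
        return numbers;
    }

    // Copies the array, sorts the copy in place, and returns the largest element.
    public static int Manual()
    {
        var copy = new int[Numbers.Length];
        Array.Copy(Numbers, copy, Numbers.Length);
        Array.Sort(copy);
        return copy[^1];
    }

    // Sorts through LINQ, materializes the result, and returns the largest element.
    public static int Linq()
    {
        var copy = Numbers.OrderBy(static n => n).ToArray();
        return copy[^1];
    }

    // Hypothetical helper: average wall-clock time per call after one warm-up call.
    public static TimeSpan Time(Func<int> action, int iterations)
    {
        action(); // warm-up so JIT compilation is not part of the measurement
        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            action();
        stopwatch.Stop();
        return TimeSpan.FromTicks(stopwatch.Elapsed.Ticks / iterations);
    }

    public static void Main()
    {
        Console.WriteLine($"Manual: {Time(Manual, 5).TotalMilliseconds} ms");
        Console.WriteLine($"Linq:   {Time(Linq, 5).TotalMilliseconds} ms");
    }
}
```

For numbers worth reporting, a dedicated benchmarking tool with proper warm-up and statistical analysis would be the better choice; this sketch only shows the shape of the comparison.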

hez2010 commented 1 year ago

1. Do you primarily use LINQ using the Query syntax or the Method syntax?

Generally method syntax, but query syntax when it involves complex join and select-many expressions.

2. Please share or link to a representative example of your use of LINQ.

static IEnumerable<ScoreboardItem> GenScoreboardItems(Data[] data, Game game, IDictionary<int, Blood?[]> bloods)
{
    Dictionary<string, int> Ranks = new();
    return data.GroupBy(j => j.Instance.Participation)
        .Select(j =>
        {
            var challengeGroup = j.GroupBy(s => s.Instance.ChallengeId);

            return new ScoreboardItem
            {
                Id = j.Key.Team.Id,
                Name = j.Key.Team.Name,
                Avatar = j.Key.Team.AvatarUrl,
                Organization = j.Key.Organization,
                Rank = 0,
                LastSubmissionTime = j
                    .Where(s => s.Submission?.SubmitTimeUTC < game.EndTimeUTC)
                    .Select(s => s.Submission?.SubmitTimeUTC ?? DateTimeOffset.UtcNow)
                    .OrderBy(t => t).LastOrDefault(game.StartTimeUTC),
                SolvedCount = challengeGroup.Count(c => c.Any(
                    s => s.Submission?.Status == AnswerResult.Accepted
                    && s.Submission?.SubmitTimeUTC < game.EndTimeUTC)),
                Challenges = challengeGroup
                        .Select(c =>
                        {
                            var cid = c.Key;
                            var s = c.OrderBy(s => s.Submission?.SubmitTimeUTC ?? DateTimeOffset.UtcNow)
                                .FirstOrDefault(s => s.Submission?.Status == AnswerResult.Accepted);

                            SubmissionType status = SubmissionType.Normal;

                            if (s?.Submission is null)
                                status = SubmissionType.Unaccepted;
                            else if (bloods[cid][0] is not null && s.Submission.SubmitTimeUTC <= bloods[cid][0]!.SubmitTimeUTC)
                                status = SubmissionType.FirstBlood;
                            else if (bloods[cid][1] is not null && s.Submission.SubmitTimeUTC <= bloods[cid][1]!.SubmitTimeUTC)
                                status = SubmissionType.SecondBlood;
                            else if (bloods[cid][2] is not null && s.Submission.SubmitTimeUTC <= bloods[cid][2]!.SubmitTimeUTC)
                                status = SubmissionType.ThirdBlood;

                            return new ChallengeItem
                            {
                                Id = cid,
                                Type = status,
                                UserName = s?.Submission?.UserName,
                                SubmitTimeUTC = s?.Submission?.SubmitTimeUTC,
                                Score = s is null ? 0 : status switch
                                {
                                    SubmissionType.Unaccepted => 0,
                                    SubmissionType.FirstBlood => Convert.ToInt32(s.Instance.Challenge.CurrentScore * 1.05f),
                                    SubmissionType.SecondBlood => Convert.ToInt32(s.Instance.Challenge.CurrentScore * 1.03f),
                                    SubmissionType.ThirdBlood => Convert.ToInt32(s.Instance.Challenge.CurrentScore * 1.01f),
                                    SubmissionType.Normal => s.Instance.Challenge.CurrentScore,
                                    _ => throw new ArgumentException(nameof(status))
                                }
                            };
                        }).ToList()
            };
        }).OrderByDescending(j => j.Score).ThenBy(j => j.LastSubmissionTime)
        .Select((j, i) =>
        {
            j.Rank = i + 1;

            if (j.Organization is not null)
            {
                if (Ranks.TryGetValue(j.Organization, out int rank))
                {
                    j.OrganizationRank = rank + 1;
                    Ranks[j.Organization]++;
                }
                else
                {
                    j.OrganizationRank = 1;
                    Ranks[j.Organization] = 1;
                }
            }

            return j;
        }).ToArray();
}

For complete source, see https://github.com/GZTimeWalker/GZCTF/blob/main/GZCTF/Repositories/GameRepository.cs.

The above expression is used for generating a scoreboard for a CTF competition. Usually it gets executed several times a minute (we have a cache for the generated result, so it won't be executed many times in a short period). It runs on .NET 6+.

3. If you have intentionally avoided LINQ due to performance issues, please share an example.

We avoid using LINQ for short predicates because the predicate lambda cannot be inlined and also causes allocations. For example:

// Parentheses are required: == binds more tightly than & in C#.
bool hasOddNumber = listOfNumbers.Any(i => (i & 1) == 1);

Instead of using LINQ for the above case, we would just write:

bool HasOddNumber(List<int> listOfNumbers)
{
    for (var i = 0; i < listOfNumbers.Count; i++)
        if ((listOfNumbers[i] & 1) == 1)
            return true;
    return false;
}

bool hasOddNumber = HasOddNumber(listOfNumbers);

Code like the above could be executed hundreds or even thousands of times a second.

Besides, we avoid using LINQ anywhere when it comes to game development (Unity) for the same reason.

We really hope that the allocation of the lambda (delegate) can be elided (by the JIT using escape analysis), and that the body of the lambda (delegate) can be both devirtualized and inlined (by the JIT using PGO, which is already present in .NET 7), so that we can write LINQ everywhere to achieve less bloat and more readability without concerns about performance.

louthy commented 1 year ago

1. Do you primarily use LINQ using the Query syntax or the Method syntax?

Query syntax, because it's the closest we can get to Haskell's do notation in C#. It is one of the strongest features of C# and, in my humble opinion, a standout differentiator from other mainstream languages; it's criminal how little investment the C# team has put into it over the years.
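To illustrate the do-notation comparison: the C# compiler translates multiple `from` clauses into calls to a `SelectMany` method on whatever type the query is over, so any type that supplies suitable `Select`/`SelectMany` methods can be used with query syntax. The sketch below uses a hypothetical, deliberately minimal `Maybe<T>` type written for this example; it is not language-ext's API.

```csharp
using System;

// Hypothetical minimal optional type for illustration; not language-ext's API.
public readonly struct Maybe<T>
{
    private readonly T _value;
    public bool HasValue { get; }

    private Maybe(T value) { _value = value; HasValue = true; }

    public static Maybe<T> Some(T value) => new Maybe<T>(value);
    public static Maybe<T> None => default;

    public T GetValueOrDefault(T fallback) => HasValue ? _value : fallback;

    // Select is what a single `from ... select` query translates to.
    public Maybe<R> Select<R>(Func<T, R> map) =>
        HasValue ? Maybe<R>.Some(map(_value)) : Maybe<R>.None;

    // SelectMany is the monadic bind that consecutive `from` clauses translate to.
    public Maybe<R> SelectMany<U, R>(Func<T, Maybe<U>> bind, Func<T, U, R> project)
    {
        if (!HasValue) return Maybe<R>.None;
        var next = bind(_value);
        return next.HasValue ? Maybe<R>.Some(project(_value, next._value)) : Maybe<R>.None;
    }
}

public static class MaybeDemo
{
    public static Maybe<int> ParseInt(string text) =>
        int.TryParse(text, out var number) ? Maybe<int>.Some(number) : Maybe<int>.None;

    // Reads like do-notation: if either parse fails, the whole expression is None.
    public static Maybe<int> Add(string left, string right) =>
        from a in ParseInt(left)
        from b in ParseInt(right)
        select a + b;

    public static void Main()
    {
        Console.WriteLine(Add("2", "3").GetValueOrDefault(-1)); // prints 5
        Console.WriteLine(Add("2", "x").GetValueOrDefault(-1)); // prints -1
    }
}
```

The failure handling lives entirely in `SelectMany`, so the query itself contains no `if` statements; that is the property the comment attributes to query syntax.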

2. Please share or link to a representative example of your use of LINQ. Include:

I'll share a few things:

language-ext [netstandard 2.0]

language-ext is an open-source functional-programming library that I've been building for many years now. At the time of writing, it has around 5000 followers and the package has been downloaded 6.4 million times. It is completely based around providing monads of various flavours that work with LINQ. The library probably wouldn't exist if it weren't for LINQ query syntax.

language-ext effects examples [netstandard 2.0]

A sample project that uses the effect monads from language-ext. This leverages the Aff monad, an asynchronous IO monad that handles errors, resources, and dependency injection automatically in the bind operation (SelectMany), and the Proxy monad, which allows consumers, pipes, and producers to be composed together into a single Effect type that can then be run as an isolated system.

In this case it streams chunks of data from a file (80 bytes at a time), those bytes are 'loaned' and will be cleaned up automatically, they get converted to a string, and then written to the console.

namespace EffectsExamples
{
    /// <summary>
    /// Text file chunk streaming example
    /// </summary>
    /// <remarks>
    /// Streams the contents of a text file in chunks of 80 bytes
    /// </remarks>
    public class TextFileChunkStreamExample<RT> where RT : 
        struct, 
        HasCancel<RT>,
        HasConsole<RT>,
        HasFile<RT>,
        HasTextRead<RT>
    {
        public static Aff<RT, Unit> main =>
            from _ in Console<RT>.writeLine("Please type in a path to a text file and press enter")
            from p in Console<RT>.readLine
            from e in mainEffect(p)
            select unit;

        static Effect<RT, Unit> mainEffect(string path) =>
            File<RT>.openRead(path) 
               | Stream<RT>.read(80) 
               | decodeUtf8 
               | writeLine;

        static Pipe<RT, SeqLoan<byte>, string, Unit> decodeUtf8 =>
            from c in awaiting<SeqLoan<byte>>()         
            from _ in yield(Encoding.UTF8.GetString(c.ToReadOnlySpan()))
            select unit;

        static Consumer<RT, string, Unit> writeLine =>
            from l in awaiting<string>()
            from _ in Console<RT>.writeLine(l)
            select unit;
    }
}

echo-process actor model [netstandard 2.0]

Finally, a real-world actor-model actor. The code in processUserMessage would run each time an actor gets posted a message.

        /// <summary>
        /// Creates an effect that represents the user-inbox.
        /// </summary>
        /// <remarks>
        /// The effect is then forked so that it can run independently of the system-inbox.  This also allows it to be
        /// cancelled (when there's errors, or pause requests)
        /// </remarks>
        static Eff<RT, Unit> startUserInbox =>
            from u in getUserChannel
            from x in cancelInbox
            from c in fork(u | (channelPipe<UserPost>() | processUserMessage))
            from _ in putCancel(c)
            select unit;

        /// <summary>
        /// Consumer that awaits a UserPost and any other related context.
        /// If the UserPost message is of the correct type for this Process then it attempts to pass it to the inbox
        /// function.  Otherwise it passes it on to the dead-letters Process 
        /// </summary>
        static Consumer<RT, UserPost, Unit> processUserMessage =>
            from p in awaiting<UserPost>()
            from x in p.Message switch
                      {
                          A msg => runUserMessage(msg, p) | catchInbox(p),
                          _     => Process<RT>.forwardToDeadLetters(p)
                      }
            select unit;

        /// <summary>
        /// User space error handler
        /// </summary>
        static AffCatch<RT, Unit> catchInbox(UserPost post) =>
            @catch(e => from s in getSelf
                        from _ in Process<RT>.tellParentSystem(new ChildFaultedSysPost(e, post, s))
                        from x in cancel<RT>()
                        select unit);

        /// <summary>
        /// Processes a user-message
        /// </summary>
        /// <remarks>
        /// Passes the message to the user's inbox function.  If the resulting state is equal to what went in, nothing
        /// changes in the Actor.  Otherwise we update the state and publish it.
        ///
        /// If the inbox function fails then we tell the parent we're broken and cancel this forked inbox. 
        /// </remarks>
        static Aff<RT, Unit> runUserMessage(A msg, UserPost post) =>
            from ib in getUserInbox
            from os in getState | userSetup
            from ns in ib(os, msg)
            from _1 in putState(ns)
            from _2 in isStateEqual(os, ns) ? unitEff : publishState(ns)
            select unit;

3. If you have intentionally avoided LINQ due to performance issues, please share an example of:

I haven't, because I think that although there is a hit, its importance is overblown for the vast majority of code. If I were writing something to-the-metal, then I'd probably avoid it, or at least prototype with it first and then replace it piecemeal. What LINQ brings (when used as C#'s monad support, rather than just C#'s enumerable/queryable support) is robust and reliable code; something that's easy to get wrong with statements is hard to get wrong in LINQ. It is my, and my team's, go-to starting point for any new piece of code.

LINQ needs:

moonheart08 commented 1 year ago

To the above: the cost unfortunately isn't merely perceived. It has shown up for us on production servers (and has even been a major contributor to game lag at times of heavy load, think 200-250 players). We pay a lot in allocations just to use IEnumerable with entity queries, though most of our heavy-hitting LINQ has been cleaned out by now, so LINQ isn't a performance concern for us anymore.

cpaquot commented 1 year ago

Query syntax!

I don't see much mention of it in the other answers, so I think it deserves its own comment to allow people to upvote 👍 it.

I'm a big fan of SQL, and I find query syntax really easy to read and write. I complement it with some method syntax when I have to, for instance .ToList, or to construct an IQueryable.

I use it mostly to query the database and I'm impressed by the quality of the generated SQL (I always pay attention to this point).

I use .NET 6, and I never avoid LINQ because of performance.

jburman commented 1 year ago

I generally use Method syntax when it's a simple case of chaining two or three calls together (e.g. OrderBy().Select()). Once it starts to grow beyond that, and particularly if there are any joins or groupings, then I switch to Query syntax as I find it more concise and readable. For more complicated scenarios (usually only EF related) I will often break the expression up into several smaller expressions and then join them back together again at the end.
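As an illustration of that trade-off, here is the same join-and-group query written both ways. It is a sketch over hypothetical `Customer` and `Order` records invented for this example, using LINQ to Objects rather than EF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types used only for this comparison.
public record Customer(int Id, string Name);
public record Order(int CustomerId, int Quantity);

public static class SyntaxComparison
{
    // Query syntax: the join and the grouping read as one declarative statement.
    public static List<string> WithQuerySyntax(IEnumerable<Customer> customers, IEnumerable<Order> orders) =>
        (from c in customers
         join o in orders on c.Id equals o.CustomerId
         group o.Quantity by c.Name into g
         orderby g.Key
         select $"{g.Key}: {g.Sum()}").ToList();

    // Method syntax: same result, but the join needs an explicit intermediate projection.
    public static List<string> WithMethodSyntax(IEnumerable<Customer> customers, IEnumerable<Order> orders) =>
        customers
            .Join(orders, c => c.Id, o => o.CustomerId, (c, o) => new { c.Name, o.Quantity })
            .GroupBy(x => x.Name, x => x.Quantity)
            .OrderBy(g => g.Key)
            .Select(g => $"{g.Key}: {g.Sum()}")
            .ToList();

    public static void Main()
    {
        var customers = new[] { new Customer(1, "Ada"), new Customer(2, "Bob") };
        var orders = new[] { new Order(1, 2), new Order(2, 1), new Order(1, 3) };

        // Both print "Ada: 5" and then "Bob: 1".
        WithQuerySyntax(customers, orders).ForEach(Console.WriteLine);
        WithMethodSyntax(customers, orders).ForEach(Console.WriteLine);
    }
}
```

The compiler lowers the query-syntax version to method calls much like the second form, so the choice between them is about readability rather than behavior.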

I use it frequently across all versions of .NET that I work with, ranging from .NET Framework to .NET 6, and I generally only avoid it if I know the code will be in a very tight loop where time to complete or memory usage is a concern.

AaronRobinsonMSFT commented 1 year ago

Thank you!

The dotnet community has really delivered on this one. There are many great comments and insights into usage, and all of your feedback is greatly appreciated. For now, we are going to close this issue and start aggregating the feedback. We will post the final aggregated results here when we have them and also provide some general takeaways and what to expect. Again, thank you all for participating and helping to educate us on where our focus should be.