dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

LINQ Usage Survey #76205

Closed AaronRobinsonMSFT closed 1 year ago

AaronRobinsonMSFT commented 1 year ago

The .NET team is trying to better understand the community's usage of LINQ in real world applications. Since LINQ was first introduced in .NET 3.5, there have been well-known performance issues—as evidenced by a simple web search. However, these performance issues are not all the same, since some applications put more weight on the expressiveness of LINQ relative to its performance. The community has also created solutions that create optimized code for certain LINQ expressions, for example LinqOptimizer.

The goal of this survey is simply to understand common LINQ usage patterns and the problems they solve. We are also keen to understand why people use LINQ. We are asking the community to help us focus our attention on where we can look to improve performance that matters to you.

Please comment on this issue answering the following questions. If there is already a comment that has an example that captures your scenario, "thumbs up" the comment instead. Try to limit one LINQ expression per comment (post multiple comments if you have multiple examples); this way the "thumbs up" mechanism is more effective for others to indicate their agreement. If you prefer, feel free to reach out directly via email to @AaronRobinsonMSFT or @elinor-fung; our email addresses can be found in our respective GitHub profiles.

We will be following this survey issue for the next two weeks after which point it will be closed. Thank you.

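Questions:

1) Do you primarily use LINQ using the Query syntax or the Method syntax?

2) Please share or link to a representative example of your use of LINQ. Include:
    • A short description of the goal of the expression
    • How often it is executed
    • Version of .NET that runs the expression (for example, .NET Framework 4.5, .NET 6+)

3) If you have intentionally avoided LINQ due to performance issues, please share an example of:
    • The problematic LINQ expression (if available) and the code that replaced it
    • A short description of the goal
    • How often it is executed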

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-meta See info in area-owners.md if you want to be subscribed.

Issue Details
The .NET team is trying to better understand the community's usage of LINQ in real world applications. Since LINQ was first introduced in .NET 3.5, there have been well-known performance issues, [as evidenced by a simple web search](https://www.bing.com/search?q=dotnet+linq+performance). However, these performance issues are not all the same, since some applications put more weight on the expressiveness of LINQ relative to its performance. The community has also created solutions that create optimized code for certain LINQ expressions, for example [LinqOptimizer](https://nessos.github.io/LinqOptimizer/).

The goal of this survey is simply to understand common LINQ usage patterns and the problems they solve. We are also keen to understand why people use LINQ. We are asking the community to help us focus our attention on where we can look to improve performance that matters to you.

Please comment on this issue answering the following questions. If there is already a comment that has an example that captures your scenario, "thumbs up" the comment instead. Try to limit one LINQ expression per comment (post multiple comments if you have multiple examples); this way the "thumbs up" mechanism is more effective for others to indicate their agreement. If you prefer feel free to reach out directly via email to @AaronRobinsonMSFT or @elinor-fung; our email addresses can be found in our respective Github profiles.

We will be following this survey issue for the next two weeks after which point it will be closed. Thank you.

## Questions:

1) Do you primarily use LINQ using the [Query syntax](https://learn.microsoft.com/dotnet/csharp/linq/write-linq-queries#example---query-syntax) or the [Method syntax](https://learn.microsoft.com/dotnet/csharp/linq/write-linq-queries#example---method-syntax)?

2) Please share or link to a representative example of your use of LINQ. Include:
    * A short description of the goal of the expression
    * How often it is executed
    * Version of .NET that runs the expression (for example, .NET Framework 4.5, .NET 6+)

3) If you have intentionally avoided LINQ due to performance issues, please share an example of:
    * The problematic LINQ expression (if available) and the code that replaced it
    * A short description of the goal
    * How often it is executed

Author: AaronRobinsonMSFT
Assignees: -
Labels: `area-Meta`
Milestone: 8.0.0
ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-linq See info in area-owners.md if you want to be subscribed.

Author: AaronRobinsonMSFT
Assignees: -
Labels: `area-System.Linq`
Milestone: 8.0.0
gpshonik commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

Almost always Method syntax except when:

Please share or link to a representative example of your use of LINQ. Include:

If you have intentionally avoided LINQ due to performance issues, please share an example of:

Tornhoof commented 1 year ago

Applies to .NET 6

  1. Method syntax.

    • Mostly the typical .Where(...).Select(...) to filter lists and select specific properties, e.g. IDs.
    • How often executed? Per request. I try to avoid tight loops with linq for perf reasons.
    • I use OrderBy fairly often as it is stable sort.
  2. Yes, if the profiler tells me to.

    • Replaced with for-loop, often together with CollectionsMarshal.AsSpan<T>
    • Added an AsList() method that I use instead of ToList() in some places (it checks whether the source is already a List<T> and returns that, instead of always calling ToList())
    • Added a custom SingleValueList with custom struct enumerator for special casing single values.
    • Added custom IdList to replace foo.Select(a=>a.Id) in hot paths. (Yes literally a custom list with struct enumerator returning the Id property)

I try to avoid complex linq Statements, makes debugging harder (until I found out that you can set the breakpoint inside a multi statement linq Expression). I try to avoid the query syntax, too complicated (too close to SQL but not quite there)
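A minimal sketch of two of the replacements listed above, under stated assumptions: the `AsList` helper here is a guess at the shape the comment describes (its real implementation is not shown in the thread), and `SumOfEvens` is a made-up example of a loop over `CollectionsMarshal.AsSpan`, which is available in `System.Runtime.InteropServices` from .NET 5 onward.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical helper: return the source itself when it is already a List<T>,
// and only copy (ToList) when it is some other kind of sequence.
static List<T> AsList<T>(IEnumerable<T> source) =>
    source as List<T> ?? source.ToList();

// Hypothetical hot-path loop: iterate the list's backing storage as a span.
// The list must not be resized while the span is in use.
static int SumOfEvens(List<int> values)
{
    int sum = 0;
    Span<int> span = CollectionsMarshal.AsSpan(values);
    for (int i = 0; i < span.Length; i++)
    {
        if (span[i] % 2 == 0)
            sum += span[i];
    }
    return sum;
}

var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
List<int> same = AsList(numbers);                   // no copy: same instance
List<int> copy = AsList(numbers.Where(n => n > 2)); // copy: source was not a list
int evens = SumOfEvens(numbers);                    // 2 + 4 + 6

Console.WriteLine(ReferenceEquals(same, numbers));
Console.WriteLine(copy.Count);
Console.WriteLine(evens);
```

Note that callers of a helper like `AsList` share the returned list with the original owner whenever no copy was made, so it only suits call sites that do not mutate the result.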

KieranDevvs commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

Method syntax 100% of the time. Most people I have found using the Query Syntax, do so to do Left joins in EF but this is still possible using the Method Syntax by configuring your data model in EF correctly.

Please share or link to a representative example of your use of LINQ:

Simply using .Where(x => ...), Any(x => ...), Single/SingleOrDefault/First/FirstOrDefault With predicates to filter data collections.

i.e

var subset = collection.Where(x => x.CriteriaBool || x.Property.Length > 5).ToArray();

I also find myself nesting LINQ queries in filters but this isn't as often i.e:

var subset = collection.Where(x => x.SubCollection.Any(y => y.Condition)).ToArray();

How often it is executed:

Extremely often. Every code base I've ever worked in has several hundred instances of collections being filtered, some of which may be in hot paths or in loops that are thousands of iterations long.

Version of .NET that runs the expression (for example, .NET Framework 4.5, .NET 6+):

.NET 6

If you have intentionally avoided LINQ due to performance issues, please share an example of:

I only find myself replacing LINQ when the code is time sensitive and it is run under conditions that require the code to loop many times (where the overhead builds up to a noticeable amount).

A recent example: working on a client-server application, the server's design goal was to support as many user connections as possible. The architecture of the server was event-based, meaning every time an event happened it was broadcast to the relevant clients. This meant iterating the collection of users on each event to find the relevant users.

SteveGilham commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

Method syntax; I can probably count on the fingers of one hand the number of cases of query syntax, despite having used LINQ as soon as it became available.

Please share or link to a representative example of your use of LINQ.

This is a recent example

      return self.CustomAttributes.Any(a => a.AttributeType.FullName == "Microsoft.FSharp.Core.CompilationMappingAttribute" &&
                                            a.ConstructorArguments.Count == 1 &&
                                            (0x1f & (int)a.ConstructorArguments[0].Value) == 7); // Module type

This is used in static analysis, to determine if an item has attributes indicating that it represents an F# module.

It will be executed about once per type in the assemblies being analysed, for each rule for which it is germane - estimated order 10⁴ to 10⁵ per analysis.

Built to netstandard 2.0, it is intended for net472 and up, and netcoreapp21 and later.

Performance issues have never shown LINQ in the hot-spots; file system, network, and database access continue to dominate the wall-clock time.

ltrzesniewski commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

I most often use the method syntax on IEnumerable<T>, since it combines well with custom extension methods.

I tend to use the query syntax for SQL queries, where custom methods can't be translated to SQL anyway. I feel it reduces the risk of error.

The query syntax reads well and is useful for let and join, but I avoid it if I would need to mix it with method syntax.

Please share or link to a representative example of your use of LINQ.

Here's an example where I'd mix query syntax and method syntax to get "left outer join" semantics in SQL:

var data = from itemA in context.TableA
           from itemB in context.TableB.Where(b => b.ForeignKey == itemA.PrimaryKey).DefaultIfEmpty()
           ...

This is much easier to grasp than the "official" way:

var data = from itemA in context.TableA
           join itemBTemp in context.TableB on itemA.PrimaryKey equals itemBTemp.ForeignKey into tb
           from itemB in tb.DefaultIfEmpty()
           ...

...which also calls the method .DefaultIfEmpty() BTW.
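To compare the two forms side by side, here is a small runnable sketch over in-memory arrays. The arrays and their `Name`/`Value` properties are hypothetical stand-ins for `context.TableA` and `context.TableB`; this runs as LINQ to Objects only and says nothing about the SQL a given provider would generate.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for context.TableA and context.TableB.
var tableA = new[]
{
    new { PrimaryKey = 1, Name = "a1" },
    new { PrimaryKey = 2, Name = "a2" }, // has no matching row in tableB
};
var tableB = new[]
{
    new { ForeignKey = 1, Value = "b1" },
    new { ForeignKey = 1, Value = "b2" },
};

// Form 1: a second `from` over a filtered source with DefaultIfEmpty().
var viaWhere =
    (from itemA in tableA
     from itemB in tableB.Where(b => b.ForeignKey == itemA.PrimaryKey).DefaultIfEmpty()
     select (itemA.Name, Value: itemB?.Value)).ToList();

// Form 2: join ... into, followed by DefaultIfEmpty() on the group.
var viaJoin =
    (from itemA in tableA
     join itemBTemp in tableB on itemA.PrimaryKey equals itemBTemp.ForeignKey into tb
     from itemB in tb.DefaultIfEmpty()
     select (itemA.Name, Value: itemB?.Value)).ToList();

foreach (var row in viaWhere)
    Console.WriteLine($"{row.Name} -> {row.Value ?? "(no match)"}");

// Both forms keep the unmatched "a2" row, with a null on the right-hand side.
Console.WriteLine(viaWhere.SequenceEqual(viaJoin));
```

In memory, the first form rescans `tableB` for every row of `tableA`, while `join` builds a lookup once, so the readability gain is mostly relevant when a provider translates the query.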

If you have intentionally avoided LINQ due to performance issues

In a subset of our projects, an allocation is treated as a bug, because we don't tolerate GC pauses. Zero allocations means zero LINQ, unfortunately. These projects process a very large quantity of data over a long period of time, so we can't just use GC.TryStartNoGCRegion to solve the problem.

I've also avoided it when trying to optimize code which handles a large quantity of data in non-performance critical projects, but these are edge cases. In most of our regular code, LINQ is used a lot, and very much appreciated. 🙂

swythan commented 1 year ago

Just to make it easier for ppl to 👍🏻 this bit:

Method syntax

swythan commented 1 year ago

I use LINQ a lot, but only really on in-memory data. I don’t personally have much experience using it for databases etc.

Mostly simple select, filter, group, sort, etc.

Edit: (trying to answer your questions a bit more…)

My team’s stuff is mostly on .NET 5 right now, but we literally just started migrating everything to .NET 6 this week.

Expressiveness is generally more important to my team than performance. Most of our code isn’t particularly perf sensitive, and LINQ itself hasn’t seemed to come up much in anything we have profiled.

swythan commented 1 year ago

The more complex “LINQ” queries I end up writing are usually in Rx. There I’m often combining multiple streams of data, doing GroupByUntil and all sorts of stuff.

Some love from the dotnet team for Rx (e.g. getting the latest code released; treating it like a 1st class citizen in samples/docs) would be awesome.

swythan commented 1 year ago

I did do some work once (back on .NET Framework) that required some gnarly bit packed and tiled data structures that needed a lot of thought for me to get the code right.

For that I wrote all my code and tests using LINQ on “normally structured” data first, then did more LINQ for the bit packing, then finally (when a profiler said so) rewrote the LINQ with for loops, etc for performance (they were very tight loops over a lot of data).

The expressiveness of LINQ was a godsend for getting the code right the first time. I wonder how much faster it would have been with the current runtime, compiler & 3rd party libs.

thomaslevesque commented 1 year ago
  1. Do you primarily use LINQ using the Query syntax or the Method syntax?

Usually the method syntax, but I occasionally switch to query syntax when it's more intuitive (e.g. when doing joins, SelectMany where I also need to access the "source" item, or when I need to use let to introduce a variable). Basically, I switch to query syntax when the method syntax becomes clunky.

  1. Please share or link to a representative example of your use of LINQ. Include:

Not sure what would be "representative", since my usage of Linq is pretty diverse. Instead I'll give examples of where I'd use query syntax rather than method syntax.

from item in items
from child in item.Children
let foo = CalculateSomething(item, child)
where SomeCondition(item, child, foo)
select SomeProjection(item, child, foo);

The method syntax is really ugly for this case:

items.SelectMany(item => item.Children, (item, child) => new { item, child, foo = CalculateSomething(item, child) })
    .Where(x => SomeCondition(x.item, x.child, x.foo))
    .Select(x => SomeProjection(x.item, x.child, x.foo));

(code written without an IDE, so there might be a few errors...)

  1. If you have intentionally avoided LINQ due to performance issues, please share an example of:

Almost never. Unless it's critical to save every cycle you can, Linq usually does a good enough job, as long as you keep in mind the algorithmic complexity of what you're doing (e.g. it's easy to end up with a O(n²) algorithm or worse using Linq if you don't pay attention).
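As an illustration of how an accidental O(n²) can creep in, here is a minimal sketch with made-up data (`allIds` and `blockedIds` are hypothetical names, not from the comment): a `List<T>.Contains` call inside a `Where` predicate scans the second list once per element, whereas a `HashSet<T>` built up front keeps the whole query roughly linear.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> allIds = Enumerable.Range(0, 1_000).ToList();
List<int> blockedIds = allIds.Where(i => i % 3 == 0).ToList();

// Quadratic: List<T>.Contains scans blockedIds for every element of allIds.
List<int> slow = allIds.Where(id => !blockedIds.Contains(id)).ToList();

// Roughly linear: build a HashSet once, then each lookup is O(1) on average.
var blockedSet = new HashSet<int>(blockedIds);
List<int> fast = allIds.Where(id => !blockedSet.Contains(id)).ToList();

// Same result either way; only the amount of work differs.
Console.WriteLine(slow.SequenceEqual(fast));
Console.WriteLine(fast.Count);
```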

One case I can think of is if you need to compute several statistics on a list of items (e.g. min, max, and sum). Using the built-in Linq methods would be inefficient, because it would need to enumerate the list 3 times. In this case it's better to do it manually with a loop, so that you can compute all 3 statistics in one go. (it would be possible to use Aggregate for this, but a bit clunky)

(EDIT: I'm talking about Linq to Objects here... this doesn't apply to EF of course)
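The min/max/sum case can be sketched as follows, using a made-up `items` array. It shows the three-pass LINQ version, the manual single-pass loop, and the `Aggregate` alternative mentioned above with a tuple as the accumulator.

```csharp
using System;
using System.Linq;

int[] items = { 4, 9, 1, 7 };

// Three separate LINQ calls: the source is enumerated three times.
int min3 = items.Min(), max3 = items.Max(), sum3 = items.Sum();

// One manual pass computing all three statistics together.
// (For an empty source, min and max keep their sentinel values.)
int min = int.MaxValue, max = int.MinValue, sum = 0;
foreach (int item in items)
{
    if (item < min) min = item;
    if (item > max) max = item;
    sum += item;
}

// The same single pass via Aggregate, with a tuple as the accumulator.
var stats = items.Aggregate(
    (Min: int.MaxValue, Max: int.MinValue, Sum: 0),
    (acc, item) => (Math.Min(acc.Min, item), Math.Max(acc.Max, item), acc.Sum + item));

Console.WriteLine($"{min3} {max3} {sum3}");
Console.WriteLine($"{min} {max} {sum}");
Console.WriteLine($"{stats.Min} {stats.Max} {stats.Sum}");
```

For an array the three passes are cheap; the difference matters more when the source is a lazily evaluated sequence that is expensive to enumerate.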

ackava commented 1 year ago

Method syntax, for complicated joins etc I prefer plain FormattedString SQL

Abrynos commented 1 year ago

1. Do you primarily use LINQ using the Query syntax or the Method syntax?

Method syntax

2. Please share a representative example of your use of LINQ.

Most iterations over any collection (that do not have any side effects). I very rarely use the IEnumerable<T> interface at all because most of the time results are either used...

    • ... exactly once by directly iterating over them with a foreach (having side effects on some state), or
    • ... more than once (in which case I use some ToList()/ToHashSet()/etc. for obvious reasons)

I do not use LINQ for database queries.

My LINQ queries are executed all the time, since they are everywhere in my code.

Always the latest version of dotnet for private projects. Something between net48 and net6.0 for work depending on the project.

moonheart08 commented 1 year ago
  1. Do you primarily use LINQ using the Query syntax or the Method syntax?

    Method syntax. Query syntax is generally not that intuitive (imo) in in-memory contexts and only really makes sense for databases. Plus we avoid LINQ in general due to allocation and performance constraints.

  2. Please share or link to a representative example of your use of LINQ. We are currently on .NET 6. Our usage is fairly lightweight due to most contexts being performance sensitive (it's a game, only 16.6ms per frame or less and 33.3ms per tick, respectively, and frames/ticks share time.), but we do use it more in-depth in non-performance-sensitive contexts like command frontends: https://github.com/space-wizards/space-station-14/blob/414f32a4eecf3aeac7a0d8fb7b9e4d837b15ca2b/Content.Server/Administration/Commands/WarpCommand.cs#L59-L62 We also use ToList/ToHashSet/ToDictionary/ToArray a decent bit, as copying a collection can be a fairly useful operation, but that's not as much of a concern and is fairly clear about the fact it allocates (how could it not?)

  3. If you have intentionally avoided LINQ due to performance issues, please share an example. https://github.com/space-wizards/space-station-14/pull/2011 is a fairly old PR now, but was a case where even .Where(...).Select(...) turned out to be too much of a heavy weight for allocations. If those in particular could be made faster that'd be great for our QoL.

panuoksala commented 1 year ago

1. Do you primarily use LINQ using the Query syntax or the Method syntax?

Only method syntax. I find the query syntax too long and a bit hard to read, because it is so far away from the normal SQL structure (from => where => select vs. select => from => where).

2. Please share or link to a representative example of your use of LINQ. Include:

.NET Framework 4.8 and EF 6.4. 95% of our LINQ queries are written to find things in collections. 90% of the queries end with .ToList() (to load things from the db immediately).

var storageFeeInvoices = context.Invoices.Join(context.StorageFees, invoice => invoice.Id, fee => fee.Invoice.Id, (invoice, fee) => new { StorageFeeInvoice = invoice, StorageFee = fee })
                                          .Where(x => x.StorageFee.Product.Id == productId)
                                          .Select(x => x.StorageFeeInvoice).ToList()

We also have extension methods that simplify the join structure, but I picked one without them as an example of how hard the join syntax currently can be.

In some cases we are also building LINQ queries from multiple parts. We have methods that return Expression<Func<T, bool>> and IQueryable expressions, and they are combined into a HUGE LINQ query that can be 200-300 lines long. The reason for this is that the project has been maintained for many years and all the new features are just added into existing queries. Long LINQ queries tend to be really hard to refactor, because they are doing so many things. Their readability is also very low and they are hard to understand.

  1. If you have intentionally avoided LINQ due to performance issues, please share an example of:

With EF it is not a daily job, but I would say at least a weekly one. However, it is not a LINQ problem; it is an EF problem. For regular business applications, the performance of LINQ is in a good state and doesn't cause problems anymore.

moonheart08 commented 1 year ago

In addition to my above comment, a pattern within our codebase is to forgo IEnumerable entirely in favor of a domain-specific struct with a Next(), as this is typically alloc-less in the contexts we use it in as we can simply provide it existing internal structures instead of allocating something new (notably, physics code, where allocating at all is generally bad) https://github.com/space-wizards/RobustToolbox/blob/1eb7393a60db8b61e790092dd04c24d6b4f2e694/Robust.Server/GameStates/ChunkIndicesEnumerator.cs#L6-L43 However, as you can probably tell, this means no linq or foreach.
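The general shape of such a struct can be sketched like this. `GridCellEnumerator` is a hypothetical example type written for illustration, not the linked `ChunkIndicesEnumerator`: a mutable struct that holds its own cursor and hands out items through `Next()`, so no enumerator object is allocated on the heap.

```csharp
using System;

// Walk a 3x2 grid of cell coordinates without allocating an enumerator object.
var cells = new GridCellEnumerator(width: 3, height: 2);
int visited = 0;
while (cells.Next(out var cell))
{
    Console.WriteLine($"({cell.X}, {cell.Y})");
    visited++;
}

// Hypothetical example type: a mutable struct advanced through Next().
struct GridCellEnumerator
{
    private readonly int _width;
    private readonly int _height;
    private int _index;

    public GridCellEnumerator(int width, int height)
    {
        _width = width;
        _height = height;
        _index = 0;
    }

    // Produces the next cell in row-major order; returns false when exhausted.
    public bool Next(out (int X, int Y) cell)
    {
        if (_index >= _width * _height)
        {
            cell = default;
            return false;
        }
        cell = (_index % _width, _index / _width);
        _index++;
        return true;
    }
}
```

Because the struct is mutable, it has to live in a local variable or a non-readonly field and be passed by `ref` where needed; copying it copies the cursor as well.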

mika76 commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

I use both, but I prefer the query syntax (maybe because I like sql) but also I use let quite a bit and it all looks nice and neat. I would love the query syntax to be expanded and things like extension methods to be built into it and things like ToList - would be so nice to be able to say ...select_list or something instead of saying (from .. in .. select).ToList() or maybe even have a way to expand the query syntax ourselves with keywords (create our own select keyword for eg)

Please share or link to a representative example of your use of LINQ. Include:

Tough one 😄 Here's a snippet where I read in fields used in some crystal reports from an old project:

usedFields.AddRange(from b in r.Bands
                    from c in b.Controls.Cast<XRControl>()
                    from binding in c.DataBindings.Cast<XRBinding>()
                    where binding.DataSource is SqlDataSource
                    let s = binding.DataMember.Split('.')
                    select Tuple.Create<string, string>(s[0], s[1]));

Another one:

info.Tables = (from t in usedFields
               orderby t.Item1
               group t by t.Item1 into table
               select new Table()
               {
                   Name = table.Key,
                   Fields = (from field in table
                             orderby field.Item2
                             select new Field()
                             {
                                 Name = field.Item2,
                                 Type = ""
                             })
                            .ToList()
               })
              .ToList();

These are used in a process where I read in a bunch of crystal reports and generate xml files with their db usage - this way when we search our codebase to see where something uses a field or table it is indexed and reported on. We also use these xml files to import the list of reports into our reporting db with metadata. It is pretty much run on commit of a report and not in a loop.

Here's one where I read in an xml file for processing:

var loaded = XDocument.Load(this.Host.ResolvePath("PacketData.xml"));

return new ObjectModel() {
        Types = from t in loaded.Root.XPathSelectElements("./types/type")
            let typeItems = t.XPathSelectElements("./*")
            select new ByteType()
                {
                    Name = t.Attribute("name").Value,
                    DataType = t.Attribute("dataType").Value,
                    Flags = t.Attribute("flags") != null && t.Attribute("flags").Value == "true" ? true : false,
                    Items = from item in typeItems
                        let descAtt = item.Attribute("description")
                        select new ByteTypeItem()
                            {
                                Name = item.Name.LocalName,
                                Description = descAtt != null ? descAtt.Value : item.Name.LocalName,
                                Code = item.Attribute("code").Value
                                }
                    },
        Categories = from categoryXml in loaded.Root.XPathSelectElements("./category")
            select new Category()
                {
                    Name = categoryXml.Attribute("name").Value,
                    RequireProcess = bool.Parse(categoryXml.Attribute("requireProcess").Value),
                    Packets = from packetXml in categoryXml.XPathSelectElements("./packet")
                        let descElement = packetXml.Element("description")
                        let attPacketType = packetXml.Attribute("packetType")
                        select new Packet()
                            {
                                Name = packetXml.Attribute("name").Value,
                                Code = packetXml.Attribute("code").Value,
                                IsBootloader = attPacketType != null && attPacketType.Value == "bootloader",
                                Description = descElement != null ? descElement.Value : "", //Att(packetXml, "description", null),
                                Properties = from property in packetXml.XPathSelectElements("./properties/*")
                                    select GetProperty(property)
                                }
                    }
        };

(I have a few of these some getting pretty large and complex and reading and processing the whole file in a single linq query using extension methods where necessary) This query is used in a t4 template to generate parsing classes for our internal byte packets. It gets much more complicated but is also only called on changing the xml file and not in a loop.

Version of .NET that runs the expression (for example, .NET Framework 4.5, .NET 6+)

Most of these are still in .net 4.5 but are being slowly converted over to .net 6 - the t4 one has already been converted to use source generators instead. But the linq stuff has not really changed much. Maybe using switch expressions more and named tuples.

If you have intentionally avoided LINQ due to performance issues, please share an example of:

I try to keep linq out of hot paths or I cache in lists before the loop, so I don't think I have any examples. I definitely do use simple linq expressions in loops for some simple stuff like finding Min or Max or Summing and stuff like that. Oh and ordering and grouping I use quite a bit - although I try to cache that stuff if possible. I hardly ever use joins in linq at all but do use SelectMany.

Oh and I'm not sure if it's relevant but I don't ever use Entity Framework - just never got into it. And also I use linq2xml a lot too.

joukevandermaas commented 1 year ago
  1. Do you primarily use LINQ using the Query syntax or the Method syntax?

Method syntax. The query syntax is not really better (e.g. more readable) and different enough from actual SQL that it basically has no benefit.

  1. Please share or link to a representative example of your use of LINQ. Include:

We use a lot of LINQ to perform database queries, like this:

var surveyEntity = _dbContext
    .Surveys
    .Where(s => s.SurveyId == surveyId)
    .Select(s => new { s.QuotaFrame })
    .SingleOrDefault();

These are executed for every request, mostly on .net6 but the same code also runs on .net 4.8 in some cases (we use netstandard).

Another common case is to transform one less convenient data structure into a more convenient one; e.g. using Aggregate, ToDictionary, etc. It is harder to find a clear example of this but it's fairly common.

  1. If you have intentionally avoided LINQ due to performance issues

Hard to find a clear example of this, but this basically always applies to LINQ to objects (not EF). Most of the time we replace it with a basic for loop and using arrays from the shared array pool to avoid allocations. In almost all cases I've seen the allocations done by LINQ are a bigger issue than the actual "slowness" of executing the query. Especially when doing LINQ stuff inside of loops, the allocations are extremely problematic. When starting optimization work, stuff allocated by LINQ is almost always at the top of the list in the profiler.
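A minimal sketch of that kind of replacement, with a made-up `orders` list standing in for real data: the filtered IDs go into a buffer rented from `ArrayPool<int>.Shared` instead of a freshly allocated array, and the buffer is returned when the work is done.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Hypothetical sample data.
var orders = new List<(int Id, decimal Total)>
{
    (1, 20m), (2, 150m), (3, 75m), (4, 300m),
};

// LINQ version for comparison (allocates iterator objects and a result array):
//   int[] ids = orders.Where(o => o.Total >= 100m).Select(o => o.Id).ToArray();

// Pooled version: rent a scratch buffer, fill it in a plain for loop, return it.
int[] buffer = ArrayPool<int>.Shared.Rent(orders.Count); // may be longer than requested
int count = 0;
int idSum = 0;
try
{
    for (int i = 0; i < orders.Count; i++)
    {
        if (orders[i].Total >= 100m)
            buffer[count++] = orders[i].Id;
    }

    // Only the first `count` slots hold valid data.
    foreach (int id in buffer.AsSpan(0, count))
        idSum += id;
}
finally
{
    ArrayPool<int>.Shared.Return(buffer);
}

Console.WriteLine($"{count} matching orders, id sum {idSum}");
```

The rented array must not be used after `Return`, which is why all the work on `buffer` sits inside the `try` block.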

jods4 commented 1 year ago

LINQ is everywhere in our projects, both LINQ to Objects, which I feel you're most interested in for the perf survey, and IQueryable flavor (99% of the time translated to SQL).

Main reason is that LINQ to Objects is super convenient and expressive to manipulate lists.

  1. Do you primarily use LINQ using the Query syntax or the Method syntax?

Both, probably 50/50.

We tend to default to Query syntax, and mostly fallback to method in the following two cases:

var matches = list
    .Where(x => x.Priority == Priority.Urgent)
    .OrderBy(x => x.DueDate)
    .ToList();

- When methods that don't have a LINQ equivalent are involved, since switching into/out of LINQ is clumsy in C#.
```csharp
var query = source
  .DistinctBy(x => x.ProductId) // not LINQ-able
  .Select(x => x.Name)
  .ToList(); // not LINQ-able
```

👉 I would love a LINQ operator to pipe into functions (I believe this is a long-time C# request), hypothetically:

var query = from x in source
            apply DistinctBy(x.ProductId)  // implicit Func<T> ?
            select x.Name
            pipe ToList(); // chaining

On the other hand, we never use Method syntax when there are joins or let expressions involved because they are much more convoluted than the LINQ equivalent.

  1. Please share or link to a representative example of your use of LINQ. Include:

LINQ to Objects would typically be simple examples such as the ones above in question 1., they're everywhere. Unlike previous example, our real LINQ usage very commonly closes on local variables. Sometimes we do more involved stuff like nesting:

var recent = documents.Where(doc => doc.Changes.Any(c => c.Date == yesterday));
foreach (var doc in recent)
{ /* do stuff */ }

With IQueryable we go absolutely crazy with super complex, pages-long queries, sometimes composed over many method calls. We translate C# into SQL with linq2db, which is an amazing lib. 👉 it's sometimes frustrating that newer C# syntax is unsupported in Expression, would love to see some updates on that front.

Nowadays we're lucky enough to run all our projects on .net 6 😃

  1. If you have intentionally avoided LINQ due to performance issues, please share an example of.

We usually don't avoid LINQ. When you do DB-intensive work you can take the hit of extra allocations or CPU. It's not ideal but readability and convenience wins (e.g. over writing for loops instead). Of course, we would still love LINQ queries that execute with for-loop speed! 😉

mdawood1991 commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

Please share or link to a representative example of your use of LINQ. Include:

Example, to get the latest record for each Asset in our System would be:

var query = CompanyContext.Assets
        .Where(a => assetIds.Contains(a.iAssetId))
        .Select(r => r.VehicleMonitoringLogs
                        .OrderByDescending(t => t.dtUTCDateTime)
                        .Take(1)
                        .Select(log => new LocationLogDTO
                        {
                            AssetId = log.iAssetId,
                            VehicleMonitoringId = log.iVehicleMonitoringId,
                            ....
                        })
                        .ToArray()
                        .FirstOrDefault() ?? new LocationLogDTO() { AssetId = r.iAssetId }
                );

The problematic LINQ expression (if available) and the code that replaced it

olmobrutall commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

Query Syntax when let, join or group by are involved. Otherwise Method syntax.

Please share or link to a representative example of your use of LINQ. Include:

I think performance-wise, there is not too much to worry about in LINQ-to-SQL. The provider and the DBMS dominate: [image]

This diagram shows an async API query that took 71ms (in blue), of which 1.6ms is the LINQ provider's expression manipulation (pink) and 56ms is SQL Server time (yellow). Removing validations when building expression trees (like https://github.com/dadhi/FastExpressionCompiler) could make the pink part smaller, but I'm not sure it's worth it.

On the convenience side, however, I see two important improvements:

The first and more important is allowing ?. in expression trees. Already discussed here.

For the second one, here are examples of a quite complicated LINQ-to-SQL query that Signum.Framework uses to get all the information of the database schema in SQL Server and PostgreSQL.

The method calls inside these queries work because of a custom AutoExpressionFieldAttribute, that allows to implement Properties/Methods/Extension Methods as an expression tree, so the implementation can be understood and expanded inside LINQ-to-SQL queries. Example here.

A typical Signum Framework application uses AutoExpressionField hundreds of times. It is a very useful feature for representing business back references, calculated values, and complex business logic concepts that can be translated to the database.

Currently, it works using some Mono.Cecil black magic that moves and converts the body of the method to a static field above the method, so the LINQ provider can consume it while translating the query. It would be more convenient if, instead of writing:

    [AutoExpressionField]
    public IQueryable<SysColumns> Columns() => 
        As.Expression(() => Database.View<SysColumns>().Where(c => c.object_id == this.object_id));

One could write

    public expression IQueryable<SysColumns> Columns() => 
        Database.View<SysColumns>().Where(c => c.object_id == this.object_id);

A short description of the goal of the expression

How often it is executed

LINQ-to-SQL... when user requests data, typically not in hot paths. LINQ-to-Objects for manipulating data, parsing files with regexes or XML, etc... Maybe in medium-hot paths.

Version of .NET that runs the expression (for example, .NET Framework 4.5, .NET 6+)

.Net 6

If you have intentionally avoided LINQ due to performance issues, please share an example of:

Most of the time I don't avoid it. The one time I recall was not for CPU performance but for memory consumption. This solved the problem: https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xnode.readfrom?redirectedfrom=MSDN&view=net-6.0#System_Xml_Linq_XNode_ReadFrom_System_Xml_XmlReader_

The problematic LINQ expression (if available) and the code that replaced it

A normal LINQ-to-XML query but opening a huge file.

A short description of the goal

Loading a big XML File

How often it is executed

Not often.

Finally, I think one big win of making at least a subset of LINQ-to-Object a zero-cost abstraction is that it could be used by you in Roslyn, and then maybe receive more love because of dogfooding.

LINQ, and especially LINQ to SQL / Expression Trees, is for me the main competitive advantage of C# over any other (non-LISP) language.

dmcweeney commented 1 year ago

Method syntax

As well as the cases mentioned above, we use it quite a bit for building Cosmos DB queries from C#/.NET 5 code. It is much simpler to build and maintain the queries using LINQ than building SQL queries as strings etc.

See https://learn.microsoft.com/en-us/azure/cosmos-db/sql/sql-query-linq-to-sql

mwadams commented 1 year ago

We primarily (almost exclusively) use method syntax.

When writing performance-sensitive code, the chief barrier to use is understanding what the allocation profile is going to be for the expression. Something that would make a massive difference (not just to LINQ) is some kind of Roslyn analyzer that could tell me when I'm going to be getting closure capture, hidden allocations etc.
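A minimal sketch of the closure-capture concern, assuming C# 9 or later. The names `values` and `threshold` are made up for illustration. The first lambda captures a local, so the compiler has to generate a closure object for it; the second is marked `static`, which makes the compiler reject any capture, so it can serve as a lightweight guard until analyzer support exists.

```csharp
using System;
using System.Linq;

int[] values = { 1, 5, 10, 20 };
int threshold = 8;

// Captures the local `threshold`: a closure object and a delegate
// are allocated when this statement runs.
int capturing = values.Count(v => v > threshold);

// `static` lambda: capturing a local or `this` here is a compile-time error,
// so the delegate can be cached by the compiler.
int nonCapturing = values.Count(static v => v > 8);

Console.WriteLine($"{capturing} {nonCapturing}");
```

The `static` modifier only documents and enforces the absence of captures; the LINQ operators themselves may still allocate iterators.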

sfiruch commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

100% method syntax.

Please share or link to a representative example of your use of LINQ.

https://github.com/deiruch/SATInterface/blob/master/SATInterface/LinExpr.cs#L594:

var vars = _a.Weights.OrderByDescending(w => Math.Abs(w.Value)).Select(w => w.Value > 0 ? w.Key : !w.Key).ToArray();

This is part of a mathematical optimization library, which runs on .NET 6+. It produces the variables of a binary linear expression ordered by weight and converts the expression to one with non-negative weights (using inverted boolean variables instead). This may be executed tens of thousands of times. Over time I have been replacing more and more uses with traditional imperative code for performance reasons.

My LINQ use is 95% LINQ-to-Objects, 5% Entity Framework.

If you have intentionally avoided LINQ due to performance issues, please share an example

Original code from a ray tracer:

            return Things.Select(t => t.FindClosestHitPoint(_rOrigin, _rDirection, _maxLambda))
                .OrderBy(hp => hp.Distance)
                .FirstOrDefault(HitPoint.NoHit);

Replaced it with this (slightly different semantics):

            var closestDistance = _maxLambda;
            var hpClosest = HitPoint.NoHit;
            foreach (var t in Things)
            {
                var hpCur = t.FindClosestHitPoint(_rOrigin, _rDirection, closestDistance);
                if (hpCur.Distance < closestDistance)
                {
                    hpClosest = hpCur;
                    closestDistance = hpCur.Distance;
                }
            }
            return hpClosest;

This is looking for the first intersection of a ray with geometric objects. It is executed billions of times.

mrange commented 1 year ago
  1. Method syntax only
  2. Pretty typical on our app:
    var supplierInvoiceLineId = 
    existingSupplierInvoice
    ?.SupplierInvoiceLines
    ?.Items
    ?.SelectMany(s => s.ExternalSystemSupplierInvoiceLines.Items)
    .FirstOrDefault(f => f.ExternalId == invoiceLine.SupplierInvoiceLineIdExternal)
    ?.SupplierInvoiceLineId
    ;

Finds the first supplierInvoiceLine for the external system's row. It's used in integrations between systems.

This particular one I think is executed at most once a minute in our system.

For the more intense web APIs I suppose similar LINQ queries execute a few hundred times a second per plan. We are not a very high-traffic site.

Nothing very advanced as you can see.

  1. Nothing in my current work requires more than average performance and LINQ does a good job delivering average performance.

I did write an optimized Mandelbrot a few years ago and my first attempt was a parallel LINQ version which did quite poorly and the final version is just too long and complex to put here :) (https://gist.github.com/mrange/20fa976388167e294aa01a1266ad0a8c#applying-our-learnings-to-f)

PS. I am quite interested in fast data pipelines and I think a push pipeline using F# [<InlineIfLambda>] is the best compromise between simple implementation and good performance. [<InlineIfLambda>] makes all abstractions disappear from the final IL code (In case you are interested: https://github.com/mrange/PushStream6#results)

olmobrutall commented 1 year ago

I know the main focus is LINQ-to-Objects, but one thing that could massively improve performance in LINQ-to-SQL is having a better indication of which lambdas are expression trees and which one are delegates.

Imagine the following code:

db.Customers.Where(a => a.Country == "USA").ToDictionary(a=>a.CustomerId, a=>a.WebSiteUrl); 

If the developer is not aware that ToDictionary works on IEnumerable<T>, they will think that the SQL is something like

SELECT c.CustomerId, c.WebSiteUrl
FROM Customer c
WHERE c.Country = 'USA'

When in reality it is something like:

SELECT c.CustomerId, c.CustomerName, c.Address_ZIPCode, c.Address_Street, 
c.Address_State, c.Telephone1, c.Telephone2, c.Comment, cat.Name, --- Many more columns
FROM Customer c
--- Maybe eager joins with many tables to retrieve related entities or collections
WHERE c.Country = 'USA'

If the IDE were able to show all the lambdas that get converted to an Expression<T> with a different background (a different color category in Visual Studio -> Settings -> Fonts and Colors):

image

Or even all the method chains / query expressions that return an IQueryable<T>

image

I think the internal workings of LINQ-to-SQL will be easier to understand and the developer will fall in the pit of success, writing:

db.Customers
.Where(a => a.Country == "USA")
.Select(a =>new { a.CustomerId, a.WebSiteUrl })
.ToDictionary(a=>a.CustomerId, a=>a.WebSiteUrl); 

In my experience, an optimization in the DB pays 1000x more than one in the CPU.

hopperpl commented 1 year ago
  1. Do you primarily use LINQ using the Query syntax or the Method syntax?

100% Method

I think Query is so hard to read among C# code and disturbs code flow.


  1. Please share or link to a representative example of your use of LINQ.

Mostly with set theory. Have a set of elements, have unions, intersections, all the other "venn"s. Data is "live" data mostly, not backed by databases as there is no data storage. Points, Pixels, Colors, Ranges... such data

Another usage for me is to have one-line text file parsers or writers. Simple tabular text files with a | delimiter,

   var Content = File.ReadAllLines(ContentFile.FullName);
   var List    = Content.Select(e => e.Split('|')).Select(e => new AssetType(e[0].Trim(), ulong.Parse(e[1], NumberStyles.HexNumber), uint.Parse(e[2]), ulong.Parse(e[3], NumberStyles.HexNumber), e[4].Trim(), e[5].Trim()));
   ...
   var List = AssetTypes.Values.OrderBy(e => e.Type).ThenBy(e => e.TypeSub).Select(l => $"{l.Pack,-16} | {l.Hash:X16} | {l.Index,-5} | {l.Type:X16} | {l.TypeSub,-75} | {l.Name}").ToList();
   File.WriteAllLines(ContentFile.FullName, List);

I like the simple one-line usage cases with LINQ. Also allows it to be used for field initializers.

Another heavy use case for me is caching by "indexing" data quickly with Data.Where(...).Order(...).ToDictionary().

Or quick statistics.

  var Statistics = AssetTypes.Values.SelectMany(e => e.Values).GroupBy(e => e.Type).OrderByDescending(e => e.Count()).Select(e => $"{e.Key:X16} => {e.Count()} times").ToList();

  1. If you have intentionally avoided LINQ due to performance issues, please share an example of:

I avoid LINQ when there is a huge amount of data to process. One example was an asset manager that held over 35 million assets with a content-key and a storage-key. I needed multiple dictionaries to allow lookup/mapping of data from one key to another.

In such cases ToDictionary() is too slow as it "re-buckets" way too often. LINQ lacks the option to fine-tune such operations by providing e.g. a Dictionary capacity. I even replaced foreach with for-loops due to all the overhead. The performance gain was huge, from around 8 seconds of processing time down to around 1 second.

I'm not sure if that gain still applies today with the dotnet 7 shortcuts.
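A minimal sketch of the pre-sizing idea described above, assuming the element count is known up front. The data, the key types, and the count of 100,000 are hypothetical stand-ins for the asset manager's content-key to storage-key mapping. Whether `ToDictionary` can size itself depends on the runtime version and on whether the source exposes a count, so the sketch deliberately feeds it a lazy sequence; no timings are claimed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int expectedCount = 100_000; // assumption: count known in advance

// Hypothetical lazy source of (ContentKey, StorageKey) pairs.
IEnumerable<(ulong ContentKey, ulong StorageKey)> assets =
    Enumerable.Range(0, expectedCount)
        .Select(i => (ContentKey: (ulong)i, StorageKey: (ulong)i * 31UL));

// Pre-size the dictionary so it does not have to grow and rehash while filling.
var byContentKey = new Dictionary<ulong, ulong>(expectedCount);
foreach (var asset in assets)
{
    byContentKey.Add(asset.ContentKey, asset.StorageKey);
}

// LINQ equivalent for comparison: the capacity cannot be passed in.
Dictionary<ulong, ulong> viaLinq = assets.ToDictionary(a => a.ContentKey, a => a.StorageKey);

Console.WriteLine(byContentKey.Count == viaLinq.Count);
```

The second dictionary for the reverse mapping (storage-key to content-key) could be filled in the same loop, which avoids enumerating the source twice.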

TehWardy commented 1 year ago

I live by it ... it's an awesome connector between queryable API layers like OData, and database queries but also allows us to leverage framework tooling to generate really complex datasets in realtime with the underlying technology taking the pain.

Honestly I think I abuse it a bit too much but it would be cool to have VS tooling that helps build out complex stuff like this and optimise the logical pathing somehow.

IEnumerable<BucketStats> GetB2BAnalyticData(string tenantId) => b2b
    .GetAll<Bucket>()
    .IgnoreQueryFilters()
    .Where(b => b.Path.StartsWith(tenantId))
    .Select(b => new BucketStats
    {
        Key = b.Key,
        Name = b.Name,
        Path = b.Path,
        ActiveTransactions = b.ActiveTransactions
            .GroupBy(t => t.ActiveTransaction.CurrencyId)
            .Select(g => new ActiveTransactionStats
            {
                Currency = g.Key,
                Unpaid = g.Sum(t => t.ActiveTransaction.UnpaidValue),
                Gross = g.Sum(t => t.ActiveTransaction.Value),
                Net = g.Sum(t => t.ActiveTransaction.ValueBeforeTax),
            }),

        Invoices = b.Invoices
            .GroupBy(t => t.Invoice.CurrencyId)
            .Select(g => new TransactionStats
            {
                Currency = g.Key,
                Gross = g.Sum(t => t.Invoice.Value),
                Net = g.Sum(t => t.Invoice.ValueBeforeTax),
            }),

        Credits = b.Credits
            .GroupBy(t => t.Credit.CurrencyId)
            .Select(g => new TransactionStats
            {
                Currency = g.Key,
                Gross = g.Sum(t => t.Credit.Value),
                Net = g.Sum(t => t.Credit.ValueBeforeTax),
            }),

        Offers = b.Offers
            .GroupBy(t => t.Offer.CurrencyId)
            .Select(g => new OfferStats
            {
                Currency = g.Key,
                Gross = b.Offers.Sum(t => t.Offer.Value),
                TransactionsGross = b.Offers.Sum(t => t.Offer.TransactionValue),
                GrossCost = b.Offers.Sum(t => t.Offer.Cost)
            }),

        Payments = b.RemittanceAdvices
            .GroupBy(t => t.RemittanceAdvice.CurrencyId)
            .Select(g => new PaymentStats
            {
                Currency = g.Key,
                Gross = g.Sum(t => t.RemittanceAdvice.Value),
            })
    }).AsSplitQuery();

... and then there is stuff like this ...

```cs
internal sealed partial class AnalysePlatformUsage : IScheduledOperationRunner
{

static object AnalyseUserActivity(IEnumerable<UserActivity> data) => data
    .GroupBy(i => i.UserId)
    .Select(g => new
    {
        User = new { Id = g.Key, g.First().UserEmail },
        Sessions = g
            .Select(i => i.SessionId)
            .Distinct()
            .Count(),
        PageRequests = g
            .Where(i => i.EventName.StartsWith("Page_GET/") && !i.EventName.StartsWith("Page_GET/lib/"))
            .Count(),
        ApiRequests = g
            .Where(i => i.EventName.StartsWith("Api_GET/"))
            .Count()
    })
    .OrderByDescending(i => i.PageRequests + i.ApiRequests)
    .Take(10);

static object AnalysePageActivity(IEnumerable<UserActivity> data) => data
    .Where(i => i.EventName.StartsWith("Page_GET/") && !i.EventName.StartsWith("Page_GET/lib/"))
    .GroupBy(i => i.EventValue.Split('?').First())
    .Select(g => new
    {
        Page = g.Key,
        Sessions = g
            .Select(i => i.SessionId)
            .Distinct()
            .Count(),
        Hits = g.Count()
    })
    .OrderByDescending(i => i.Hits)
    .Take(10);

static object AnalyseApiActivity(IEnumerable<UserActivity> data) => data
    .Where(i => i.EventName.StartsWith("Api_"))
    .GroupBy(i => i.EventValue.Split('?').First())
    .Select(g => new
    {
        Endpoint = g.Key,
        Sessions = g
            .Select(i => i.SessionId)
            .Distinct()
            .Count(),
        Hits = g.Count()
    })
    .OrderByDescending(i => i.Hits)
    .Take(10);

}
```

Jay-Madden commented 1 year ago
  1. For IEnumerables always method syntax, it's so much easier. For IQueryable it depends on the schema/model design. If the models have mapping properties the method syntax is preferred; if I have to manually join anything then query syntax is preferred.

  2. Vast majority of IEnumerable usage is for parsing various things and IQueryables will mostly look like this

        var stuff = await _context.Entity
            .Where(x => x.Id == id)
            .Select(y => new { y.id, y.something })
            .ToListAsync();
  3. Avoidance of linq has mostly been in places related to Nullable conversions where it gets annoying to convert a Nullable<T> to T

Atulin commented 1 year ago
  1. Method syntax, always and forever. Query syntax looks way too alien compared to anything else in C#, it's like randomly throwing a subset of Common Lisp into C++, makes no sense and looks ugly.
  2. Just what you'd usually use to query data. Bunch of .Where()s, a .Select(), an .OrderBy(), capped off with .ToListAsync() or some such. For example,
    var comments = await _context.Comments
    .Where(c => c.CommentsThreadId == thread)
    .OrderByDescending(c => c.DateTime)
    .Select(c => new CommentDto
    {
        Id = c.Id,
        Body = c.DeletedBy == null
            ? c.Body
            : string.Empty,
        Owned = c.AuthorId == uid && c.DeletedBy == null,
        DateTime = c.DateTime,
        DeletedBy = c.DeletedBy,
        EditCount = c.EditCount,
        LastEdit = c.LastEdit,
        IsBlocked = c.Author.Blockers.Any(bu => bu.Id == uid),
        Author = c.DeletedBy != null
            ? null
            : new UserSimpleDto
            {
                Avatar = c.Author.Avatar,
                Title = c.Author.Title,
                UserName = c.Author.UserName,
                Roles = c.Author.Roles.AsQueryable().Select(r => new RoleDto
                            {
                            Id = r.Id,
                            Name = r.Name,
                            Order = r.Order ?? 0,
                            IsStaff = r.IsStaff,
                            Color = r.Color
                            })
            }
    })
    .AsNoTracking()
    .Skip(Math.Max(0, page - 1) * perPage)
    .Take(perPage)
    .ToListAsync();

    with the mappings usually extracted to some Expression<Func<TSource, TTarget>> static get-only property.

  3. Haven't encountered any performance issues caused by LINQ, but then again, I mostly use it in the context of IQueryable.
buzallen commented 1 year ago

Method syntax always.

Please share or link to a representative example of your use of LINQ. Include:

var query = await journals
    .Include(l => l.Posts)
    .OrderByDescending(l => l.Date)
    .ThenBy(l => l.SubGroupId)
    .AsNoTracking()
    .ToListAsync();

var exists = loans.Where(l => l.LoanName == "1234").Any();

It gets much more complex, but effectively it is pulling data from the DB and finding things in lists. A lot of GroupBy and ToDictionary is done.

.net core 6

Where I avoid it is in loops. In situations where I have 10K contacts and I need to put together a viewmodel of loans that needs the contact name what I would avoid is this (simplified example):

var loans = _context.Loans.Select(l => new { l.LoanId, l.LoanName, l.ContactId });
var contacts = _context.Contacts.Select(l => new { l.ContactId, l.FullName });

foreach(var loan in loans) 
{
     var model = new Model 
     {
           LoanName = loan.LoanName,
           ContactName = contacts.Single(l => l.ContactId == loan.ContactId).FullName
     };
}

Instead I'll do this:

var contacts = _context.Contacts
     .ToDictionary(l => l.ContactId, l => l.FullName);

and then in the loop I would use the dictionary rather than the single()
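A minimal, self-contained sketch of that dictionary-in-the-loop pattern, using in-memory tuples as hypothetical stand-ins for the Loans and Contacts tables (the sample names and the "(unknown)" fallback are assumptions for illustration). The lookup is built once, so each loan costs a hash lookup rather than a scan over all contacts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the two tables.
var loans = new[]
{
    (LoanId: 1, LoanName: "A-100", ContactId: 10),
    (LoanId: 2, LoanName: "B-200", ContactId: 20),
};
var contacts = new[]
{
    (ContactId: 10, FullName: "Ada Lovelace"),
    (ContactId: 20, FullName: "Alan Turing"),
};

// Build the lookup once, outside the loop.
Dictionary<int, string> contactNames =
    contacts.ToDictionary(c => c.ContactId, c => c.FullName);

var models = new List<(string LoanName, string ContactName)>();
foreach (var loan in loans)
{
    // TryGetValue avoids an exception when a loan references a missing contact.
    contactNames.TryGetValue(loan.ContactId, out var name);
    models.Add((loan.LoanName, name ?? "(unknown)"));
}

foreach (var model in models)
{
    Console.WriteLine($"{model.LoanName}: {model.ContactName}");
}
```

Unlike `Single()`, this does not throw on a missing or duplicated contact inside the loop; duplicate keys would instead fail once, when the dictionary is built.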

mldisibio commented 1 year ago
  1. Method syntax always.
  2. I now use less and less LINQ in production code, but use it quite frequently for one-off projects or tasks, almost like a scripting engine, that
    • scans a file system for particular files
    • parses a verbose http json response or structured document (xml, json, csv), together with dynamic/anonymous types
    • unit tests side effects on collections or tests new language features
// group a list of national parks by region from the deeply nested park REST service response
string json = await CallParkRest();
var myDynamic = json.ToNullSafeDynamicList();
var grpByRegion = myDynamic.Select(dic =>
{
    var entry = (dynamic)dic;
    var park = entry.Unit;
    return new { entry.RegionCode, ParkCode = park.Code, park.FullName, park.TypeDisplay };
})
.Where(entry => entry.TypeDisplay.Equals("Park"))
.GroupBy(entry => entry.RegionCode)
.Select(grp => new { Region = grp.Key, Parks = grp.Select(g => new { g.ParkCode, g.FullName }).ToList()}
);
  1. Having been coding since the early start of EF/NHibernate when it still had performance issues and reducing chattiness was inversely proportional to readability, I never embraced using ORM with LINQ syntax for shaping queries. Although they are better now, I still use ADO.Net and use SQL or call stored procedures that shape the data the way I want directly. I never use LINQ in production database code.
negrifelipe commented 1 year ago
  1. Method Syntax always
    • Tries to find 2 commands that are similar to the commandQuery and that can be executed by the player. The method Match compares the query and the command name and see if they are similar. Item1 is true if they are the same and Item2 is the coincidences that the name has with the commandQuery
    • It is executed every time the player introduces a wrong command. It is used to give suggestions of similar commands
    • Runs on .NET Framework 4.8
      var matches = R.Commands.Commands
              .ToList()
              .Where(x =>
                  !Instance.Configuration.Instance.ExcludedCommandsFromSuggestions.Contains(x.Name) &&
                  (x.AllowedCaller == actor || x.AllowedCaller == AllowedCaller.Both) &&
                  (x.Permissions.Count == 0 && caller.HasPermission(x.Name) || x.Permissions.Any(p => caller.HasPermission(p))))
              .Select(x => Match(commandQuery, x.Name))
              .Where(x => x.Item1)
              .OrderByDescending(x => x.Item2)
              .Take(2);
  2. I never did it
atrauzzi commented 1 year ago
  1. Do you primarily use LINQ using the Query syntax or the Method syntax?

    • Method syntax 100% of the time.
  2. Please share or link to a representative example of your use of LINQ. Include:

    • I don't have anything readily on-hand, but it's a lot of sorting, filtering and projecting
  3. If you have intentionally avoided LINQ due to performance issues, please share an example of:

    • I don't think I have, although I definitely think it can happen, it seems rare. LINQ is so much more expressive than foreach and for that I can usually justify the negligible performance impact. I do wish it was easier to convince people to come down on the other side of the performance vs. legibility debate though. One thing I will say is that there appear to be two forms of "LINQ IS SLOW OMGOGMGMG" that people tend to express. The first is just to do with allocations and performance internal to LINQ. The second is when people make the same mistakes they would if doing foreach and for, but now they have something to blame for bad n* habits.
Enderlook commented 1 year ago

1) I always use Method Syntax; I find the query syntax awkward.

3) I usually code with a performance-by-default mindset, so I hardly ever use LINQ-to-objects. Though my code is for hobbyist usage, so I have never actually experienced production scaling issues, only some framerate drops when making video games.

In game development (Unity 2019 and 2020) I never use LINQ-to-objects in commonly called magic methods such as Update or OnCollision, as hundreds of game objects will run them on each frame and allocations hurt the framerate. In once-called methods such as Awake, Start, and OnDestroy I use it without problems.

When writing libraries I hardly ever use LINQ-to-objects, since my libraries are performance oriented. If a library method accepts an IEnumerable<T>, I usually check for common instance types before falling back to .GetEnumerator(). The only exceptions to these rules are when executing a very cold region (such as one-time initialization), for example in static constructors, constructors of singletons, or properties that rarely change, where the caching logic can use LINQ.
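A rough sketch of the "check for common instance types first" approach, assuming a hypothetical helper named `SumFast`. Arrays and `List<int>` are iterated by index, which avoids going through the interface enumerator; anything else falls back to a plain `foreach` over the interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: sums a sequence, with fast paths for common concrete types.
static long SumFast(IEnumerable<int> source)
{
    long total = 0;
    switch (source)
    {
        case int[] array:
            // Indexed loop over the array, no enumerator involved.
            for (int i = 0; i < array.Length; i++) total += array[i];
            break;
        case List<int> list:
            // Indexed loop over the list, no boxed enumerator.
            for (int i = 0; i < list.Count; i++) total += list[i];
            break;
        default:
            // General fallback through IEnumerable<int>.GetEnumerator().
            foreach (int value in source) total += value;
            break;
    }
    return total;
}

Console.WriteLine(SumFast(new[] { 1, 2, 3 }));       // 6
Console.WriteLine(SumFast(new List<int> { 4, 5 }));  // 9
Console.WriteLine(SumFast(Enumerable.Range(1, 4)));  // 10
```

The trade-off is extra code per operator, which is the part LINQ normally hides.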

2) In backend development (Asp Net Core 6) I abuse LINQ-to-SQL (EF Core) and System.Linq.Expression because they are very handy, but I don't use many LINQ-to-objects. On the frontend (WebAssembly Blazor) I allow myself to use LINQ-to-objects.

My main usage of LINQ is LINQ-to-SQL. In those cases, I always try to cache the query to avoid its allocation, especially when I concatenate several operators or use nested operators. Also, I use AsNoTracking by default and don't use lazy queries due to the n + 1 query problem. Those queries are done to the database once per request, though a request may execute several different LINQ queries (usually 1, but can be more than 4 or 5). Examples would be:

private static readonly Func<ApplicationDbContext, List<int>, IAsyncEnumerable<(int Id, int? VariantOf)>> DetermineAbstractProductOfVariantCompositions =
    EF.CompileAsyncQuery((ApplicationDbContext context, List<int> variantCompositionsId) => context.Products
        .Where(e => variantCompositionsId.Contains(e.Id))
        .Select(e => new ValueTuple<int, int?>(
            e.Id,
            context.TraitSlots
                .Where(e2 => context.TraitValues
                    .Where(e3 => e3.OwnerId == e.Id)
                    .Select(e => (int?)e.SlotId)
                    .FirstOrDefault() == e2.Id)
                .Select(e => (int?)e.OwnerId)
                .FirstOrDefault())));
private static readonly Func<ApplicationDbContext, int, Task<ProviderInfoGetBody?>> GetProviderInfoOf =
    EF.CompileAsyncQuery((ApplicationDbContext context, int id) => context.ProductProviderInfo
        .Include(e => e.Product)
        .Include(e => e.Provider)
            .ThenInclude(e => e.Currency)
        .Where(e => e.ProductId == id)
        .Select(e => new ProviderInfoGetBody(
            new ProviderGetBody(
                e.Provider.Id,
                e.Provider.Name,
                new CurrencyGetBody(
                    e.Provider.Currency.Id,
                    e.Provider.Currency.Code,
                    e.Provider.Currency.Name,
                    e.Provider.Currency.Symbol,
                    e.Provider.Currency.Fraction,
                    e.Provider.Currency.Conversion)),
            e.ExternalName,
            e.Cost,
            e.AdditionalCost))
        .FirstOrDefault());

However, in many cases, I found that the provided LINQ operators were limiting for my purposes, so I made a query rewriter that de-sugars (and also caches) queries where I use custom operators. For the rewriter, I didn't use the user-defined function mapping of EF Core because I wasn't familiar with that technology nor with SQL itself (which is required there), nor did I know if it would support my use cases, since they rewrite dynamically based on the input and then cache the result for future usage. Taking into account that .NET 7 Preview 7 has IQueryExpressionInterceptor, I guess I could refactor my rewriter to use that, but I haven't tried yet, nor do I know exactly how it handles the EF.CompileAsyncQuery caching. Some examples are:

private static readonly Func<ApplicationDbContext, string, ICurrencyRepository.SortLabel, bool, int, int, IAsyncEnumerable<CurrencyGetBody>> SearchSlice =
    EF2.CompileAsyncQuery((ApplicationDbContext context, string search, ICurrencyRepository.SortLabel sortLabel, bool descending, int offset, int limit)
        => context.Currencies
            // `Search` operator splits the `search` string by whitespaces and for each substring check if any of those properties contain it.
           .Search(search, e => e.Name, e => e.Code, e => e.Symbol)
            // `Switch` operator works similarly to the switch statement of C# but for LINQ.
           .Switch(
                sortLabel,
                new SwitchCase<Currency, ICurrencyRepository.SortLabel, IQueryable<Currency>>[]
                {
                    // If `sortLabel` == `ICurrencyRepository.SortLabel.Id`, do `e => e.OrderBy(e => e.Id, descending)`.
                    // `OrderBy<T, U>(IQueryable<TSource>, Expression<Func<TSource, TKey>>, bool)` allows a boolean value to choose if it's ascending or descending.
                    new(ICurrencyRepository.SortLabel.Id, e => e.OrderBy(e => e.Id, descending)),
                    new(ICurrencyRepository.SortLabel.Code, e => e.OrderBy(e => e.Code, descending)),
                    new(ICurrencyRepository.SortLabel.Name, e => e.OrderBy(e => e.Name, descending)),
                    new(ICurrencyRepository.SortLabel.Symbol, e => e.OrderBy(e => e.Symbol, descending)),
                    new(ICurrencyRepository.SortLabel.Fraction, e => e.OrderBy(e => e.Fraction, descending)),
                    new(ICurrencyRepository.SortLabel.Conversion, e => e.OrderBy(e => e.Conversion, descending)),
                },
                // Default case.
                e => e.OrderBy(e => e.Id, descending)
           ).Skip(offset)
           .Take(limit)
           .Select(e => new CurrencyGetBody(e.Id, e.Code, e.Name, e.Symbol, e.Fraction, e.Conversion)));

private static readonly Func<ApplicationDbContext, int, string, IAsyncEnumerable<ProductGetHeader>> SearchVariantsOf =
    EF2.CompileAsyncQuery((ApplicationDbContext context, int abstractId, string search) => context.Products
        .Where(e => context.TraitValues
            .Where(e => e.Slot.OwnerId == abstractId)
            .Select(e => e.OwnerId)
            .Contains(e.Id))
        .Search(search, e => e.Name)
        .Select(e => new ProductGetHeader(e.Id, e.Code, e.Name)));

The cases where I use LINQ-to-objects are one-time initializers (so the cost is amortized anyway) or places where the operation is already expensive. In those cases, I don't mind using a lot of LINQ. For example, the implementation of the query rewriter (EF2) used above initializes its readonly static variables like this:

private static readonly MethodInfo[] EF_CompileAsyncQuery_MethodInfos_Raw = typeof(EF)
    .GetMethods(BindingFlags.Public | BindingFlags.Static)
    .Where(e => /* ... */ && e.GetGenericArguments()
            .Skip(1)
            .All(e => e.GetGenericParameterConstraints().Length == 0
                && e.GenericParameterAttributes == default))
    .ToArray();

private static readonly MethodInfo whereCallInfo = typeof(Queryable)
    .GetMethods(BindingFlags.Public | BindingFlags.Static)
    .Where(/* ... */ )
    .Select(e => (e, e.MakeGenericMethod(new Type[] { typeof(object) })))
    .Single(e => e.Item2.GetParameters()[1].ParameterType == typeof(Expression<Func<object, bool>>))
    .e;

public static readonly Dictionary<MethodInfo, MethodInfo> toAsync = typeof(Queryable)
    .GetMethods(BindingFlags.Public | BindingFlags.Static)
    .Where(/* ... */)
    .Select(e =>
    {
        MethodInfo concreteE = e.IsGenericMethod ? e.MakeGenericMethod(Enumerable.Repeat(typeof(object), e.GetGenericArguments().Length).ToArray()) : e;
        Type[] parametersE = concreteE.GetParameters().Select(e => e.ParameterType).Append(typeof(CancellationToken)).ToArray();
        /* ... */

        string name = concreteE.Name + "Async";
        MethodInfo? methodInfo = Array.Find(typeof(EntityFrameworkQueryableExtensions)
            .GetMethods(BindingFlags.Public | BindingFlags.Static),
            e2 =>
            {
                e2 = e2.IsGenericMethod ? e2.MakeGenericMethod(Enumerable.Repeat(typeof(object), e2.GetGenericArguments().Length).ToArray()) : e2;
                ParameterInfo[] parameterInfos = e2.GetParameters();
                return /* ... */ && parameterInfos.Select(e => e.ParameterType).SequenceEqual(parametersE);
            });
        return (e, methodInfo);
    })
    .Where(/* ... */)
    .ToDictionary(/* ... */);

My most common operators are Where, Select, Single(OrDefault), First(OrDefault), OrderBy, ThenBy, Take, Skip, Concat, Append, ToList, ToArray, All, Any, Sum, ToDictionary and the To[...]Async versions, in approximately that order.

yegor-mialyk commented 1 year ago
  1. Method syntax only
  2. EF Core / per request or per ServiceBus message / .NET 6
  3. N/A, prefer to use caching if possible
CollinAlpert commented 1 year ago
  1. Method syntax, except for joins
  2. It's the weird performance quirks like https://twitter.com/badamczewski01/status/1454762216083836928?s=46&t=C1M4FVQ62oM3M2uAQRvwGw (I don't know if this is still current in .NET 7) which keep me from using LINQ more. But in scenarios which are not explicitly performance critical, I actually prefer LINQ in terms of readability.
echolumaque commented 1 year ago
  1. Do you primarily use LINQ using the Query syntax or the Method syntax?

    • I prefer method syntax as it is cleaner and very fluent. I find query syntax very hard to read; it looks more like SQL than C#.
  2. Please share or link to a representative example of your use of LINQ. Include:

    • I rely on this heavily when I'm manipulating IEnumerables. Sometimes to manipulate a string. Also for checking the collection contents by using .Any() as I found it very readable.

    • I'm a Xamarin developer so I'm using this on a view that uses CollectionView (when I need to project to match the type e.g. Select() or to filter Where())

    • .netstandard2.1

    • No, I use LINQ for free performance boost due to lazy iteration

  3. The problematic LINQ expression (if available) and the code that replaced it

    • N/A
kingmotley commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

Definitely the method syntax. I avoid the query syntax, and will usually convert others' query syntax into method syntax, except in the very few cases where the resulting method syntax is difficult to read, which usually means a let and/or left joins.

Please share or link to a representative example of your use of LINQ:

IEnumerable would be:

var result = myArr
    .Where(a => a.something == it)
    .ToList(); // Find all objects in the collection that have a something of 5

var dups = myArr
    .GroupBy(z => z)
    .Where(z => z.Count() > 1)
    .Any(); // Just find if there are any duplicates inside the IEnumerable

IQueryable over EF Core:

var result = context.MyDbSet
    .Include(z => z.Nav1)
    .FirstOrDefault(z => z.Id == x); // Get MyDbSet including its Nav1s by the primary key Id

Often the IQueryable is reshaped into just the data that I actually need so that it can retrieve the results entirely from an index.

If you have intentionally avoided LINQ due to performance issues, please share an example of:

I don't avoid LINQ due to performance issues. If there is a performance problem, I have never found it to be LINQ related other than someone repeatedly calling an IEnumerable instead of materializing it into a collection first. I may change a LINQ query to a for/foreach if it makes the logic clearer or more maintainable, but that is the exception rather than the rule.
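The repeated-enumeration problem described above can be sketched roughly as follows. This is a minimal illustration, not code from the comment: `EnumerationDemo` and `CountEvaluations` are hypothetical names introduced here. Each pass over a deferred query re-runs the `Where` predicate, whereas calling `ToList()` first runs it only once.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerationDemo
{
    // Returns how many times the predicate runs when a query over `source`
    // is consumed twice, with or without materializing it first.
    public static int CountEvaluations(IReadOnlyList<int> source, bool materialize)
    {
        int evaluations = 0;

        // Deferred: nothing runs until the sequence is enumerated.
        IEnumerable<int> evens = source.Where(n =>
        {
            evaluations++;
            return n % 2 == 0;
        });

        if (materialize)
        {
            // ToList() enumerates the query once and stores the results.
            evens = evens.ToList();
        }

        // Two separate consumers of the same sequence.
        _ = evens.Count();
        _ = evens.Sum();

        return evaluations;
    }
}
```

With a five-element source, the deferred version evaluates the predicate on both passes, while the materialized version evaluates it only during `ToList()`.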

neon-sunset commented 1 year ago

1. Do you primarily use LINQ using the Query syntax or the Method syntax?

Method syntax 100%, other teams would sometimes utilize query syntax for special cases or when copying existing niche solutions for APIs like Partitioner but that's it.

2. Please share or link to a representative example of your use of LINQ. Include:

We use LINQ heavily for data processing where we know that LINQ overhead will be reasonably small when compared to other parts of the logic. It is easiest to decide to do so when there is any kind of IO involved be it RPC or ORM access. In fact, it is sometimes a preference of ICs to sacrifice performance if it allows terse expression of operations via LINQ. I wish it didn't impose unnecessary cost though.

An equivalent interpretation of the code used for some of our use cases.

 var lastHour = DateTime.UtcNow - TimeSpan.FromHours(1);
 var notifications = events
    .Where(e => e.Timestamp < lastHour)
    .Select(e => e.ToNotification())
    .ToList();

 var summary = notifications
    .Aggregate(
        new StringBuilder(),
        (sb, notification) => sb.AppendFormat(CultureInfo.InvariantCulture, SummaryTemplate, notification.Id))
    .ToString();

await NotificationsProcessor.Handle(summary, notifications);

// NotificationsProcessor
var workers = notifications
    .Chunk(Parallelism)
    .Select(async chunk =>
    {
        var results = new List<ProcessingResult>(chunk.Length);
        foreach (var notification in chunk)
        {
            results.Add(await HandleInternal(notification));
        }

        return results;
    });

var results = (await Task
    .WhenAll(workers))
    .SelectMany(r => r)
    .ToList();

Multiple times per minute

.NET 6

3. If you have intentionally avoided LINQ due to performance issues, please share an example of:

The problematic LINQ expression:

var errataEntries = jsonProperties // JsonElement.ObjectEnumerator
    .Where(p => p.Name is ErrataKey1 or ErrataKey2)
    .Select(p => (p.Name, p.Value.ValueKind))
    .ToList();

The code that replaced it:

var errataEntries = new List<(string, JsonValueKind)>(8);
foreach (var prop in jsonProperties)
{
    // keys are now utf-8 ReadOnlySpan<byte>
    if (prop.NameEquals(ErrataKey1) || prop.NameEquals(ErrataKey2))
    {
        errataEntries.Add((prop.Name, prop.Value.ValueKind));
    }
}

The goal was to fix extra enumerator allocations and slow iteration performance for large payloads with many fields. The biggest issue was that the LINQ implementation had all calls behind IEnumerable<T>, which means choosing between expressiveness and good codegen/performance. It would be nice one day for LINQ to provide both via generics monomorphization, as happens with iterator chains in Rust, where there is no perf or allocation penalty.

Once per incoming request, with requests being hundreds per second.

MaxiTB commented 1 year ago
1. Do you primarily use LINQ using the Query syntax or the Method syntax?

100% method syntax.

Personally I find query syntax very disruptive to read in code and often I see query syntax used out of laziness or by inexperienced devs honestly. I don't think at all that the method syntax is more complex, it's just different and actually more intuitive if you get used to it, because it actually exposes what is really going on underneath and makes it easier to track possible performance issues in code reviews.

2. Please share or link to a representative example of your use of LINQ. Include:

I have used LINQ for in-memory operations and EF alike (generally async). And always with the method syntax.

3. If you have intentionally avoided LINQ due to performance issues, please share an example of:

I use LINQ only for queries. Some developers mix commands and queries with LINQ, which is hard to debug and error prone. In those cases I always use foreach to clearly separate write operations from read operations.

I avoid using query syntax because it has a huge negative impact on reading code and on understanding how it is unwound in the background.

dluciano commented 1 year ago
  1. Do you primarily use LINQ using the Query syntax or the Method syntax?

100% Method syntax.

  2. Please share or link to a representative example of your use of LINQ. Include:

Async queries using EF Core, .NET 6.

  3. If you have intentionally avoided LINQ due to performance issues, please share an example of:

Have not experienced performance issues while using LINQ.

gouthamrangarajan commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

Method syntax every single time.

Please share or link to a representative example of your use of LINQ. Include:

Where, First, FirstOrDefault , Count, Select

How often it is executed

very frequent

Version of .NET that runs the expression - .NET Framework 4.7.2, .NET Core 3.1

The problematic LINQ expression (if available) and the code that replaced it

I ran into performance issues with Parallel.ForEach (I know this is TPL and not LINQ, but I'm guessing the same might happen with AsParallel as well). In one of my use cases, foreach + await performed better than Parallel.ForEach (of course without await). By performing better I mean it was faster even after adjusting the degree of parallelism for Parallel.ForEach (I tried many values). In my use case the loop was over 20 records and it was fetching records from the DB.

Execution very frequent as well

simonmckenzie commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

I almost always use method syntax, except for cases where multiple SelectMany calls are required, which gets quite messy with all the intermediate types that have to be explicitly defined.

Example of a recent case where I used the query syntax:

var degreesWithSubjectAvailabilities =
    from degree in catalogRepository.GetDegrees(filter)
    from subjectSet in degree.SubjectSets
    where subjectSet.CategoryEnum != CategoryType.Elective
    from subject in subjectSet.RequiredSubjects
    let catalogSubject = catalogRepository.GetSubject(subject.UnitCode)
    where catalogSubject != null && catalogSubject.IsPriorStudyRequired != true
    from availability in catalogSubject.Availability
    where availability.Year == catalogSubject.YearOfStudy
    orderby availability.LastDateToEnrol descending, degree.YearOfStudy descending, catalogSubject.YearOfStudy descending
    select new {degree, subjectSet, catalogSubject, availability};

var latestAvailable = degreesWithSubjectAvailabilities.FirstOrDefault();

What would the above look like as a method chain?

If I were to convert the above to a method chain, with meaningful names, I would get something like this, which is pretty nasty. You might choose to use x for each lambda parameter (personal preference, I'm sure), but even so, having to explicitly project into a new object each time adds cognitive load when reading it.

var degreesWithSubjectAvailabilities = catalogRepository.GetDegrees(filter)
    .SelectMany(degree => degree.SubjectSets, (degree, subjectSet) => new {degree, subjectSet})
    .Where(degreeWithSubjectSet => degreeWithSubjectSet.subjectSet.CategoryEnum != CategoryType.Elective)
    .SelectMany(degreeWithSubjectSet => degreeWithSubjectSet.subjectSet.RequiredSubjects, (degreeWithSubjectSet, subject) => new {degreeWithSubjectSet.subjectSet, degreeWithSubjectSet.degree, subject})
    .Select(degreeWithSubject => new {degreeWithSubject.degree, degreeWithSubject.subjectSet, catalogSubject = catalogRepository.GetSubject(degreeWithSubject.subject.UnitCode)})
    .Where(degreeWithCatalogSubject => degreeWithCatalogSubject.catalogSubject != null && degreeWithCatalogSubject.catalogSubject.IsPriorStudyRequired != true)
    .SelectMany(degreeWithCatalogSubject => degreeWithCatalogSubject.catalogSubject.Availability, (degreeWithCatalogSubject, availability) =>
        new {degreeWithCatalogSubject.degree, degreeWithCatalogSubject.subjectSet, degreeWithCatalogSubject.catalogSubject, availability})
    .Where(degreeWithCatalogSubjectAndAvailability => degreeWithCatalogSubjectAndAvailability.availability.Year == degreeWithCatalogSubjectAndAvailability.catalogSubject.YearOfStudy)
    .OrderByDescending(degreeWithCatalogSubjectAndAvailability => degreeWithCatalogSubjectAndAvailability.availability.LastDateToEnrol)
    .ThenByDescending(degreeWithCatalogSubjectAndAvailability => degreeWithCatalogSubjectAndAvailability.degree.YearOfStudy)
    .ThenByDescending(degreeWithCatalogSubjectAndAvailability => degreeWithCatalogSubjectAndAvailability.catalogSubject.YearOfStudy);

Here's the same query using unnamed lambda parameters - more concise, but significantly harder to say what each x looks like. It's hard to read in a PR, for example!

var degreesWithSubjectAvailabilities = catalogRepository.GetDegrees(filter)
    .SelectMany(d => d.SubjectSets, (d, ss) => new {degree = d, subjectSet = ss})
    .Where(x => x.subjectSet.CategoryEnum != CategoryType.Elective)
    .SelectMany(x => x.subjectSet.RequiredSubjects, (x, s) => new {x.subjectSet, x.degree, subject = s})
    .Select(x => new {x.degree, x.subjectSet, catalogSubject = catalogRepository.GetSubject(x.subject.UnitCode)})
    .Where(x => x.catalogSubject != null && x.catalogSubject.IsPriorStudyRequired != true)
    .SelectMany(x => x.catalogSubject.Availability, (x, a) => new {x.degree, x.subjectSet, x.catalogSubject, availability = a})
    .Where(x => x.availability.Year == x.catalogSubject.YearOfStudy)
    .OrderByDescending(x => x.availability.LastDateToEnrol)
    .ThenByDescending(x => x.degree.YearOfStudy)
    .ThenByDescending(x => x.catalogSubject.YearOfStudy);
stempy commented 1 year ago

Do you primarily use LINQ using the [Query syntax] or the [Method syntax]

For a few years since LINQ came out, I often used a combination of query and method syntax. Now mostly method syntax. Sometimes with multiple complex joins the query syntax can be easier to form initially.

Please share or link to a representative example of your use of LINQ. Include:

I have used LINQ to SQL, LINQ to XML, and LINQ to data since it came out, along with EF LINQ. Lots of different usages of LINQ in many projects, using LinqPad frequently for scratching out ideas and potentially optimizing. I may share some when I get time to collate them.

Version of .NET that runs the expression (for example, .NET Framework 4.5, .NET 6+)

Everything from .NET Framework 3.5 through to .NET 6

If you have intentionally avoided LINQ due to performance issues, please share an example of:

I have actually always pushed for using LINQ more often and prefer it over the alternatives where possible, though perhaps not in some rare cases. Most performance problems with LINQ and databases (LINQ to SQL, LINQ to EF) tended to come down to how it was used. A very simple case I have seen many times is looping over a collection and generating multiple SQL queries, rather than using something like Contains (to generate IN) to form a single query. I think a lot of people who come from writing SQL queries implement LINQ like this and put the perf issues down to LINQ itself, when the issue is their specific implementation and a lack of understanding of how to apply LINQ, not necessarily any performance problem inherent in it.
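The loop-versus-Contains pattern described above might look roughly like this. It is only a sketch under stated assumptions: `Customer`, `BatchLookupDemo`, and both method names are hypothetical, and the code takes a plain `IQueryable<Customer>` so it also runs in memory via `AsQueryable()`. Against a database provider such as EF Core, the first method issues one query per id, while the second is typically translated into a single query with an IN-style filter.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for illustration.
public record Customer(int Id, string Name);

public static class BatchLookupDemo
{
    // One query per id: with a database provider this means one round trip per element.
    public static List<Customer> LoadOneByOne(IQueryable<Customer> customers, IEnumerable<int> ids)
    {
        var result = new List<Customer>();
        foreach (int id in ids)
        {
            Customer? match = customers.FirstOrDefault(c => c.Id == id);
            if (match is not null)
            {
                result.Add(match);
            }
        }
        return result;
    }

    // One query for all ids: Contains over a local collection lets the provider
    // build a single filtered query instead of many small ones.
    public static List<Customer> LoadInOneQuery(IQueryable<Customer> customers, IReadOnlyCollection<int> ids)
    {
        return customers.Where(c => ids.Contains(c.Id)).ToList();
    }
}
```

Both methods return the same customers for the same ids; the difference is in how many queries the provider has to execute.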

WhitWaldo commented 1 year ago

Do you primarily use LINQ using the [Query syntax] or the [Method syntax]

While I enjoy the readability of the Query syntax, I predominantly use Method syntax since I can split up the query and iteratively add additional expressions without the entire query residing in one place.

For example, I was recently reading through the Z3.Linq package and while most of the examples are expressed using Query syntax, examples like the Sudoku theorem simply wouldn't be nearly as succinct or easy to follow without using the Method syntax.

Please share or link to a representative example of your use of LINQ. Include:

Z3.LINQ, EF, list manipulation (Where, Select, OrderBy, Distinct), Rx, Reaqtor/Nuqleon,

Version of .NET that runs the expression (for example, .NET Framework 4.5, .NET 6+)

.NET 5 and .NET 6

If you have intentionally avoided LINQ due to performance issues, please share an example of:

Nothing to do with performance per se, but I've long wished there would be an iteration on the implementation of LINQ. There's an amazing list maintained at https://github.com/dotnet/csharplang/discussions/4727 and here that documents a lot of ways in which expression trees (and LINQ) could stand to see some evolutionary improvements.

The need for asynchronous and remoted LINQ is also a space where I've enjoyed using Reaqtor instead of straight LINQ (downloading a large set from remote and narrowing locally) or Rx (not async).

Especially with the release of Bonsai with the open-sourcing of Reaqtor, it's opened up entirely new paradigms of using LINQ expressions since I can serialize them for remote operations, so while that has traditionally been a huge negative to LINQ, it's since been resolved.

raffaeler commented 1 year ago

@AaronRobinsonMSFT Please clarify whether you are referring only to queries and not to expression trees in general. I generate a lot of code using expression trees and no query is involved at all.

matt-psaltis commented 1 year ago

@AaronRobinsonMSFT Please clarify whether you are referring only to queries and not to expression trees in general. I generate a lot of code using expression trees and no query is involved at all.

+1 for expanding this discussion to expressions! System.Linq.Expressions is by far where I have run into the most friction over the years. Dynamic / runtime compilation, memory usage combined with no way to unload a compiled expression, process stalls during lambda compiles all have contributed to making the linq ecosystem both brilliant and painful in the same breath. Please consider!

AdamWorley commented 1 year ago

Do you primarily use LINQ using the Query syntax or the Method syntax?

99% of the time Method syntax, as the Query syntax usually sticks out and isn't usually as readable. The 1% is for left joins, as many other people have said.

Please share or link to a representative example of your use of LINQ. Include:

Three main ones are FirstOrDefault(), Sum() and Select()

var topFilm = films.Select(x => new {x.Name, Rating = x.Ratings.Sum()})
                              .OrderByDescending(x => x.Rating)
                              .FirstOrDefault();

A short description of the goal of the expression: This isn't a live example but something similar to what I would usually use Linq for as I think it's quite readable as to what's happening e.g. for all the films, get the name and total rating score, order by the ratings in descending order and get me the highest rated.

In general LINQ is really helpful for getting a specific value from a collection or for combining values in a way that avoids having mutable variables.

Version of .NET that runs the expression - currently targeting .NET 6.

If you have intentionally avoided LINQ due to performance issues, please share an example of:

Often the readability of the code trumps performance/memory usage. Combined with how LINQ has been improved and can now lower into smarter code (such as optimising the search algorithm used), it's entirely possible that LINQ could produce better results than a hand-written alternative.

The only exception is a Monte Carlo simulation we run with thousands of paths; for this, hand-written code that avoids allocations is extremely important.

The problematic LINQ expression (if available) and the code that replaced it:

I find the Join method syntax quite convoluted and often have to look back at the docs for it, and it only gets worse the more joins there are. This is the one instance where the Query style makes a little more sense, though it still feels a little backwards when compared to SQL.
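To make the comparison concrete, here is a small sketch of the same inner join written both ways. The `Film` and `Rating` records and the `JoinDemo` class are hypothetical and exist only for this illustration; the `Join` overload used is the standard four-argument one on `Enumerable`.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types used only for illustration.
public record Film(int Id, string Name);
public record Rating(int FilmId, int Score);

public static class JoinDemo
{
    // Method syntax: inner sequence, outer key selector, inner key selector, result selector.
    public static List<string> JoinWithMethodSyntax(IEnumerable<Film> films, IEnumerable<Rating> ratings) =>
        films.Join(
                ratings,
                film => film.Id,
                rating => rating.FilmId,
                (film, rating) => $"{film.Name}: {rating.Score}")
            .ToList();

    // Query syntax: the same join, with the key comparison written inline.
    public static List<string> JoinWithQuerySyntax(IEnumerable<Film> films, IEnumerable<Rating> ratings) =>
        (from film in films
         join rating in ratings on film.Id equals rating.FilmId
         select $"{film.Name}: {rating.Score}")
        .ToList();
}
```

Both methods produce the same sequence. With several chained joins, the method form also needs an intermediate projection per join, which is where the query form tends to stay easier to follow.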