dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License

Perf of different tracking behaviors #23558

Open smitpatel opened 3 years ago

smitpatel commented 3 years ago
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using Microsoft.EntityFrameworkCore;

    public class Program
    {
        public static void Main(string[] args)
        {
            BenchmarkRunner.Run<QueryTrackingBehavior>();
        }
    }

    [MemoryDiagnoser]
    public class QueryTrackingBehavior
    {
        private BloggingContext _context;

        [Params(1000)]
        public int NumBlogs { get; set; }

        [Params(100)]
        public int NumPostsPerBlog { get; set; }

        // Recreate and seed the database once per benchmark run.
        [GlobalSetup]
        public void Setup()
        {
            Console.WriteLine("Setting up database...");
            using var context = new BloggingContext();
            context.Database.EnsureDeleted();
            context.Database.EnsureCreated();
            context.SeedData(NumBlogs, NumPostsPerBlog);
            Console.WriteLine("Setup complete.");
        }

        // Fresh context per iteration; the context-wide default is
        // NoTrackingWithIdentityResolution, overridden per query below.
        [IterationSetup]
        public void CreateContext()
        {
            _context = new BloggingContext();
            _context.ChangeTracker.QueryTrackingBehavior =
                Microsoft.EntityFrameworkCore.QueryTrackingBehavior.NoTrackingWithIdentityResolution;
        }

        [IterationCleanup]
        public void DisposeContext()
        {
            _context.Dispose();
        }

        [Benchmark]
        public void Tracking()
        {
            foreach (var item in _context.Posts.AsTracking()/*.Include(p => p.Blog)*/)
            {
            }
        }

        [Benchmark]
        public void NoTracking()
        {
            foreach (var item in _context.Posts.AsNoTracking()/*.Include(p => p.Blog)*/)
            {
            }
        }

        // Uses the context-wide default set in CreateContext.
        [Benchmark]
        public void NoTrackingWithIdentityResolution()
        {
            foreach (var item in _context.Posts/*.Include(p => p.Blog)*/)
            {
            }
        }

        public class BloggingContext : DbContext
        {
            public DbSet<Blog> Blogs { get; set; }
            public DbSet<Post> Posts { get; set; }

            protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
            {
                optionsBuilder.UseSqlServer(
                    @"Server=SLDW;Database=test;Trusted_Connection=True;Connect Timeout=60;ConnectRetryCount=0");
                    // @"Server=(localdb)\mssqllocaldb;Database=Blogging;Integrated Security=True");
            }

            public void SeedData(int numBlogs, int numPostsPerBlog)
            {
                using var context = new BloggingContext();
                context.AddRange(
                    Enumerable.Range(0, numBlogs).Select(_ => new Blog
                    {
                        Posts = Enumerable.Range(0, numPostsPerBlog).Select(_ => new Post()).ToList()
                    }));
                context.SaveChanges();
            }
        }

        public class Blog
        {
            public int BlogId { get; set; }
            public string Url { get; set; }
            public int Rating { get; set; }
            public List<Post> Posts { get; set; }
        }

        public class Post
        {
            public int PostId { get; set; }
            public string Title { get; set; }
            public string Content { get; set; }
            public int BlogId { get; set; }
            public Blog Blog { get; set; }
        }
    }

Results without Include

| Method | NumBlogs | NumPostsPerBlog | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| Tracking | 1000 | 100 | 320.86 ms | 6.290 ms | 6.178 ms | 16000.0000 | 7000.0000 | - | 110.76 MB |
| NoTracking | 1000 | 100 | 51.55 ms | 0.997 ms | 1.493 ms | 5000.0000 | - | - | 25.95 MB |
| NoTrackingWithIdentityResolution | 1000 | 100 | 305.46 ms | 3.989 ms | 3.114 ms | 16000.0000 | 7000.0000 | - | 110.73 MB |

Results with Include

| Method | NumBlogs | NumPostsPerBlog | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| Tracking | 1000 | 100 | 583.1 ms | 11.64 ms | 21.29 ms | 24000.0000 | 10000.0000 | 1000.0000 | 154.54 MB |
| NoTracking | 1000 | 100 | 149.0 ms | 2.00 ms | 1.87 ms | 20000.0000 | - | - | 93.86 MB |
| NoTrackingWithIdentityResolution | 1000 | 100 | 703.3 ms | 11.23 ms | 9.95 ms | 33000.0000 | 10000.0000 | 1000.0000 | 210.26 MB |

The AsTracking operator has no effect on the result; only the resulting tracking behavior matters, not how it is configured.
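
For reference, a minimal sketch of the two places a tracking behavior can be chosen in EF Core 5.0 or later: the context-wide default on the ChangeTracker (as the benchmark's CreateContext method does) and the per-query operators that override that default. The TrackingConfigSketch class and its method names are made up for illustration, Post and BloggingContext are trimmed copies of the types in the benchmark, and the connection string is only a placeholder.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
}

public class BloggingContext : DbContext
{
    public DbSet<Post> Posts { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // Placeholder connection string; point it at your own database.
        optionsBuilder.UseSqlServer(
            @"Server=(localdb)\mssqllocaldb;Database=Blogging;Integrated Security=True");
    }
}

public static class TrackingConfigSketch // hypothetical helper
{
    // Context-wide default, as set in the benchmark's CreateContext method.
    public static BloggingContext CreateContext()
    {
        var context = new BloggingContext();
        context.ChangeTracker.QueryTrackingBehavior =
            QueryTrackingBehavior.NoTrackingWithIdentityResolution;
        return context;
    }

    // Per-query operators override the context-wide default for that query only.
    public static IQueryable<Post> Tracked(BloggingContext context)
        => context.Posts.AsTracking();

    public static IQueryable<Post> Untracked(BloggingContext context)
        => context.Posts.AsNoTracking();

    public static IQueryable<Post> UntrackedWithIdentityResolution(BloggingContext context)
        => context.Posts.AsNoTrackingWithIdentityResolution();
}
```

A query built from context.Posts with no operator picks up whatever default the context carries, which is how the third benchmark method selects NoTrackingWithIdentityResolution.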

A few ideas

Improvements for NoTrackingWithIdentityResolution

This came out of a discussion with @roji.

roji commented 3 years ago

There are GCs everywhere, but what is getting garbage collected? At least in the tracking scenario, no references are going out of scope.

I'd (first) concentrate on the "Allocated" column, which really accurately tells us how much is allocated during the method invocation.
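
To put the Allocated column in per-entity terms, the sketch below divides the totals from the "without Include" results by the 100,000 posts each benchmark reads. It assumes the reported MB means 1024 × 1024 bytes; AllocationPerEntity and BytesPerEntity are hypothetical names.

```csharp
using System;

public static class AllocationPerEntity // hypothetical helper
{
    // Assumes the reported "MB" means 1024 * 1024 bytes.
    public static double BytesPerEntity(double allocatedMb, int entityCount)
        => allocatedMb * 1024 * 1024 / entityCount;

    public static void Main()
    {
        const int posts = 1000 * 100; // NumBlogs * NumPostsPerBlog

        // Totals copied from the "without Include" results.
        Console.WriteLine($"Tracking:   {BytesPerEntity(110.76, posts):F0} bytes per post");
        Console.WriteLine($"NoTracking: {BytesPerEntity(25.95, posts):F0} bytes per post");
    }
}
```

Under that assumption the Tracking run averages roughly 1,160 bytes per post and the NoTracking run roughly 270 bytes per post. Both averages also include any fixed per-query overhead.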

Re [IterationSetup], I understand that you want to separate out context creation and disposal, but note that its use is discouraged for benchmarks of less than 100ms. Of course, creating and disposing in the benchmark method also skews results - I don't know enough to know what's more accurate.

In any case, what's particularly important here is the relative values across tracking behaviors, so this doesn't seem that relevant.
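
As a rough illustration of reading the results relatively rather than absolutely, the following sketch divides the Tracking and NoTrackingWithIdentityResolution rows of the "without Include" results by the NoTracking row. The numbers are copied from the posted results; the RelativeResults class and the Result record are hypothetical names introduced here.

```csharp
using System;

public static class RelativeResults // hypothetical helper
{
    public record Result(string Method, double MeanMs, double AllocatedMb);

    // Express a result as a multiple of the baseline result.
    public static (double Time, double Memory) RelativeTo(Result baseline, Result r)
        => (r.MeanMs / baseline.MeanMs, r.AllocatedMb / baseline.AllocatedMb);

    public static void Main()
    {
        // Values copied from the "without Include" results.
        var baseline = new Result("NoTracking", 51.55, 25.95);
        var others = new[]
        {
            new Result("Tracking", 320.86, 110.76),
            new Result("NoTrackingWithIdentityResolution", 305.46, 110.73),
        };

        foreach (var r in others)
        {
            var (time, memory) = RelativeTo(baseline, r);
            Console.WriteLine($"{r.Method}: {time:F1}x time, {memory:F1}x allocated");
        }
    }
}
```

On these numbers, Tracking takes about 6.2 times as long as NoTracking and allocates about 4.3 times as much; NoTrackingWithIdentityResolution comes out at about 5.9 and 4.3 times respectively.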

roji commented 3 years ago

BTW in the case of anomalous allocations (as may be the case above), it's best to fire up a memory profiler and just get the breakdown immediately. Let me know if you want my help doing this (or other profiling).

roji commented 3 years ago

One more note - like in my original version of these benchmarks, I believe it's important to include the disposal of the context in the benchmark, rather than separating it out as is done above - if Dispose itself takes a lot of time (or allocates) in one tracking mode but not another, we want the benchmark to catch that.

The goal here isn't to get exact numbers for each mode, but to compare the different modes against one another - and for the full/holistic query execution (which includes context disposal), not just for a part of it.
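
A minimal sketch of what such a holistic benchmark could look like: each benchmark method creates the context, runs the query, and disposes the context, so any work done in Dispose is part of the measurement. This is an assumption about the shape of such a benchmark, not the original version referred to above. The class name HolisticTrackingBenchmarks is hypothetical, database seeding is omitted, and the connection string is a placeholder. Marking NoTracking as the baseline makes BenchmarkDotNet add a Ratio column, which fits the goal of comparing the modes against one another.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.EntityFrameworkCore;

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
}

public class BloggingContext : DbContext
{
    public DbSet<Post> Posts { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // Placeholder connection string; the database is assumed to be seeded already.
        optionsBuilder.UseSqlServer(
            @"Server=(localdb)\mssqllocaldb;Database=Blogging;Integrated Security=True");
    }
}

[MemoryDiagnoser]
public class HolisticTrackingBenchmarks // hypothetical name
{
    [Benchmark(Baseline = true)]
    public int NoTracking()
    {
        // The using declaration disposes the context when the method exits,
        // so the cost of Dispose is included in the measurement.
        using var context = new BloggingContext();
        var count = 0;
        foreach (var post in context.Posts.AsNoTracking()) count++;
        return count;
    }

    [Benchmark]
    public int Tracking()
    {
        using var context = new BloggingContext();
        var count = 0;
        foreach (var post in context.Posts.AsTracking()) count++;
        return count;
    }

    [Benchmark]
    public int NoTrackingWithIdentityResolution()
    {
        using var context = new BloggingContext();
        var count = 0;
        foreach (var post in context.Posts.AsNoTrackingWithIdentityResolution()) count++;
        return count;
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<HolisticTrackingBenchmarks>();
}
```

Because context creation is also inside each method, its cost is added equally to all three modes; the relative comparison still holds, but the absolute means will be higher than in a benchmark that separates setup out.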

smitpatel commented 3 years ago

That is a separate benchmark. You can benchmark 5 different modules in a pipeline separately and also benchmark the whole pipeline. But the latter does not replace the former.

roji commented 3 years ago

I mostly disagree. Again, the main point here is to compare the different tracking behaviors, so absolute numbers don't matter that much (what would they be used for anyway?). If one behavior happens to offload some work to Dispose, which another behavior doesn't, then you end up with a totally skewed benchmark showing bad results.

Of course we can always have more and more versions of any benchmark, but the main thing we should be looking at IMO is a holistic comparison of the different behaviors.

smitpatel commented 3 years ago

Query doesn't and shouldn't offload any work to disposal of the context. If there is any work happening during dispose in one of the scenarios above, then it can be replicated without using a query too. So that perf bottleneck is independent of the above benchmark or query. If you can create a repro which shows that query is offloading some work to Dispose, then file an issue and we will investigate and fix it.

This issue tracks what I have filed above; you can always file a separate issue for whatever you think we should be implementing in this product. This issue perfectly captures the intention/action item/comparison/benchmark from the query pipeline perspective. And I don't think there is any change required here. There can be follow-up items after this work is done, but nothing is going to replace the work written here.

I am going to discuss with @ajcvickers before commenting any further.