dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.66k stars 3.16k forks source link

Iterating over DbSet<T> with .AsNoTracking() with SQL Server provider isn't streaming the data, it's reading it all into memory #7363

Closed jemiller0 closed 1 year ago

jemiller0 commented 7 years ago

If I iterate over a DbSet and use either .AsNoTracking() or set the QueryTrackingBehavior to NoTracking to turn change tracking off globally, I expect the results to be streamed from the database server and not be all read into memory at once. The following is a simple example that would demonstrate the problem. You need to query a table with a large amount of data in it to easily see the results. In my example, my table has 7.7 million rows in it and I can see the memory for the console application that it resides in grow to over 6 GB. This resulted in an OutOfMemoryException before I changed the app from 32 bit to 64 bit. The problem only appears to occur when using the SQL Server provider. I have the same large table in the same database in PostgreSQL and MySQL and the problem doesn't happen using the Npgsql or Pomelo drivers. The memory usage for the app is more like 40 MB when using the other providers.

using (var lc2 = new LibraryContext())
{
    var i = 0;
    var s2 = Stopwatch.StartNew();
    lc2.ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
    foreach (var b in lc2.Bibs.AsNoTracking())
    {
        if (++i % 10000 == 0)
        {
            traceSource.TraceEvent(TraceEventType.Verbose, 0, $"{i} {s2.Elapsed}");
            s2.Restart();
        }
    }
}

The following are log messages from EF Core. It says that it's not tracking the entities, but, it behaves kind of like it is. I don't know if the problem has to do with tracking or not. I just know that it appears to be loading everything into memory rather than streaming the results as I would have expected it to. Note, I'm not calling .ToList() or .ToArray(), I'm attempting to iterate using a foreach statement.

LibraryLibrary Verbose: 0 : Compiling query model:
'(from Bib <generated>_1 in DbSet<Bib>
select <generated>_1)
.AsNoTracking()'
    ThreadId=1
    DateTime=2017-01-05T22:05:14.4120698Z
LibraryLibrary Verbose: 0 : Optimized query model:
'from Bib <generated>_1 in DbSet<Bib>
select <generated>_1'
    ThreadId=1
    DateTime=2017-01-05T22:05:14.4180723Z
LibraryLibrary Verbose: 0 : TRACKED: False
(QueryContext queryContext) => IEnumerable<Bib> _ShapedQuery(
    queryContext: queryContext,
    shaperCommandContext: SelectExpression:
        SELECT [b].[Id], [b].[Author], [b].[BibStatusId], [b].[BibStatusLastWriteTime], [b].[BibStatusLastWriteUserName], [b].[Content], [b].[CreationTime], [b].[CreationUserName], [b].[FormatId], [b].[IsPublic], [b].[LanguageId], [b].[LastWriteTime], [b].[LastWriteUserName], [b].[Level], [b].[OclcNumber], [b].[PublicationDate], [b].[Title], [b].[Type]
        FROM [Bibs] AS [b]
    ,
    shaper: UnbufferedEntityShaper<Bib>
)

    ThreadId=1
    DateTime=2017-01-05T22:05:14.4991379Z
LibraryLibrary Verbose: 0 : Opening connection to database 'Library' on server 'reports'.
    ThreadId=1
    DateTime=2017-01-05T22:05:14.5221488Z
LibraryLibrary Verbose: 0 : Executed DbCommand (17ms) [Parameters=[], CommandType='Text', CommandTimeout='90']
SELECT [b].[Id], [b].[Author], [b].[BibStatusId], [b].[BibStatusLastWriteTime], [b].[BibStatusLastWriteUserName], [b].[Content], [b].[CreationTime], [b].[CreationUserName], [b].[FormatId], [b].[IsPublic], [b].[LanguageId], [b].[LastWriteTime], [b].[LastWriteUserName], [b].[Level], [b].[OclcNumber], [b].[PublicationDate], [b].[Title], [b].[Type]
FROM [Bibs] AS [b]
    ThreadId=1
    DateTime=2017-01-05T22:05:14.7170137Z

If you are seeing an exception, include the full exceptions details (message and stack trace).

OutOfMemoryException if the table is large enough and the app is 32 bit.

Further technical details

EF Core version: 1.1.0 Database Provider: Microsoft.EntityFrameworkCore.SqlServer 1.1.0 Operating system: Windows 10 Anniversary Update IDE: Visual Studio 2015 Update 3

jemiller0 commented 7 years ago

I just tried it with tracking enabled. It uses even more memory. You can see memory climb when it gets to the point where it's iterating over the records. With no tracking enabled, memory climbs before it even starts to iterate. It appears to load everything into memory first and then starts to iterate. I tried it with a different table. The first table I tested with had an XML field. So, I tested with a second table that doesn't have that in case that might have something to do with it.

anpete commented 7 years ago

@jemiller0 I just tried to repro this by streaming ~13M rows and didn't see any issue. Any chance you can use a memory profiler to determine which objects are leaking?

jemiller0 commented 7 years ago

I just tried the query again and saw the same behavior with another database and another large table. The table has an XML field in it. Not sure if that's relevant or not. MySQL and PostgreSQL both work fine. They use about 80 MB RAM and it stops there while streaming through the entire 7.7 million row table. Using the SQL Server driver, memory continues to climb. I'm not familiar with how to use the profiler, but, I gave it a shot. I have Visual Studio 2017 Enterprise. It looks like there are lots of strings and a Queue with lots of ValueBuffers in it. Not sure why I'm not seeing any other types of objects. I could send you a dump file if you think it would be helpful and could tell me how to create it.

session snapshot1

jemiller0 commented 7 years ago

I'm using version 1.1.0. Is that the version you tried?

smitpatel commented 7 years ago

@jemiller0 - Can you provide repro code for this? We are not able to reproducing it based on code provided above. Looking at the exception and memory profiling, the query buffer is buffering all the results. That happens when MARS is disabled & there are multiple result sets are opened (causing result set to be fully buffered.) Code listing we need is, Model classes, DbContext, Code being run when running the app. Table design so that we can mimic the structure of it (and fill large data). Your connection string or MARS settings.

jemiller0 commented 7 years ago

I have MARS set to false. Should I have it set to true? I didn't think EF Core used MARS.

smitpatel commented 7 years ago

EF Core uses MARS when available. When MARS is disabled and multiple record sets are needed that's where EF buffers query results. (That is happening in this case). If you set MARS to true, above problem should not be arising. Though we still have to investigate that why the buffering was initiated by EF. With just single record set like above query, MARS should not have any effect. There could be another active result set, which is causing this query to buffer.

jemiller0 commented 7 years ago

Thanks. I will try it with MARS enabled and see what happens.

jemiller0 commented 7 years ago

I tried this with MultipleActiveResultSets=true. I also tried it with a few different large tables that I have in two different DbContexts. Enabling MARS made no difference. The behavior was the same for all the tests that I did. It noticeably increases the RAM usage until it gets through the entire results set and releases it. Any idea when the next version of EF Core will be released? Maybe this by chance has been already fixed in current builds. Definitely looking forward to a new release.

ajcvickers commented 7 years ago

@jemiller0 @maumar @anpete @smitpatel Could this be related to #8147

smitpatel commented 7 years ago

Possibly related. That seem to be one explanation for buffering. Though query is not matching that shape.

@jemiller0 - All issues reported by you are happening due to query buffering results instead of streaming since buffering takes time and memory. Therefore you are not able to see immediate exception. And as in case of SqlServer you run out of memory because all the results are loaded in the memory. It is certainly not just large amount of data causing query to buffer. Buffering requirement for EF is independent of type/size of results. You need to provide complete & stand-alone code listing which is generating issue for you. We are unable to see any issues even with large amount of data in our tables. Since buffering is happening, either it has be to certain shape of query causing it or there are multiple queries running (though likely something in query since MARS did not have any effect). In either case, we need a stand-alone repro code.

anpete commented 7 years ago

@ajcvickers QueryBuffer uses weak refs to avoid this issue when streaming. Also the query plan above is showing that it is not using the QueryBuffer (UnbufferedEntityShaper). Additionally, the memory snapshot screen dump above is showing that the objects are rooted by a Queue, which we only use when doing non-MARs buffering,.

jemiller0 commented 7 years ago

I appear to be having even bigger problems now. Now, if I even run a query where I do a projection, it is consuming large amounts of memory. The following code worked fine in the past. I just tested a console app with nothing else in it but this and it consumed about 900 MB. Tried it with EF 6 and it's only using 300 MB. The app is compiled for 64-bit. It uses less when configured for 32-bit.

var l = lc.Bibs.Select(b => b.Id).ToArray();

I thought maybe it was a DLL versioning issue or something since I have a number of other packages included in my project such as the following. As well as the EF 6 provider for SQL Server, MySQL, and PostgreSQL.

Dapper Microsoft.EntityFrameworkCore.SqlServer Npgsql.EntityFrameworkCore.PostgreSQL Pomelo.EntityFrameworkCore.MySql NHibernate log4net System.Linq.Dynamic

I created a new project and only added Microsoft.EntityFrameworkCore.SqlServer from NuGet and still have the problem.

Until I get this resolved, I cannot use EF Core. I don't understand what happened. I have queries like the above all over the place and they were working fine before. Not sure if when I upgrade to VS 2017 and EF Core 1.1.1 was when the problem started.

I just ran the VS 2017 memory profiler and came up with the following. I don't really know how the memory profiler works. So, I don't know what to make of it. I do see that UntypedRelationalValueBufferFactory in in there a lot.

image

jemiller0 commented 7 years ago

Not sure if this is any help, but, it looks like a BufferAll() method is being called.

image

ajcvickers commented 7 years ago

@jemiller0 As @smitpatel said above, for us to make any progress on this we will need a complete & stand-alone code listing to reproduce the issue. We have spent a lot of time trying to get the results you are seeing and we simply cannot reproduce it. We may be missing something simple, but without a complete repro we are currently out of ideas.

jemiller0 commented 7 years ago

OK, I attached a project that demonstrates the problem. To see the problem, run the console application. The app runs until it prompts for a key press. Run it, then check Task Manager to see how much memory is in use. I see 1.2 GB with EF Core and 182 MB with EF 6. Comment or uncomment the TestEF6() and TestEFCore() method calls in the Main() method in the console app to test each. The project is a VS 2017 project.

Test.zip

If you can email me at jemiller@uchicago.edu, I can give you a link to download a SQL Server 2016 backup of the database that I'm using. It's about 22 GB. I'm not sure how easy it is to see the problem without having a large database.

anpete commented 7 years ago

Thanks @jemiller0!

anpete commented 7 years ago

OK, we think it is because you are enabling retry on failure in OnConfiguring. Can you try with this disabled? Retrying causes us to buffer the results ahead of time because we don't want to provide incomplete results.

jemiller0 commented 7 years ago

Thanks @anpete MUCH APPRECIATED! Yes, that solved my latest problem as well as the original one. Sorry for wasting so much of everyone's time. Next time, I will submit a complete example to begin with. I would swear the id query into the List was working until recently though. Not sure if maybe something changed in 1.1.1. Or, maybe I'm just confused. At any rate, I know to steer clear of RetryOnFailure(). Thanks again.

jemiller0 commented 7 years ago

Actually, I may have spoken too soon. I think that does fix the original problem that I had. However, it looks like I'm having the same problem with the Pomelo MySQL provider. Only, when I use that one, the problem looks much worse. It went all the way up to 28 GB RAM in use and still hasn't completed after a few minutes of running. Something is definitely wrong there. But, that test is using a similar table, but, a different DbContext. The weird thing is, this has been working fine. I'm thinking maybe it's a bug in Pomelo's latest provider, or the MySQL provider that it uses.

jemiller0 commented 7 years ago

It looks like that is a separate issue with the Pomelo 1.1.1 provider. I downgraded to version 1.1.0 and it works fine. I submitted a separate bug report with Pomelo.