dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License

Tracked Entity remains allocated until new request #21092

Closed czioutas closed 1 year ago

czioutas commented 4 years ago

Describe the bug

I did not really want to create this as a bug, because I am not sure if this is expected functionality or actually a bug. Nevertheless, my issue in short is that EF keeps the created (tracked) objects around even after the request has been completed. I register my DbContext with the default scoped lifetime, and similarly the repository that holds an instance of the DbContext. As I understand it (I could be wrong), after an HTTP request has been completed we should no longer have such object allocations, as the repository and DbContext should be released per the request/scoped lifecycle.
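For reference, the registration looks roughly like this (a sketch, not the exact code from the repro project linked below; the connection string name is illustrative):

public void ConfigureServices(IServiceCollection services)
{
    // AddDbContext registers the context with a scoped lifetime by default,
    // so one instance is created per HTTP request and disposed when the scope ends.
    services.AddDbContext<ProjectDbContext>(options =>
        options.UseNpgsql(Configuration.GetConnectionString("Default")));

    // The repository wrapping the context is registered as scoped as well.
    services.AddScoped(typeof(IRepository<>), typeof(DbContextRepository<>));

    services.AddControllers();
}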

What I am currently seeing is illustrated below:

image

In snapshot 1 (red line 1) my code is generating some data. In snapshot 2 (red line 2) my code has finished adding and saving the data to the db and EXITS; this is important, as this snapshot was taken after the request returned Ok, so I would assume the memory allocated to add and save the objects would be released.

In snapshot 3 (red line 3; I was a bit late, I wanted to take this one a bit earlier) I initiated the exact same request to see what happens on round 2 of the same call, and as we can see, only after that did it start freeing the memory.

In snapshot 4 (red line 4) memory allocation is increasing normally as the code generates new data yet again.

My problem is that we have a process that uses almost all of the machine's RAM but manages to finish its task fine. However, the moment we initiate a new request, because the memory is still allocated, it never reaches the "new" point at which the memory is freed (as in snapshot 3) and thus dies.

I ran this multiple times and even tried offloading the process to Hangfire, but got the same results. image

The above always happens: memory increases until the request is done, the same memory stays in use, and only on a new request is the memory freed.

From what I understand (this is my first time memory debugging), the objects are still held due to EntityReferenceMap, and I do not know if this is expected functionality.

I thought it might be Npgsql related, but I tested with the in-memory db and had the same result.

image

image

To Reproduce

I have created a project here https://github.com/czioutas/efcorememorytest which I also used to generate the images above.

IDE

Microsoft Visual Studio Community 2019 Version 16.6.0

EF Version

    <PackageReference Include="Microsoft.EntityFrameworkCore" Version="3.1.4"/>
    <PackageReference Include="Microsoft.EntityFrameworkCore.Tools" Version="3.1.4"/>
    <PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="3.1.4"/>
    <PackageReference Include="Microsoft.EntityFrameworkCore.Design" Version="3.1.4"/>

dotnet --info

.NET Core SDK (reflecting any global.json):
 Version:   3.1.300
 Commit:    b2475c1295

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.18362
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\3.1.300\

Host (useful for support):
  Version: 3.1.4
  Commit:  0c2e69caa6

.NET Core SDKs installed:
  3.0.100 [C:\Program Files\dotnet\sdk]
  3.1.102 [C:\Program Files\dotnet\sdk]
  3.1.300 [C:\Program Files\dotnet\sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.1.18 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.1.18 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.0.0 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.4 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.1.18 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.0.0 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.4 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 3.0.0 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 3.1.2 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 3.1.4 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
javiercn commented 4 years ago

@czioutas thanks for contacting us.

The DI container will call Dispose on IDisposable services once a scope is released, but that doesn't "free" the memory for those objects. Only the GC is responsible for that. The objects will remain allocated and in memory until the GC feels pressure and triggers a collection. It is only at that point that the objects will be deemed collectible and deallocated to make the memory available for other objects.

I think what may be happening is that the first request starts and a bunch of caches get populated; then, when you send the second request and the GC runs, some things are still cached (for performance reasons), and as a result your system can't allocate all the memory it needs to perform the task on the second request.

I'm no expert in this area, so I'll defer to @sebastienros @JeremyLikness or @roji, as they are more familiar with these investigations/areas.

The main thing to note is that disposing a scope will not free the memory (that's the garbage collector's job); it will only call Dispose on the instances it created during the scope and remove the references it has to them. Until the GC runs nothing will be freed, and if some service that outlives the scope has captured the data, it will not be possible to reclaim the objects' memory.
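A small diagnostic sketch of that point (not something to do in production code): forcing a full blocking collection after the scope has been disposed shows whether the tracked entities were actually eligible for reclamation or are still rooted somewhere.

// Compare the managed heap size before and after a forced, full collection.
long before = GC.GetTotalMemory(forceFullCollection: false);

GC.Collect();                    // trigger a gen2 collection
GC.WaitForPendingFinalizers();   // let any finalizers run
GC.Collect();                    // collect objects that were kept alive only by their finalizers

long after = GC.GetTotalMemory(forceFullCollection: true);
Console.WriteLine($"Managed heap: {before / (1024 * 1024)} MB -> {after / (1024 * 1024)} MB");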

roji commented 4 years ago

If I understand correctly, @czioutas you are indeed seeing the memory getting released - just later than you expect - can you please confirm that this is the case? If so, then I agree with @javiercn that garbage collection may simply not have happened as soon as you expected, which is expected behavior. Assuming your DbContext is registered with a scoped lifetime, in a regular ASP.NET application it gets disposed at the end of the request, at which point its memory becomes eligible for reclaiming by the GC.

One confusing detail is that if your process is running out of memory, the GC should generally kick in and free eligible memory - so you should not be getting an out-of-memory error. However, if there are other processes on the same machine, the GC is obviously not aware of them, so memory may not get reclaimed for a while. Some more specific details may help us understand.

What exact kind of web application are you building? Is this Blazor by any chance (client or server)?

/cc @ajcvickers

czioutas commented 4 years ago

hey all thanks a lot for the response! @roji @javiercn

So the problem originated from a web API that uses background task queues; however, the same case appears even in a simple web API like the one linked in the GitHub repo https://github.com/czioutas/efcorememorytest

If I understand correctly, @czioutas you are indeed seeing the memory getting released - just later than you expect - can you please confirm that this is the case?

@roji yes, this is the case. But much, much later, and I would expect the memory to be released right after the request ends and not when a new one kicks in. To be honest I had not realized this before, even though I wrote it, and I am guessing this might be a more generic problem than EF.

One confusing detail is that if your process is running out of memory, the GC should generally kick in and free eligible memory - so you should not be getting an out-of-memory error. However, if there are other processes on the same machine, the GC is obviously not aware of them, so memory may not get reclaimed for a while. Some more specific details may help us understand.

As for this, the machine runs more or less only this app; it reaches full memory (because we have a huge 300,000 entities to load), exits, but doesn't release the memory, which is allocated mainly to EF entities and the change tracker. Then when a new request kicks in, it starts creating new objects etc., but the memory is still allocated. I had not seen it get freed until I tested my example with a smaller number of objects, where it's possible to run the same request many times in a short period (the original code runs for 10 minutes).

I would gladly take this off the efcore issue list, but I still can't figure out which part is at fault for this behavior: a) is it because of the EF tracking? b) is it because of the Npgsql implementation? c) is it the GC? d) is it something to do with the lifecycle of objects and the GC?

czioutas commented 4 years ago

Some more data:

These are the stats on app start: image

These are the stats after data generation: image

These are the stats after save and return Ok() of the request image

These are the stats after a new request has started image

javiercn commented 4 years ago

@czioutas You would have to take a snapshot of the memory when the OOM happens and look at the roots of the different object graphs to determine what is holding on to the context-related objects and why they are not being reclaimed.

czioutas commented 4 years ago

@javiercn I will try to do that. I will get back to this later, because it needs some time to prepare to run locally.

roji commented 4 years ago

@czioutas it sounds like you may be trying to insert 300,000 rows with the same DbContext instance. If that's the case, it's definitely recommended that you not do this - DbContext was not designed to handle these numbers via a single instance. Try splitting your import into many small, fixed-length batches, i.e. instantiating a new DbContext instance for every 1000 entities.
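A rough sketch of what that batching could look like inside the controller action, assuming elements is the full List&lt;TestModel&gt; and reusing the _serviceProvider / ProjectDbContext names that appear in the repro code later in this thread (this is one possible shape, not the repro's actual code):

const int batchSize = 1000;

for (int i = 0; i < elements.Count; i += batchSize)
{
    // A fresh scope gives a fresh, short-lived DbContext per batch, so the change
    // tracker never holds more than batchSize entities at a time.
    using var scope = _serviceProvider.CreateScope();
    var context = scope.ServiceProvider.GetRequiredService<ProjectDbContext>();

    context.AddRange(elements.Skip(i).Take(batchSize));
    await context.SaveChangesAsync();
}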

An even better option for importing at this scale is to use Npgsql's binary COPY feature, which allows extremely efficient import/export. You would do this outside of EF Core.
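For illustration, a minimal sketch of Npgsql's binary COPY API; the table, column, and property names here are made up and would come from the real schema and model:

using var conn = new NpgsqlConnection(connectionString);
conn.Open();

// COPY streams rows directly into PostgreSQL, bypassing EF Core and change tracking.
using var importer = conn.BeginBinaryImport(
    "COPY test_models (id, name) FROM STDIN (FORMAT BINARY)");

foreach (var tm in elements)
{
    importer.StartRow();
    importer.Write(tm.Id, NpgsqlDbType.Uuid);
    importer.Write(tm.Name, NpgsqlDbType.Text);
}

importer.Complete();   // commits the import; without this the COPY is rolled back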

JeremyLikness commented 4 years ago

One other thing for you to try, @czioutas... I think you'll see more relief and better performance if you implement IDisposable to explicitly dispose of resources earlier. I also noticed you are using synchronous access to the DbContext in the controller. I haven't had time to configure the database and run tests, but I'm curious how these updates may impact your results if you try them.
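A sketch of one reading of those two suggestions; GenerateTestData is a hypothetical helper, and the early DisposeAsync is just one way to let go of the tracked graph before the action returns:

[HttpPost]
public async Task<IActionResult> Generate()
{
    List<TestModel> elements = GenerateTestData();   // hypothetical data-generation helper

    _context.AddRange(elements);
    await _context.SaveChangesAsync();   // async save instead of the synchronous SaveChanges()

    await _context.DisposeAsync();       // release the tracked entities before the action returns
    return Ok();
}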

czioutas commented 4 years ago

So the process has been running more or less since my last comment, ~2 hours. I did 3 requests, and although the app did not crash (that's good), the memory usage did not go lower than 14 GB. So it is freeing memory when it finds itself in a tough situation, but it still doesn't remove the old entity elements.

I uploaded 2 gcdumps to Google Drive: one at application start prior to any requests, and one from 10 minutes ago after all the work has been done but with allocation still high. https://drive.google.com/drive/folders/1UYsg7ef3l6rxrBNNOiiqflMqyfzhluV_

@roji yes, we are doing bulk inserts. It was doing pretty well with 2-3 GB allocations, but then we had this file consisting of 300,000 entities, which kind of ruined our plans 😔

I will try instantiating the DbContext per batch and see if the memory is allocated better.

--

@JeremyLikness I will apply the changes and report back 🙏 Re: IDisposable, I had assumed that due to their lifecycle we wouldn't need to add it, but maybe I was too optimistic.

P.S. Thanks a lot, everyone, for the support!

roji commented 4 years ago

@roji yes, we are doing bulk inserts. It was doing pretty well with 2-3 GB allocations, but then we had this file consisting of 300,000 entities, which kind of ruined our plans 😔

Yeah, that definitely will not scale. Since the DbContext tracks all added entity instances until it is disposed, all instances of a given import exist in memory at the same time. Either batch or switch to Npgsql COPY, which should dramatically speed up your imports in addition to eliminating memory pressure.

czioutas commented 4 years ago

Made all the changes as suggested by @roji and @JeremyLikness

https://github.com/czioutas/efcorememorytest/blob/master/src/Controllers/DefaultController.cs#L70

using (var scope = _serviceProvider.CreateScope())
using (var _context = (ProjectDbContext)scope.ServiceProvider.GetRequiredService(typeof(ProjectDbContext)))
using (IRepository<TestModel> _repository = new DbContextRepository<TestModel>(_context))
{
    foreach (TestModel _tm in elements)
    {
        _repository.Add(_tm);
    }

    await _repository.SaveAsync(); // final save for items outside of the batch
}

Unfortunately there is no difference/change. I saw some more aggressive GC action, but nothing was freed.

image

And once again memory starts getting freed when the request is started the second time (the red line is when I initiated the 2nd call). image

roji commented 4 years ago

I haven't looked in depth, but note that you're still generating your entire data set (a list of 100,000 instances of TestModel), and only then batching the DbContext operation. That means that your 100k instances stay in memory for the duration of the entire request - the batching only affects the EF Core side. Since TestModel seems to have lots of data and lots of nested entities, that is probably the source of the problem.

Try rewriting the logic to actually produce batches of 1000 entities at a time, so you never have more than that in memory. In a real-life application, you'd be reading data in batches and saving it to the database, as opposed to reading all the data into memory and then saving it to the db batch by batch.
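A sketch of that shape, where each batch is generated inside the loop so at most ~1000 TestModel instances exist at once; the GenerateTestData helper is hypothetical and the counts come from the numbers discussed above:

const int total = 100_000;
const int batchSize = 1_000;

for (int done = 0; done < total; done += batchSize)
{
    using var scope = _serviceProvider.CreateScope();
    var context = scope.ServiceProvider.GetRequiredService<ProjectDbContext>();

    // Generate only this batch (hypothetical helper) instead of building all 100k up front.
    List<TestModel> batch = GenerateTestData(batchSize);

    context.AddRange(batch);
    await context.SaveChangesAsync();

    // Both the batch list and the context become unreachable here, so the GC is
    // free to reclaim them whenever it next collects.
}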

czioutas commented 4 years ago

This is very interesting. I switched to 1k batches that generate the data inside the loop, and something new came up.

Below you can see that now, out of the total 100k, the last 30k remain, so the GC partially worked, or at least worked better - but these are huge allocations: 30k objects, of which only the last 1k should be there.

Furthermore, Npgsql now seems to be doing magic by keeping lots of NpgsqlParameter instances lying around...

image

Nevertheless, at this point speed/efficiency is not the issue. The problem is the memory hogging, and I cannot understand where it is coming from. However, I do not believe it is EF Core related anymore. Open for input.

roji commented 4 years ago

@czioutas as discussed above, nothing is really being "kept around" - garbage collectors simply don't reclaim memory right away, but rather every now and then, or when there's memory pressure. There's no Npgsql magic keeping NpgsqlParameter instances lying around - these are simply temporary gen0 instances used for the inserts, which are discarded afterwards - we're talking about 1.3MB for the entire 30k, which is quite normal/expected (the fact that these allocations are needed at all is somewhat unfortunate, but that's another story). As you noticed, by switching to batching, you're allowing the GC to reclaim objects before the request ends, if it chooses to do so.

To sum it up, the above memory snapshot looks pretty decent to me - I can't really see any anomalies or issues. Let me know if you think otherwise.

czioutas commented 4 years ago

@roji sorry for the late reply, I decided to actually take the Monday holiday :P

I see your point, and I guess I had a wrong expectation of how (and how early) the GC collects the memory. We can alter our application to fit this model. We won't have time to do this now, so I guess we can close the issue until we implement this smaller batching on our side and report back in the future.

roji commented 4 years ago

@czioutas reopening to make 100% sure there's no issue here:

Can you please confirm?

pavlalexandr commented 4 years ago

I had something similar with my console .NET Core app. The solution was to not use the Microsoft built-in DI container for the DbContext lifetime. Somehow, somewhere, it holds a reference to every scoped and transient object, so they can't be released. So

using (var dbContext = new MyDbContext())
{
    // work with the context here; it is disposed (and eligible for collection) when the block ends
}

is the best thing that can be done for now.

czioutas commented 4 years ago

@roji sorry for the late reply. One thing I maybe forgot to mention is that the app runs within Docker. We made the changes re: scope, and today it did run into memory issues and just stopped working. The .NET Core process inside the container just stopped, and we had to restart the container.

The application is an ETL, so its only purpose is to do exactly that: read files, transform them, and load them into the db. We are dealing with massive numbers of entities (300k), and a larger file is coming, so we are trying to get ready. I also understand that from your side this is not normal operation for EF, so I don't expect you guys to magically solve it :)

I will be moving the application outside of the container in the coming days, so maybe we can at least remove that from the "variables" that can affect it.

@pavlalexandr yes, that's kind of the problem I have at the moment, and switching to creating new objects out of a new service scope helps. I just don't get whether this is a bug or expected behavior.

roji commented 4 years ago

@czioutas if there's a problem here, I'm definitely interested in reproducing and understanding this even if it isn't purely an EF Core issue.

Are you enforcing a memory limit on your Docker container? If so, how much is it? Did you see the crash after switching to batching, or does it occur only when loading all 300k entities into memory as part of the request? In any case, I'll examine your repro above in more detail and we'll see what I come up with.

czioutas commented 4 years ago

@roji we do not set any limits on the Docker container. We do not have batching enabled, only the scoped DbContext (batching is difficult to implement right now) - so the crash happened in an app that generates all the models (300k) and then saves them with one DbContext (although we could change this and create a new DbContext via the service scope). But the crash doesn't happen from that request; it happens from the subsequent ones that require memory, and as the previous one takes almost 90% of the RAM, the new one crashes the app.

But it is not easy to reproduce and it takes a lot of time :/

I think I would have to make some changes to the example repo to make it as close to the current app as possible and then try to crash it :P

roji commented 4 years ago

Thanks @czioutas, I'll take a look at this in the coming days and let you know. Regardless of any issue here, you're definitely better off doing batching rather than loading the entire dataset - this way your maximum memory requirements would be fixed.

robi26 commented 3 years ago

I had exactly the same problem. Finally solved it (partially) by detaching the entities before disposing of the DbContext:

// Detach everything the context is still tracking before the context is disposed.
var changes = dbContext.ChangeTracker.Entries<TEntity>();
foreach (var item in changes)
{
    item.State = EntityState.Detached;
}
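On EF Core 5.0 or later, the same reset can be done in a single call (this API does not exist in the 3.1 versions discussed earlier in the thread):

dbContext.ChangeTracker.Clear();   // stop tracking all entities at once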
ShreyasJejurkar commented 2 years ago

If I understood the above discussion correctly, does it mean that when the 1st request hits the server, it allocates a bunch of entity objects in memory to do the EF thing, and once the request is finished, the DbContext gets disposed but not the entity objects? If that is so, and my 2nd request comes after 10 minutes, are the objects still in memory for that whole time?

And the memory gets reclaimed only if/when the GC feels memory pressure!?

Consider a scenario: the GC memory pressure limit is 500 MB. The 1st request allocates and consumes 300 MB of memory for entity objects and then ends - the objects are still in memory, btw. Now the second request comes, and the GC finds that the current request might allocate another 300 MB for entity objects; since 300 MB (from the previous request) + 300 MB (from the current request) exceeds the 500 MB limit, the GC gets triggered in between requests to clean up the previous request's objects instead of serving the incoming HTTP request. Is my understanding correct? Is this what the behavior is as of now?

roji commented 2 years ago

@ShreyasJejurkar yes - that should be the current behavior. This has nothing to do with EF Core - that's just how garbage collection works; once the DbContext is disposed, that context and all the entity instances it referenced are now eligible for garbage collection (assuming no user code is referencing them). When that memory actually gets reclaimed is up to the .NET garbage collector.

So yes, if each web request allocates 300MB and the total memory limit is 500MB, then a GC pass will be triggered at the beginning of each request. This is the expected behavior.

This issue is kept open because at some point above there seemed to be an assertion that the process was running out of memory (crashing). That shouldn't happen, but as long as the process still runs, when exactly memory gets reclaimed isn't a detail that EF Core can control.

roji commented 2 years ago

I'm going to go ahead and close this, as there doesn't seem to be an issue here. If we get reports (or a code sample) of memory not getting released - without user code referencing it - I'll definitely reinvestigate.