Memory is not released when using TransactionScope with async option

neymecc commented 1 month ago

Sample code

while (true)
{
    using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
    {
        scope.Complete();
    }
}

snapshot1

Memory usage grows, and response times increase until the web api halts. Removing the async option resolves the issue, but async is required in my case. 😞

Workaround (not perfect, see comment)

I created a factory (singleton) around it to manually trigger GC, something like this simplified:

public class TransactionScopeFactory
{
    private const int GcThreshold = 200_000;
    private static int _createdScopes = 0;

    public TransactionScope CreateScope()
    {
        int currentCreatedScopes = Interlocked.Increment(ref _createdScopes);

        if (currentCreatedScopes >= GcThreshold)
        {
            if (Interlocked.CompareExchange(ref _createdScopes, 0, GcThreshold) >= GcThreshold)
            {
                GC.Collect(2);
            }
        }

        return new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
    }
}

Questions

Is the workaround okay? I'm a bit hesitant about manual GC, as the docs advises against it without a clear purpose and the TransactionScope docs doesn't mention anything about manually invoking GC.
Why does the workaround only work in release mode? It doesn't seem to work in debug mode.

Used versions: .NET 8, 9

Thank you 😄

roji commented 1 month ago

@neymecc if GC.Collect() makes the memory go down again, that means that there's no memory problem with TransactionScope - simply that the GC hasn't kicked in yet to reclaim unneeded memory. In .NET (like other GC languages), memory isn't released immediately, but at arbitrary points, when the GC actually kicks in, based on a variety of considerations. For example, if memory on your machine becomes low, it's very likely that the GC will kick in at that point. You can try to keep doing the above in a loop, and eventually memory usage will go down (a real leak would produce an OutOfMemoryException).

I suggest reading the GC docs for more information; if you think you're application is using too much memory, you may want to tweak the GC settings (take a look at workstation vs. server mode for instance).

neymecc commented 1 month ago

@roji, I might have worded this badly sorry about that, but the app is in fact gets killed by the OS with OOM. I created a sample repo for this that can reproduce this issue every time:

https://github.com/neymecc/runtime-issue-108447

I used these commands to setup a docker container with only 2GB of ram & no swap + start the app.

You might have to install libicu (apt-get update &c apt-get install -y libicu-dev)

docker run -d --memory="2g" --memory-swap="2g" --name="runtime-issue-108447" -p 32999:5000 ubuntu:latest tail -f /dev/null
dotnet publish -r linux-x64 --self-contained -c Release -o .build
docker cp "{project-location}\runtime-issue-108447\src\RuntimeIssue108447\.build" runtime-issue-108447:/opt/.build
docker exec -it runtime-issue-108447 bash
cd /opt
chmod +x .build
cd /opt/.build
./RuntimeIssue108447 --urls "http://0.0.0.0:5000"

(optional) Do a benchmark with http benchmarking app like bombardier

.\bombardier -c 125 -n 10000000 http://localhost:32999/weatherforecast

See the memory charts in this test, the memory is only increasing, latency is degrading, on my system the app gets killed after about a minute. If you increase the RAM, it just delays the kill. GC is trying to do something I can see that but it's like stuck, keeps collecting nothing.

If you remove the async option, it is going to be stable.

I know the test could be better, but without the workaround I have to restart the app like every week.

neymecc commented 1 month ago

@roji, I edited you into the comment above, I’m not sure if it sends a notification after the edit, but just in case, thank you! 😄

roji commented 1 month ago

So just so I understand, is the GC.Collect workaround that you ask about in your original list in fact a workaround, i.e. does it actually prevent the OOM?

neymecc commented 1 month ago

Yes, it does prevent the OOM. In the test above, using that workaround keeps the memory usage ~200 MB (with the used threshold).

roji commented 1 month ago

This does not repro for me. I'm running a simple console program with the following code:

while (true)
{
    using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
    {
        scope.Complete();
    }
}

Memory usage is extremely stable at with the expected ups-and-downs as the GC reclaims memory:

In general, I highly recommend isolating your repro, providing as minimal a repro as possible (e.g. a console program rather than an ASP.NET application - the web components shouldn't be relevant here in any way).

So just so I understand, is the GC.Collect workaround that you ask about in your original list in fact a workaround, i.e. does it actually prevent the OOM?

Yes, it does prevent the OOM.

Note that if there's an actual memory leak where memory cannot be reclaimed by the GC, and an OutOfMemoryException is triggered, that means that GC.Collect() shouldn't help the situation in any way. Either the memory can be reclaimed - in which case it should get reclaimed regardless of GC.Collect() - or it can't be reclaimed because it's still being referenced - in which case GC.Collect() wouldn't do anything to help.

neymecc commented 1 month ago

I'm very confused right now, are you testing the code on Mac?

I'm on Windows and also tested the code on Linux.

I created a sample with dotnet new console and just pasted that, mine is just increasing.

Start:

Few minutes later:

KeterSCP commented 1 month ago

@roji I can reproduce the issue locally with your minimal repro code:

My configuration is:

OS: Windows 11 Arch: AMD x64 .NET version: .NET 9 RC2 Run configuration: Release

Comparing two snapshots shows that most of the memory consumption comes from the ConditionalWeakTable:

roji commented 1 month ago

So again, it's OK for the memory to be increasing - the question is whether you get an OOM. You can make memory go faster by spinning up multiple threads that do this loop in parallel.

andrewjsaid commented 1 month ago

If i run the supposedly broken loop in a podman container with 1.5GB memory it reaches and 1.4GB and does not crash. If I lower it to 512MB it reaches ~450MB and remains stable at that level too.

Looks to me like the ConditionalWeakTable is just doing its job.

neymecc commented 1 month ago

If i run the supposedly broken loop in a podman container with 1.5GB memory it reaches and 1.4GB and does not crash. If I lower it to 512MB it reaches ~450MB and remains stable at that level too.

Looks to me like the ConditionalWeakTable is just doing its job.

@andrewjsaid Do you have swap enabled or disabled? If enabled, can you please disable it and run the loop again? Thank you :)

neymecc commented 1 month ago

First of all, thank you guys for taking time to help me, sorry for the uncertainness it's hard for me to explain this issue, even on my primary language. This is my first real "brawl" with the GC but I'm learning so much. 😃

Console test

I have been running the console test for about ~11 hours now, it's fine, although it will consume nearly all the RAM up as @andrewjsaid hinted in his comment. Doing snapshot reveals that it's all ConditionalWeakTable, like 16gb of all usage its ~14gb of that. If it is fine @roji as you said, I trust you. 😄

The issue is only reproducible in a web app. The sample repo that I linked is a "real usage scenario for us" where the task loop is simulating the clients (faster than real to avoid waiting a week or more) using transactions and the weatherforecast endpoint is for to check the overall latency.

Please run the sample repo in a vm/container and disable swap, Two possible outcomes:

OS will kill it in seconds or minutes.
If somehow lives then in the end, more and more time will be spent doing GC, weatherforecast will even go timeout in some asks.

Solution & story time

@roji asked me to try to get an OOM exception. I tried to get one in debug mode on Windows. I thought there is no way that I can fill all the RAM to get OOM exception in reasonable issue reply timeframe so I searched a way to limit memory that the app can use to speed up getting an OOM exception if it is even possible. That's when I found DOTNET_GCHeapHardLimit and DOTNET_GCHeapHardLimitPercent env variables. When set one of them, it remains stable within the set limit, no OOM exception.

The env variables even fixes the sample repo, no more OS OOM kill. Although using the other GC.Collect solution results faster RPS, less RAM usage, there is no more leftover GB++ ConditionalWeakTable data + less overall GC time. It is also better in our project.

Question

I have one more question, the docs says The default value, which only applies in certain cases, is the greater of 20 MB or 75% of the memory limit on the container. The web app(sample repo) is like the certain case? I mean running that without any of the solutions, the 75% will be quickly passed before OS OOM kill.

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

roji commented 1 month ago

@neymecc I'm not sure how to help further here - I'm still not 100% clear what you're reporting; does your repro get killed (in seconds or minutes), or not? What do you mean by "killed by the OS", which exact error/exception are you seeing (.NET's OutOfMemoryException or something else)?

In any case, if setting the DOTNET_GCHeapHardLimit "solves" the problem, that means that there is no leak inside System.Transactions - exactly like with GC.Collect(); this means that all memory is reclaimable.

Am removing the area-System.Transactions label and tagging area-GC-coreclr, maybe a GC expert can have some input here.

markples commented 1 month ago

Yes, having the answer to "what exact error/exception are you seeing" would be very helpful here.

My guess is that this is the Linux OOM killer. If the overall machine is at/near the limit, it will choose a process and kill it in order to get some space. It might be this process, but it also could be another.

The default 75% limit works for many deployments but may need to be modified. It allows for some other space to be used by the OS and others. Depending on the size of the machine and what else is running, it can be possible to run with a higher limit, but also it is quite possible that less than 75% is needed. As the .NET process approaches the specified limit (OR if the overall machine is approaching its limit), the GC will get more aggressive. In some cases, it will reclaim lots of memory more quickly. In other cases, there isn't anything to collect, so it just keeps trying and reclaiming small amounts. But the OOM killer can externally step in and kill the process. It is hard to give definitive advice because we don't know all of the specifics of your setup, but it sounds like lowering the 75% limit might be appropriate here.

Separately, in the workaround in the original description on this item, there is a race condition in the count. If _createdScopes is incremented twice by different threads before the CompareExchange happens, then its value can end up larger then GcThreshold. This will cause the return value to be GcThreshold + k but the zero not stored. Every iteration will then cause a gen2 collection, so the work will slow to a near halt.

neymecc commented 1 month ago

Oops, sorry, forgot to include in my last comment that issue can be closed. I can't provide any additional useful information and essentially the question/issue is solved.

@markples Strange when I set the default limit 75% explicitly in our environment its behaving correctly, probably something is messed up in our side. I thought the interlocked increment & exchange along with two step >= check solves that race. I have to revisit that again. Thanks for the heads up. 😄

markples commented 1 month ago

I thought the interlocked increment & exchange along with two step >= check solves that race.

The >= makes sure that you still get true as a result since the return value from the compare-exchange might be larger, but it doesn't impact the operation of the compare-exchange itself, which is "update if the old value equals GcThreshold", not "update if the old value is greater than GcThreshold". Just use a lock around the increment and compare-exchange (and then they don't need to be interlocked) as a simpler solution.

markples commented 1 month ago

Please see #50683 for more information about this. To answer a few things brought up in this issue:

There is a grey area with the word "leak" here. This benchmark (specifically the ConditionalWeakTable internally used by it) creates a finalizable object graph that is problematic for garbage collection - basically all of the objects require gen2 in order to make progress even though they seem like they should be short-lived. When the program approaches the memory limit, the frequent gen2s limit further growth. However, the app shouldn't be using anywhere near that much memory, so while it's not the traditional kind of "leak" it's still a problem.
The GC threshold discussion is a workaround for that underlying problem. Assuming that my guess about the Linux OOM killer is correct, the GC threshold is simply making the GC more aggressive earlier so that the gen2s kick in before the OOM killer gets involved. I'm not sure why the default doesn't work for you, though maybe the GC somehow isn't getting the memory limit in order to use it to find the 75% value (i.e., setting the limit directly rather than as a percentage means you can determine that limit rather than depending on the GC to find it).

neymecc commented 1 month ago

@markples Thank you for the deep dive 😄. Based on your comments, I tried to run the sample repo many times with DOTNET_GCHeapHardLimit set to "0x20000000" [512 MB]. This may explain why I can sometimes trigger an OOM exception with this limit set — like 1 in 10 attempts.

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Threading.ExecutionContext.SetLocalValue(IAsyncLocal local, Object newValue, Boolean needChangeNotifications)
   at System.Transactions.CallContextCurrentData.CreateOrGetCurrentData(ContextKey contextKey)
   at System.Transactions.TransactionScope.PushScope()
   at System.Transactions.TransactionScope..ctor(TransactionScopeOption scopeOption, TransactionScopeAsyncFlowOption asyncFlowOption)
   at Program.<>c.<<Main>$>b__0_1() in C:\Users\******\Desktop\runtime-issue-108447\src\RuntimeIssue108447\Program.cs:line 42
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)

dotnet / runtime