dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.62k stars 4.56k forks source link

Use weak table for storing active tasks with debugger attached #26565

Open alandoherty opened 6 years ago

alandoherty commented 6 years ago

I've had to fix numerous problems in my projects using BufferBlock<T> leaking memory, I can't tell if this is by design or if it's actually unintended behaviour. Creating a BufferBlock<T> without completing it seems to leak both memory and scheduled tasks slowly grinding the application to a halt.

Quick example to trigger problem, results in memory leak:

class Program
{
    static void Main(string[] args)
    {
        int gcStep = 0;

        while(true) {
            BufferBlock<Test> test = new BufferBlock<Test>();
            gcStep++;

            if (gcStep % 1000 == 0)
                GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, true);
        }
    }
}

class Test { }

Quick example to remedy problem:

class Program
{
    static void Main(string[] args)
    {
        int gcStep = 0;

        while(true) {
            BufferBlock<Test> test = new BufferBlock<Test>();
            test.Complete();
            gcStep++;

            if (gcStep % 1000 == 0)
                GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, true);
        }
    }
}

class Test { }

It's not obvious in the documentation that this occurs, and seems like a really easy mistake that can have drastic effects on async performance and memory. My guess is that the Task is keeping the BufferBlock<T> reference alive.

The task which is leaked for every buffer block:

Not Flagged     68006   Scheduled   [Scheduled and waiting to run]  Task.ContinueWith: <.ctor>b__7_2
Clockwork-Muse commented 6 years ago

...what makes you think that BufferBlock is leaking anything, Task or otherwise?

alandoherty commented 6 years ago

I stripped the application down to just that code, I've found this to be a problem in some of my other code but it occurs even when I call BufferBlock constructor with nothing else (in .NET Console App project). I find that this problem leaks memory and tasks as shown below. It is resolved if you call Complete however.

My best theory is that the Completion task is never completed, so stays in memory forever and holds a reference to the BufferBlock<T>.

image

image

image

svick commented 6 years ago

When I run your code using the dotnet command line on .Net Core 2.1 or .Net Framework 4.7.2 and System.Threading.Tasks.Dataflow 4.9.0 on Windows 10, I don't see the behavior you're describing: memory usage according to Task Manager stays stable at few megabytes.

The only thing I can think of: are you using a recent version of the System.Threading.Tasks.Dataflow package (and not the deprecated Microsoft.Tpl.Dataflow package)?

Also, is there anything special about your setup?

alandoherty commented 6 years ago

I'm using .NET Core 2.0 currently, let me try 2.1 and see if that fixes the issue. I'm using the System.Threading.Tasks.Dataflow package, and there isn't anything special about my setup. I've found this issue to occur on Linux as well.

alandoherty commented 6 years ago

Just tested and it doesn't occur with dotnet but it does occur when running from Visual Studio 2017 and .NET Core 2.0/2.1 with both Debug or Release configurations.

stephentoub commented 6 years ago

it does occur when running from Visual Studio 2017 and .NET Core 2.0/2.1 with both Debug or Release configurations

F5 or ctrl-F5? i.e. with the debugger attached or not?

alandoherty commented 6 years ago

Only with the debugger attached it seems.

stephentoub commented 6 years ago

When the debugger is attached (or, more specifically, when the Task ETW-based debugging is enabled by Visual Studio), Tasks are tracked in a static table and kept alive until they're completed, e.g. https://github.com/dotnet/coreclr/blob/7196a7f62ba4047ddd51523cc4a253ba44201033/src/System.Private.CoreLib/src/System/Threading/Tasks/TaskContinuation.cs#L304-L307 https://github.com/dotnet/coreclr/blob/7196a7f62ba4047ddd51523cc4a253ba44201033/src/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs#L201-L203

alandoherty commented 6 years ago

Is there any way to disable this behaviour? Or mitigate rouge libraries from creating tasks which never complete? It makes it quite difficult to debug memory/performance issues since it adds noise when you specifically are aiming to look for those sorts of issues.

stephentoub commented 6 years ago

Is there any way to disable this behaviour?

I don't know if there's a way to disable the debugger turning on those async causality events. @gregg-miskelly, do you know?

svick commented 6 years ago

@stephentoub Would it make sense to track them using weak pointers (or a similar approach) instead? That way, abandoned tasks could be GCed.

Clockwork-Muse commented 6 years ago

Or mitigate rouge libraries from creating tasks which never complete?

Um, no. You're asking to solve the halting problem. You could probably clean up abandoned tasks that aren't started yet (cannot do work), or abandoned and running that don't have external references or modify no outside-the-task-state (no visible effect) - the behavior of the program suggests that at least the first is being performed.

Clockwork-Muse commented 6 years ago

@svick - to me it sounds like the goal was to prevent them just immediately being GC'd, so you could track down where they came from, because creating a large number of abandoned tasks would normally be considered a programmer error.

@alandoherty - Is the code you're working on really creating a block in a tight loop like that? And normally you are expected to call Complete (and allow it to clean up), so you should at least be doing that.

stephentoub commented 6 years ago

Would it make sense to track them using weak pointers (or a similar approach) instead? That way, abandoned tasks could be GCed.

To answer that I will need to go back and review what purpose the static table serves; I don't remember at this point.

alandoherty commented 6 years ago

I think there's a couple of problems here, firstly it's really not obvious at all in the documentation for BufferBlock<T> that you must Complete the block. I got the idea that it would be safe to discard unused blocks without completing, especially since it's offered in some places as a asynchronous concurrent queue.

Also it was my understanding that it was good practice to always complete tasks eventually, so perhaps the weak reference and auto completion of the task would be an idea? And it would be nice if there was a way to turn off this static table, so that you can prevent uncompleted tasks from causing issues during debugging for other issues.

Clockwork-Muse commented 6 years ago

I think there's a couple of problems here, firstly it's really not obvious at all in the documentation for BufferBlock that you must Complete the block. I got the idea that it would be safe to discard unused blocks without completing, especially since it's offered in some places as a asynchronous concurrent queue.

Yes, it would probably be good to update the documentation to mention that. It's not really "unsafe" to leave them uncompleted, but it is a good idea: If you don't mark it complete, consumers aren't going know they can finish (or producers might keep trying to add things).

Also it was my understanding that it was good practice to always complete tasks eventually, so perhaps the weak reference and auto completion of the task would be an idea? And it would be nice if there was a way to turn off this static table, so that you can prevent uncompleted tasks from causing issues during debugging for other issues.

Sure, BufferBlock will complete the task... if you call Complete(). There's no other way for it to know that it can. Until you call that method, it's possible to add additional processing items (note that the task won't necessarily complete immediately - it hangs around until everything has been processed). It also can't auto-complete the task, because (I think) the real task is part of the completion process, so auto-completing it would interfere with real use.

stephentoub commented 6 years ago

@gregg-miskelly, does VS still use the private Task.GetActiveTaskFromId method (https://github.com/dotnet/coreclr/blob/0fbd855e38bc3ec269479b5f6bf561dcfd67cbb6/src/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs#L6189)? What would the impact be if we were to remove it? Or, what would the impact be if VS passed in the id of a Task that would have otherwise been garbage collected by then?

gregg-miskelly commented 6 years ago

CC @r-ramesh

From reading the code what would happen is that VS will have a row in the tasks window that has whatever original name the task had (from the ETW event) and no location. We might want a new ETW event to notify that a uncompleted task was GC'ed?

To answer your question from Friday - no, we don't have a knob to prevent setting s_asyncDebuggingEnabled.

stephentoub commented 6 years ago

We might want a new ETW event to notify that a uncompleted task was GC'ed?

I don't think we can do that. This would be super expensive, as Task would need a finalizer, and even if we GC.SuppressFinalize'd it if the ETW event wasn't enabled, there's still measurable cost there. In .NET Core 2.1 we added something similar for async methods, but there the runtime controls the instantiation of the async method boxing object, and thus if events are enabled, it can choose to construct one that has a finalizer, whereas the one normally constructed won't. https://github.com/dotnet/coreclr/blob/df78ae72d5cf3f2c2dbe4ff972732d418497f9bc/src/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilder.cs#L505-L514

gregg-miskelly commented 6 years ago

Perhaps we could solve it by better handling a null result from Task.GetActiveTaskFromId to understand that the task has been collected...

stephentoub commented 6 years ago

Perhaps we could solve it by better handling a null result from Task.GetActiveTaskFromId to understand that the task has been collected

If that's feasible, then we could change it to use a weak collection.

Is it?

gregg-miskelly commented 6 years ago

As long as we are okay with breaking older debuggers (.NET Core 3?) I don't see any reason why we can't improve this.

Do you think the right experience is that collected tasks should disappear from the tasks window? Or is this a bug that we should be alerting users to?

There is a comment in our code about getting null back for 'async void' methods. Does that make sense to you? (in other words - do async void methods get a task id)

The debugger interprets GetActiveTaskFromId, so if we switch to some sort of weak collection we will likely need to enhance either VIL and/or ICorDebug to be able to decode the weak collection.

If async void methods get a task id, I might suggest making s_currentActiveTasks a Dictionary<int, WeakReference<Task>> and throw an exception in the collected case so that a debugger can differentiate the two cases. Otherwise we could just return null in that case.

markusschaber commented 3 years ago

As the latest comment on this is from 2018, and I've just spent some hours debugging such "Heisen-Leaks", I wanted to ask what the current state of this issue is.

(Why "Heisen-Leaks": The leaks in question did not happen on my machine, but on the coworkers machine when he tried to test my fixes for some other leaks. Turned out I've been using JetBrains DotMemory, and the "private bytes" view of SysInternals Process explorer, while my coworker just pressed F5 in VS 2019 and used the VS Diagnostic tools, so he still found leaks which I could not reproduce.)

stephentoub commented 1 year ago

Going through old issues. I'd like us to address this for .NET 8. Sorry for the super long delay here... I lost track of this issue.

Do you think the right experience is that collected tasks should disappear from the tasks window? Or is this a bug that we should be alerting users to?

They shouldn't show up as active in the Tasks window. Easiest thing is for them to just disappear, but there is likely a value-add for them showing up ala "oops, something here is a miss that would benefit from investigation".

There is a comment in our code about getting null back for 'async void' methods. Does that make sense to you? (in other words - do async void methods get a task id)

It likely used to make sense, but since .NET Core 2.0 or so, I'd expect async void methods to have an associated task. It's an implementation detail, though.

The debugger interprets GetActiveTaskFromId, so if we switch to some sort of weak collection we will likely need to enhance either VIL and/or ICorDebug to be able to decode the weak collection.

Can you point to what would need to change where?

tannergooding commented 11 months ago

@stephentoub, are you fine with this being future again? We're very unlikely to complete the work for .NET 8 at this point.

stephentoub commented 11 months ago

Yes. We need to tackle this earlier in a release.