dotnet / wpf

WPF is a .NET Core UI framework for building Windows desktop applications.
MIT License
7.1k stars 1.17k forks source link

WPF control is not reclaimed on NET9 by GC #9467

Open minaew opened 4 months ago

minaew commented 4 months ago

Description

GC does not reclaim control memory in following scenario. The control is placed in the visual tree and then removed from it. Conditions must be met:

Reproduction Steps

Run test TestLeak in project targeting net9.0-windows. NUnit version is 3.12.0.

using System;
using System.Diagnostics;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Threading;
using NUnit.Framework;

namespace Sample {
    [TestFixture]
    public class MemoryTests {
        [Test]
        public void TestLeak() {
            PresentationTraceSources.Refresh();
            var window = new Window();
            var wr = AddAndRemoveImage(window);
            GC.Collect();
            GC.WaitForPendingFinalizers();
            Assert.IsFalse(wr.IsAlive);
        }

        static WeakReference AddAndRemoveImage(Window window) {
            var img = new Image();
            var wr = new WeakReference(img);
            var holder = new ImageHolder(img);
            window.Content = img;
            window.Show();
            DoEvents();
            window.Content = null;
            DoEvents();
            return wr;
        }

        static void DoEvents() {
            var frame = new DispatcherFrame();
            Dispatcher.CurrentDispatcher.InvokeAsync(() => frame.Continue = false, DispatcherPriority.Background);
            Dispatcher.PushFrame(frame);
        }

        class ImageHolder {
            readonly Image image;
            public ImageHolder(Image image) {
                this.image = image;
                this.image.Loaded += OnLoaded;
                this.image.Unloaded += OnUnloaded;
            }

            void OnLoaded(object sender, RoutedEventArgs e) {
                Debug.WriteLine("OnLoaded");
            }

            void OnUnloaded(object sender, RoutedEventArgs e) {
                Debug.WriteLine("OnUnloaded");
            }
        }
    }
}

Expected behavior

The test passes.

Actual behavior

The test fails.

Regression?

This works in net8.0 and net462.

Known Workarounds

No response

Configuration

.NET: 9.0.0-preview.6.24327.6 OS: Windows 10, Version 22H2 (OS Build 19045.4651) Architecture: x64

Other information

No response

teo-tsirpanis commented 4 months ago

Can you add [MethodImpl(MethodImplOptions.NoInlining)] in AddAndRemoveImage and try again? There's a chance the JIT has inlined AddAndRemoveImage, which might have extended the lifetime of img.

EgorBo commented 4 months ago

Can you add [MethodImpl(MethodImplOptions.NoInlining)] in AddAndRemoveImage and try again? There's a chance the JIT has inlined AddAndRemoveImage, which might have extended the lifetime of img.

if it's inlined, then it means it's Tier1/Fullopts, hence, precise liveness is expected, so should also be claimed. I quickly tried the repro locally and it seems like there is indeed a change in behaviour between net8.0-windows and net9.0-windows for both Tier0 and Tier1, so shouldn't be JIT's fault.

minaew commented 4 months ago

Can you add [MethodImpl(MethodImplOptions.NoInlining)] in AddAndRemoveImage and try again? There's a chance the JIT has inlined AddAndRemoveImage, which might have extended the lifetime of img.

Thanks. I tried, still no success.

EgorBo commented 4 months ago

Might be the same issue as https://github.com/dotnet/runtime/issues/104218 cc @VSadov

minaew commented 4 months ago

The RetryAttribute helps. Less tries is required if test closes the window.

teo-tsirpanis commented 4 months ago

I would also suggest running GC and waiting for pending finalizers in your tests more than once. The object might not be released on the first try.

minaew commented 4 months ago

I would also suggest running GC and waiting for pending finalizers in your tests more than once. The object might not be released on the first try.

This also does not help, event increases the number of required retries.

minaew commented 4 months ago

By the way, if I hide window (window.Hide()) the test becomes much more stable, but still not 100%.

mangod9 commented 4 months ago

Can you add [MethodImpl(MethodImplOptions.NoInlining)] in AddAndRemoveImage and try again? There's a chance the JIT has inlined AddAndRemoveImage, which might have extended the lifetime of img.

if it's inlined, then it means it's Tier1/Fullopts, hence, precise liveness is expected, so should also be claimed. I quickly tried the repro locally and it seems like there is indeed a change in behaviour between net8.0-windows and net9.0-windows for both Tier0 and Tier1, so shouldn't be JIT's fault.

@EgorBo, can you please clarify what you mean by behavior change here?

EgorBo commented 4 months ago

Can you add [MethodImpl(MethodImplOptions.NoInlining)] in AddAndRemoveImage and try again? There's a chance the JIT has inlined AddAndRemoveImage, which might have extended the lifetime of img.

if it's inlined, then it means it's Tier1/Fullopts, hence, precise liveness is expected, so should also be claimed. I quickly tried the repro locally and it seems like there is indeed a change in behaviour between net8.0-windows and net9.0-windows for both Tier0 and Tier1, so shouldn't be JIT's fault.

@EgorBo, can you please clarify what you mean by behavior change here?

            GC.Collect();
            GC.WaitForPendingFinalizers();

do not longer call the finalizer right away in .NET 9.0 while it's consistently calling it in .NET 8.0. Bot for optimized codegen and unoptimized. I know it's not supposed to be 100% guaranteed, but it's just what the author is complaining

mangod9 commented 4 months ago

Ok thanks. Adding @jkotas @VSadov as well, possibly related to moving finalizer loop to managed?

jkotas commented 4 months ago

possibly related to moving finalizer loop to managed?

It is very unlikely to be related to that change.

I am not able to reproduce the issue. Could you please create a scratch project on github that can be cloned and run to reproduce it?

minaew commented 4 months ago

possibly related to moving finalizer loop to managed?

It is very unlikely to be related to that change.

I am not able to reproduce the issue. Could you please create a scratch project on github that can be cloned and run to reproduce it?

net9-gc-issues.zip

minaew commented 4 months ago

possibly related to moving finalizer loop to managed?

It is very unlikely to be related to that change.

I am not able to reproduce the issue. Could you please create a scratch project on github that can be cloned and run to reproduce it?

https://github.com/minaew/net9-gc-issues

VSadov commented 4 months ago

I am able to reproduce this. (Also it passes when targeting 8.0, so this is a 9.0 - specific change)

Even if I refactor the test like

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static WeakReference M1()
        {
            PresentationTraceSources.Refresh();
            var window = new Window();
            var wr = AddAndRemoveImage(window);
            return wr;
        }

        [Test, RequiresThread(ApartmentState.STA)]
        public void TestLeak()
        {
            WeakReference wr = M1();

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            GC.WaitForPendingFinalizers();

            Assert.IsFalse(wr.IsAlive);
        }

it reproduces, which indicates that it may be not https://github.com/dotnet/runtime/issues/104218 .

https://github.com/dotnet/runtime/issues/104218 is so far believed to be a race and calling Collect/WFPF more than once typically helps with that.

VSadov commented 4 months ago

The test is not always failing. Sometimes it passes, but rarely.

When it is failing, the image is rooted via a static array

image

VSadov commented 4 months ago

If I change the test in the following way, the test always passes:

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static WeakReference M1()
        {
            PresentationTraceSources.Refresh();
            var window = new Window();
            var wr = AddAndRemoveImage(window);
            return wr;
        }

        [Test, RequiresThread(ApartmentState.STA)]
        public void TestLeak()
        {
            WeakReference wr = M1();

            for (int i = 0; i < 1000000; i++)
            {
                if (!wr.IsAlive)
                {
                    break;
                }

                GC.Collect();
                GC.WaitForPendingFinalizers();
            }

            Assert.IsFalse(wr.IsAlive);
        }

Basically - given enough time the image becomes unreachable and gets collected. But it may take spinning through up to 4000 calls to GC.Collect().

This does not look like an issue with GC or with finalization, but more like something needs to happen on the app/WPF/XAML side before it drops the reference to the image. It does drop it eventually, but it takes time. Perhaps there is a cache cleaned on timer, or something of that sort....

VSadov commented 4 months ago

Another observation - may be helpful for further investigations.

When I run the test (with the "loop until pass modification"), a white window shows up and takes the focus and runs for a while. Then, if I activate any other window, the test immediately passes. That is as if becoming no longer foreground window somehow results in quick cleanup of the image object.

VSadov commented 4 months ago

Another observation:

When I run this with PerfView recording finalization events I see some varying activity initially with all kind of objects finalizing, but then there is a long list of System.Gen2GcCallback finalizing a few instances per each GC (that is the array pools watching for GCs as cache trimming strategy), but no other types finalize up to the eventual passing of the test.

Dropping the reference to Image does not seem to be related to finalizing anything in particular.

VSadov commented 4 months ago

I do not think this is related to finalization thread changes. Possibly not related to the runtime at all.

I think someone more familiar with XAML/WPF side of things should take a look.

kasperk81 commented 4 months ago

should it be moved to dotnet/wpf? @lindexi @h3xds1nz does this net9 regression look familiar?

lindexi commented 4 months ago

@kasperk81 Emmmm, I am not sure...

lindexi commented 4 months ago

As InvokeHandlersImpl method in WPF, it will pass the Image object to _traceArguments in EventRoute when the RouteEvent be invoked, see https://github.com/dotnet/wpf/blob/df3f4bf5568830adc3ed2d3da244ee7a17d551df/src/Microsoft.DotNet.Wpf/src/PresentationCore/System/Windows/EventRoute.cs#L184-L191

That is why the PresentationTraceSources.Refresh is important. The PresentationTraceSources.Refresh will open the TraceRoutedEvent.IsEnabled.

And it do not cause the memory leak, and it will be free after trace. I do not yet know why the behavior difference occurred, I am continuing to investigate...

lindexi commented 4 months ago

The issues cause by https://github.com/dotnet/wpf/pull/6700

To reduce allocations when tracing routed events, we'll use a field to store it which cause this issues.

The EventRoute will be re-use, and it means that the _traceArguments will has the long life time.

Cc @bgrainger

Emmmmm, this issues is not easy to fix. And we can not clean the _traceArguments in Clear method, because the TraceRoutedEvent can not consumption timely and we ca not know when the tracer consume the arguments Not Sure.

Sorry @bgrainger . Upon further investigation, I've found that your code is not-safe. As previously mentioned, we cannot definitively determine when the tracer will finish consuming the _traceArguments. This implies that during the reuse of the RoutedEvent, parameter pollution may occur, potentially leading to the tracer recording incorrect parameters. For instance, if I trigger the first RoutedEvent and record the parameters in _traceArguments and notify the tracer to consume them. Then, I reuse the same RoutedEvent object and update new parameters into _traceArguments. However, if the tracer hasn't finished consuming the first set of parameters, it means that the tracer will be using the second set of parameters when processing the first set, which is not as expected.

Update:

I think the _traceArguments can be clear in Clear method or TraceRoutedEvent.Trace finish. And the TraceSource will use the arrayList not the _traceArguments object. See https://github.com/dotnet/wpf/blob/df3f4bf5568830adc3ed2d3da244ee7a17d551df/src/Microsoft.DotNet.Wpf/src/WindowsBase/MS/Internal/AvTrace.cs#L335-L339

kasperk81 commented 4 months ago

collection expression won't work?

                        TraceRoutedEvent.Trace(
                            TraceEventType.Start,
                            TraceRoutedEvent.InvokeHandlers,
-                          _traceArguments);
+                          [_routeItemList[i].Target, args, BooleanBoxes.Box(args.Handled)]);
h3xds1nz commented 4 months ago

Funny, I was looking at that PR/code last week and was thinking exactly that.

Good job @lindexi on the research.

GerardSmit commented 4 months ago

collection expression won't work?

I think this will still allocate. Possible solutions:

  1. Change object[] to a ReadOnlySpan. Then the collection expression won't allocate.

    In the end, the arguments are used in:

  2. Clear the array after it's being used. This is possible since the parameters are only used in the call stack and not being stored, however if down the line the code changes and the parameter is being stored and then being cleared, then it'll have invalid values. (This is already the case since the array is being re-used from the field).
  3. mangod9 commented 4 months ago

    can this issue be moved to wpf repo ?