Open minaew opened 4 months ago
Can you add [MethodImpl(MethodImplOptions.NoInlining)]
in AddAndRemoveImage
and try again? There's a chance the JIT has inlined AddAndRemoveImage
, which might have extended the lifetime of img
.
Can you add
[MethodImpl(MethodImplOptions.NoInlining)]
inAddAndRemoveImage
and try again? There's a chance the JIT has inlinedAddAndRemoveImage
, which might have extended the lifetime ofimg
.
if it's inlined, then it means it's Tier1/Fullopts, hence, precise liveness is expected, so should also be claimed. I quickly tried the repro locally and it seems like there is indeed a change in behaviour between net8.0-windows
and net9.0-windows
for both Tier0 and Tier1, so shouldn't be JIT's fault.
Can you add
[MethodImpl(MethodImplOptions.NoInlining)]
inAddAndRemoveImage
and try again? There's a chance the JIT has inlinedAddAndRemoveImage
, which might have extended the lifetime ofimg
.
Thanks. I tried, still no success.
Might be the same issue as https://github.com/dotnet/runtime/issues/104218 cc @VSadov
The RetryAttribute helps. Less tries is required if test closes the window
.
I would also suggest running GC and waiting for pending finalizers in your tests more than once. The object might not be released on the first try.
I would also suggest running GC and waiting for pending finalizers in your tests more than once. The object might not be released on the first try.
This also does not help, event increases the number of required retries.
By the way, if I hide window (window.Hide()
) the test becomes much more stable, but still not 100%.
Can you add
[MethodImpl(MethodImplOptions.NoInlining)]
inAddAndRemoveImage
and try again? There's a chance the JIT has inlinedAddAndRemoveImage
, which might have extended the lifetime ofimg
.if it's inlined, then it means it's Tier1/Fullopts, hence, precise liveness is expected, so should also be claimed. I quickly tried the repro locally and it seems like there is indeed a change in behaviour between
net8.0-windows
andnet9.0-windows
for both Tier0 and Tier1, so shouldn't be JIT's fault.
@EgorBo, can you please clarify what you mean by behavior change here?
Can you add
[MethodImpl(MethodImplOptions.NoInlining)]
inAddAndRemoveImage
and try again? There's a chance the JIT has inlinedAddAndRemoveImage
, which might have extended the lifetime ofimg
.if it's inlined, then it means it's Tier1/Fullopts, hence, precise liveness is expected, so should also be claimed. I quickly tried the repro locally and it seems like there is indeed a change in behaviour between
net8.0-windows
andnet9.0-windows
for both Tier0 and Tier1, so shouldn't be JIT's fault.@EgorBo, can you please clarify what you mean by behavior change here?
GC.Collect();
GC.WaitForPendingFinalizers();
do not longer call the finalizer right away in .NET 9.0 while it's consistently calling it in .NET 8.0. Bot for optimized codegen and unoptimized. I know it's not supposed to be 100% guaranteed, but it's just what the author is complaining
Ok thanks. Adding @jkotas @VSadov as well, possibly related to moving finalizer loop to managed?
possibly related to moving finalizer loop to managed?
It is very unlikely to be related to that change.
I am not able to reproduce the issue. Could you please create a scratch project on github that can be cloned and run to reproduce it?
possibly related to moving finalizer loop to managed?
It is very unlikely to be related to that change.
I am not able to reproduce the issue. Could you please create a scratch project on github that can be cloned and run to reproduce it?
possibly related to moving finalizer loop to managed?
It is very unlikely to be related to that change.
I am not able to reproduce the issue. Could you please create a scratch project on github that can be cloned and run to reproduce it?
I am able to reproduce this. (Also it passes when targeting 8.0, so this is a 9.0 - specific change)
Even if I refactor the test like
[MethodImpl(MethodImplOptions.NoInlining)]
private static WeakReference M1()
{
PresentationTraceSources.Refresh();
var window = new Window();
var wr = AddAndRemoveImage(window);
return wr;
}
[Test, RequiresThread(ApartmentState.STA)]
public void TestLeak()
{
WeakReference wr = M1();
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();
Assert.IsFalse(wr.IsAlive);
}
it reproduces, which indicates that it may be not https://github.com/dotnet/runtime/issues/104218 .
https://github.com/dotnet/runtime/issues/104218 is so far believed to be a race and calling Collect/WFPF more than once typically helps with that.
The test is not always failing. Sometimes it passes, but rarely.
When it is failing, the image is rooted via a static array
If I change the test in the following way, the test always passes:
[MethodImpl(MethodImplOptions.NoInlining)]
private static WeakReference M1()
{
PresentationTraceSources.Refresh();
var window = new Window();
var wr = AddAndRemoveImage(window);
return wr;
}
[Test, RequiresThread(ApartmentState.STA)]
public void TestLeak()
{
WeakReference wr = M1();
for (int i = 0; i < 1000000; i++)
{
if (!wr.IsAlive)
{
break;
}
GC.Collect();
GC.WaitForPendingFinalizers();
}
Assert.IsFalse(wr.IsAlive);
}
Basically - given enough time the image becomes unreachable and gets collected. But it may take spinning through up to 4000 calls to GC.Collect().
This does not look like an issue with GC or with finalization, but more like something needs to happen on the app/WPF/XAML side before it drops the reference to the image. It does drop it eventually, but it takes time. Perhaps there is a cache cleaned on timer, or something of that sort....
Another observation - may be helpful for further investigations.
When I run the test (with the "loop until pass modification"), a white window shows up and takes the focus and runs for a while. Then, if I activate any other window, the test immediately passes. That is as if becoming no longer foreground window somehow results in quick cleanup of the image object.
Another observation:
When I run this with PerfView recording finalization events I see some varying activity initially with all kind of objects finalizing, but then there is a long list of System.Gen2GcCallback
finalizing a few instances per each GC (that is the array pools watching for GCs as cache trimming strategy), but no other types finalize up to the eventual passing of the test.
Dropping the reference to Image does not seem to be related to finalizing anything in particular.
I do not think this is related to finalization thread changes. Possibly not related to the runtime at all.
I think someone more familiar with XAML/WPF side of things should take a look.
should it be moved to dotnet/wpf? @lindexi @h3xds1nz does this net9 regression look familiar?
@kasperk81 Emmmm, I am not sure...
As InvokeHandlersImpl
method in WPF, it will pass the Image
object to _traceArguments
in EventRoute when the RouteEvent be invoked, see https://github.com/dotnet/wpf/blob/df3f4bf5568830adc3ed2d3da244ee7a17d551df/src/Microsoft.DotNet.Wpf/src/PresentationCore/System/Windows/EventRoute.cs#L184-L191
That is why the PresentationTraceSources.Refresh
is important. The PresentationTraceSources.Refresh
will open the TraceRoutedEvent.IsEnabled
.
And it do not cause the memory leak, and it will be free after trace. I do not yet know why the behavior difference occurred, I am continuing to investigate...
The issues cause by https://github.com/dotnet/wpf/pull/6700
To reduce allocations when tracing routed events, we'll use a field to store it which cause this issues.
The EventRoute will be re-use, and it means that the _traceArguments
will has the long life time.
Cc @bgrainger
Emmmmm, this issues is not easy to fix. And we can not clean the _traceArguments
in Clear method,because the TraceRoutedEvent can not consumption timely and we ca not know when the tracer consume the arguments Not Sure.
Sorry @bgrainger . Upon further investigation, I've found that your code is not-safe. As previously mentioned, we cannot definitively determine when the tracer will finish consuming the _traceArguments
. This implies that during the reuse of the RoutedEvent, parameter pollution may occur, potentially leading to the tracer recording incorrect parameters. For instance, if I trigger the first RoutedEvent and record the parameters in _traceArguments
and notify the tracer to consume them. Then, I reuse the same RoutedEvent object and update new parameters into _traceArguments
. However, if the tracer hasn't finished consuming the first set of parameters, it means that the tracer will be using the second set of parameters when processing the first set, which is not as expected.
Update:
I think the _traceArguments
can be clear in Clear method or TraceRoutedEvent.Trace
finish. And the TraceSource
will use the arrayList
not the _traceArguments
object. See https://github.com/dotnet/wpf/blob/df3f4bf5568830adc3ed2d3da244ee7a17d551df/src/Microsoft.DotNet.Wpf/src/WindowsBase/MS/Internal/AvTrace.cs#L335-L339
collection expression won't work?
TraceRoutedEvent.Trace(
TraceEventType.Start,
TraceRoutedEvent.InvokeHandlers,
- _traceArguments);
+ [_routeItemList[i].Target, args, BooleanBoxes.Box(args.Handled)]);
Funny, I was looking at that PR/code last week and was thinking exactly that.
Good job @lindexi on the research.
collection expression won't work?
I think this will still allocate. Possible solutions:
Change object[] to a ReadOnlySpan
In the end, the arguments are used in:
which only creates a string. All classes, methods and event delegates are internal, so I think this change can be made?
can this issue be moved to wpf repo ?
Description
GC does not reclaim control memory in following scenario. The control is placed in the visual tree and then removed from it. Conditions must be met:
Reproduction Steps
Run test
TestLeak
in project targeting net9.0-windows. NUnit version is 3.12.0.Expected behavior
The test passes.
Actual behavior
The test fails.
Regression?
This works in net8.0 and net462.
Known Workarounds
No response
Configuration
.NET: 9.0.0-preview.6.24327.6 OS: Windows 10, Version 22H2 (OS Build 19045.4651) Architecture: x64
Other information
No response