dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.05k stars 4.04k forks source link

IEnumerator<T> codegen does clear the "current" field on disposal #52391

Closed oatkins closed 3 years ago

oatkins commented 3 years ago

An enumerable created with the Select<TInput, TOutput>(IEnumerable source, Func<TInput, int, TOutput> selector) method, once enumerated, leaks a reference to the last item in the source enumerable. The corresponding Select<TInput, TOutput>(IEnumerable source, Func<TInput, TOutput> selector) does not leak.

This seems to be because the former relies on a compiler-generated enumerator, which is less eager about releasing references than the standard Select() operator.

// Fails!
[Test]
public void SelectWithIndexIteratorShouldNotRetainReferenceToLastItem()
{
    static (IEnumerable<object> Enumerable, WeakReference First, WeakReference Second) Action()
    {
        var first = new object();
        var second = new object();
        var list = new List<object> { first, second };

        // Note the specific overload of Select() used here.
        var enumerable = list.Select((x, ix) => x);

        // The enumerable must be enumerated fully once for the bug to be manifest.
        foreach (var _ in enumerable)
        {
        }

        list.Clear();

        return (enumerable, new WeakReference(first), new WeakReference(second));
    }

    var (enumerable, first, second) = Action();

    GC.Collect(2, GCCollectionMode.Forced, true, true);

    first.IsAlive.ShouldBeFalse();  // Pass.
    second.IsAlive.ShouldBeFalse(); // Fail.

    GC.KeepAlive(enumerable);
}

// Passes.
[Test]
public void SelectIteratorShouldNotRetainReferenceToLastItem()
{
    static (IEnumerable<object> Enumerable, WeakReference First, WeakReference Second) Action()
    {
        var first = new object();
        var second = new object();
        var list = new List<object> { first, second };

        var enumerable = list.Select(x => x);

        foreach (var _ in enumerable)
        {
        }

        list.Clear();

        return (enumerable, new WeakReference(first), new WeakReference(second));
    }

    var (enumerable, first, second) = Action();

    GC.Collect(2, GCCollectionMode.Forced, true, true);

    first.IsAlive.ShouldBeFalse();
    second.IsAlive.ShouldBeFalse();

    GC.KeepAlive(enumerable);
}
ghost commented 3 years ago

Tagging subscribers to this area: @eiriktsarpalis See info in area-owners.md if you want to be subscribed.

Issue Details
An enumerable created with the [Select(IEnumerable source, Func selector)](https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.select?view=net-5.0#System_Linq_Enumerable_Select__2_System_Collections_Generic_IEnumerable___0__System_Func___0_System_Int32___1__) method, once enumerated, leaks a reference to the last item in the source enumerable. The corresponding [Select(IEnumerable source, Func selector)](https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.select?view=net-5.0#System_Linq_Enumerable_Select__2_System_Collections_Generic_IEnumerable___0__System_Func___0___1__) does not leak. This seems to be because the former relies on a compiler-generated [enumerator](https://github.com/dotnet/runtime/blob/8445e4eaa3b18bfdbed1eb1507293777c590d371/src/libraries/System.Linq/src/System/Linq/Select.cs#L79), which is less eager about releasing references than the standard `Select()` operator. ```csharp // Fails! [Test] public void SelectWithIndexIteratorShouldNotRetainReferenceToLastItem() { static (IEnumerable Enumerable, WeakReference First, WeakReference Second) Action() { var first = new object(); var second = new object(); var list = new List { first, second }; // Note the specific overload of Select() used here. var enumerable = list.Select((x, ix) => x); // The enumerable must be enumerated fully once for the bug to be manifest. foreach (var _ in enumerable) { } list.Clear(); return (enumerable, new WeakReference(first), new WeakReference(second)); } var (enumerable, first, second) = Action(); GC.Collect(2, GCCollectionMode.Forced, true, true); first.IsAlive.ShouldBeFalse(); // Pass. second.IsAlive.ShouldBeFalse(); // Fail. GC.KeepAlive(enumerable); } // Passes. [Test] public void SelectIteratorShouldNotRetainReferenceToLastItem() { static (IEnumerable Enumerable, WeakReference First, WeakReference Second) Action() { var first = new object(); var second = new object(); var list = new List { first, second }; var enumerable = list.Select(x => x); foreach (var _ in enumerable) { } list.Clear(); return (enumerable, new WeakReference(first), new WeakReference(second)); } var (enumerable, first, second) = Action(); GC.Collect(2, GCCollectionMode.Forced, true, true); first.IsAlive.ShouldBeFalse(); second.IsAlive.ShouldBeFalse(); GC.KeepAlive(enumerable); } ```
Author: oatkins
Assignees: -
Labels: `area-System.Linq`, `untriaged`
Milestone: -
oatkins commented 3 years ago

Interestingly, this seems to be because the implementation uses an iterator method (i.e. yield return). The problem can be reproduced using even simpler code:

[Test]
public void SimpleIteratorShouldNotRetainReferenceToLastItem()
{
    static (IEnumerable<object> Enumerable, WeakReference First, WeakReference Second) Action()
    {
        var first = new object();
        var second = new object();
        var list = new List<object> { first, second };

        var enumerable = Build(list);

        foreach (var _ in enumerable)
        {
        }

        list.Clear();

        return (enumerable, new WeakReference(first), new WeakReference(second));
    }

    var (enumerable, first, second) = Action();

    GC.Collect(2, GCCollectionMode.Forced, true, true);

    first.IsAlive.ShouldBeFalse();
    second.IsAlive.ShouldBeFalse();

    GC.KeepAlive(enumerable);

    static IEnumerable<T> Build<T>(IEnumerable<T> source)
    {
        foreach (var s in source)
        {
            yield return s;
        }
    }
}

So it could be argued that the code gen for an iterator method is wrong. (It's not a recent change. I could repro this in VS2015.) In which case, this is probably the wrong place for the bug report.

Or maybe my expectation that you can't leak through an enumerable is wrong. That is unexpected. Half the point of an iterator method is that the returned IEnumerable<> is lazily evaluated—so I fully expect to get different results each time I enumerate over it. It seems logical, then, that it wouldn't hold on to results from a previous enumeration; but maybe that's a wrong inference.

eiriktsarpalis commented 3 years ago

I think it's an issue with iterator codegen. Both iterator methods and a few LINQ implementations apply an optimization where the same allocation can be used for both IEnumerable and IEnumerator. However it looks like unlike the LINQ implementation iterator methods don't clear the "current" field on disposal

cc @jaredpar

CyrusNajmabadi commented 3 years ago

I think it's an issue with iterator codegen.

This sounds like a feature request. I don't really see anything that mandates taht a disposed iterator must let go of all it's internal state. I imagine it's fully abiding by thte spec to not do so, and the recommendatino woudl be: if you don't want the enumerable holding onto anything, don't hold onto the enumerable itself.

A change here would likely come with a perf impact as this would mean all these iterators would regen and would now do work where tehy weren't before.

Given that this has likely been the behavior since we first introduced interators (c# 2.0, right? so 15 years ago?), i don't see this as being something that is actually a problem or that should be changed now after all this time.

jaredpar commented 3 years ago

Agree with @CyrusNajmabadi analysis here.

Further there is nothing in the enumeration contract that specifies Current should be invalidated after calling Dispose. The documentation actually suggests otherwise. Consider that the docs for Current don't mention Dispose as one of the cases where it's value is undefined. Further the documentation for Dispose only mentions freeing resources, not clearing Current.

If there are no additional resources to dispose of, provide an empty Dispose() implementation.

Note that none of the samples on the IEnumerator<T> page do clear Current on Dispose.

I think there is a good argument that we should be more aggressively clearing out Current in the case MoveNext returns false. If we were implementing iterators today I think we'd likely push for doing that as it's inline with other decisions we've made here.

At this point though the most important factor here though is compatibility. This is how iterators have worked since their inception (as far as my testing shows). Changing this behavior now would likely be a significant back compat change. It's reasonable to expect there is a non-trivial amount of code out there which, deliberately or not, accesses Current after Dispose or after MoveNext returns false and depends on the value being present. As mentioned if the concern is holding onto state then the code can adjust by releasing the enumerator objects.

oatkins commented 3 years ago

Thanks for the replies. It makes a lot of sense that changing the behaviour of the code generated for an iterator method wouldn't be a great idea at this stage!

I do have one question/observation still. My original report mentioned that the two overloads of Enumerable.Select() differ as to whether they hold references to the last value of Current after the complete enumeration of the sequence. The simpler overload is quite heavily optimised with code to return a SelectListEnumerator, SelectArrayEnumerator, etc. These do not seem to retain the unwanted reference (after Dispose()). However, the overload of Enumerable.Select() with a selector that takes an integer index adds this behaviour using an iterator method, which, as discussed, does retain the reference to Current.

I would suspect that by far the most common use case for an iterator method is to return an IEnumerable<> that is consumed and then discarded immediately, in which case there is no problem with the current implementation. However, it seems much more likely that other patterns of using IEnumerable<> with LINQ constructs might involve retaining an enumerable for longer. In my case, a Select() projection inside some reactive extensions code resulted in a hard-to-find memory leak in a production application.

Would it be reasonable to suggest that the indexed overload of Select() be reimplemented using a custom iterator that behaves more like the simpler version; or would that also likely be considered too high risk a change?

CyrusNajmabadi commented 3 years ago

Would it be reasonable to suggest that the indexed overload of Select() be reimplemented using a custom iterator that behaves more like the simpler version; or would that also likely be considered too high risk a change?

It's a totally reasonable to suggest such a thing. Whether or not it would be accepted is out of scope for the csharplang repo :). If you'd like, i can reopen this issue and move it back to the Runtime team. They could make a decision if they wanted to make the behavior between these two Select methods more consistent. IMO, it has a non-zero chance of being taken given that the runtime team has changed the behavior of the Linq implementation over the years, and this doesn't feel too 'out there' in terms of the change.

That said, they may also have hte same opinion that this isn't a significant enough issue, and that changing it may have too high a risk. It will ultimately be their assessment. LMK if you do want me to transfer this back.

Thanks!

oatkins commented 3 years ago

Yes please, that would be great.


From: CyrusNajmabadi @.> Sent: Sunday, April 4, 2021 5:35:21 PM To: dotnet/roslyn @.> Cc: Olly Atkins @.>; Author @.> Subject: Re: [dotnet/roslyn] IEnumerator codegen does clear the "current" field on disposal (#52391)

Would it be reasonable to suggest that the indexed overload of Select() be reimplemented using a custom iterator that behaves more like the simpler version; or would that also likely be considered too high risk a change?

It's a totally reasonable to suggest such a thing. Whether or not it would be accepted is out of scope for the csharplang repo. If you'd like, i can reopen this issue and move it back to the Runtime team. They could make a decision if they wanted to make the behavior between these two Select methods more consistent. IMO, it has a non-zero chance of being taken given that the runtime team has changed the behavior of the Linq implementation over the years, and this doesn't feel too 'out there' in terms of the change.

That said, they may also have hte same opinion that this isn't a significant enough issue, and that changing it may have too high a risk. It will ultimately be their assessment. LMK if you do want me to transfer this back.

Thanks!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdotnet%2Froslyn%2Fissues%2F52391%23issuecomment-813102909&data=04%7C01%7C%7Ca1269b2535f947e31f1c08d8f7b18d81%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637531689239936834%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=AATBTumzjgVyIL9t2IGn4K%2B0NgEsHvgLzxgwSj%2F8CZA%3D&reserved=0, or unsubscribehttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FACTLN2A2BPBXE4QGKI3UVGDTHDLRTANCNFSM42J7GTJQ&data=04%7C01%7C%7Ca1269b2535f947e31f1c08d8f7b18d81%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637531689239946816%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WDGQP1t6Y8iuDlgD6TS6ijNgujimfux42Pzhy906kr4%3D&reserved=0.

eiriktsarpalis commented 3 years ago

From a libraries perspective it's unlikely we'd make such a change, since really this impacts all LINQ implementations using iterator methods and we have quite a few of those.

That said, they may also have hte same opinion that this isn't a significant enough issue, and that changing it may have too high a risk. It will ultimately be their assessment.

I would agree that this isn't significant enough to warrant the risk of rewriting a bunch of methods.

CyrusNajmabadi commented 3 years ago

Thanks @eiriktsarpalis ! Given that assessment, I'm going to leave this closed. Both the runtime and compiler feel the current situation is acceptable, and changes here are too risky to warrant making changes here.

Thanks all!