dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.19k stars 4.72k forks source link

Proposal: New API of "move semantics" for IDisposable #1163

Open mingxwa opened 4 years ago

mingxwa commented 4 years ago

History

Update on Jan 9, 2020

Add readonly qualifier to HasValue and Value in the struct Movable<TResource> (C# 8.0 specific).

Update on Dec 29, 2019

  1. Add conversions between Movable<TResource> and corresponding TResource;
  2. Remove the static class Movable;
  3. Change the class Movable<TResource> into struct to eliminate leak during abortive heap-allocation and for performance considerations.

Legacy Posts

[CLOSED] dotnet/csharplang #3052 [CLOSED] dotnet/runtime #1150

Motivation

Currently, the "using" keyword guarantees exception-safety in a specific context by disposing the associated resources even if an exception is thrown in the block. However, RAII is not always applicable when using resources across contexts. For example:

Resource resource = new Resource();   // "Resource" is a class that implements the interface "IDisposable"
Configure(resource, currentContext);  // "Configure" is a method to configure the "resource"; "currentContext" represents a capture of runtime context
return resource;

The code constructs an instance of Resource, performs some operations and returns the instance. Since we don't want to dispose the resource immediately after the function returns, using is not applicable in these scenarios. The code will work fine if every function call returns normally. However, when an exception is thrown during configuration (the second line), the "resource" leaks. This issue does not exist in C++ because of move semantics built on top of value semantics.

With the proposed facilities, the code above could be refactored into:

using Movable<Resource> resource = new Resource();
Configure(resource.Value, currentContext);
return resource.Move();

The mechanism is similar to the "move semantics" in C++ to ensure exception-safety but has more strict preconditions to avoid abuse. More details are illustrated in the following content.

Considerations

In the previous revision, I have tried to solve this problem with a utility method (more details could be found in dotnet/csharplang #3052):

public static TResource UseWithoutOwning<TResource>(TResource resource, Action<TResource> usingAction) where TResource : IDisposable;

Preferring Library Feature

According to the feedback, @mikedn and @scalablecory has suggested making "move semantics" a language feature. However, as @YairHalberstadt pointed out:

I think it extremely unlikely that a special syntax for your current proposed library method would be added. A new language feature is orders of magnitude more expensive than new library features, so if there was doubt whether it's common enough to belong in the standard library, it almost definitely does not belong in the language.

Instead you would need to come up with a more wide ranging design for adding full support for move semantics to C#.

I would much prefer to persue it as a pure library feature since it is apparently implementable, unless there is a sence of strong consensus in this group to make it a language feature.

Avoid "Callback Hell"

Some feedback has pointed out that the previous revision does not applicable to the cases where multiple instances of resource are involved. For example (from @svick 's feedback):

class C : IDisposable
{
    readonly Resource1 resource1;
    readonly Resource2 resource2;

    C()
    {
        resource1 = CreateResource1();
        resource2 = CreateResource2();
        MethodThatCouldThrow();
    }

    /// IDisposable implementation, etc.
}

It is NOT true. In the code above, the constructor C() could be refactored with UseWithoutOwning into:

UseWithoutOwning(
    CreateResource1(),
    resource1 => UseWithoutOwning(
        CreateResource2(),
        resource2 =>
        {
            MethodThatCouldThrow();
            this.resource1 = resource1;
            this.resource2 = resource2;
        }
    });

But anyway, I do think the code look much complicated comparing to the non-exception-safety version. I think it would be easier to use if we are able to avoid such "callback hell" code.

Single VS Bulk

@HaloFour has suggested a helper class to increase usability. The following code is copied from his feedback:

using (var builder = new DisposableBuilder())
{
    var resource1 = builder.Using(new Resource1());
    var resource2 = builder.Using(new Resource2());
    Configure(resource1, resource2);

    return builder.Build(() => new WrappingResource(resource1, resource2));
}

I was impressed by his design that totally avoided callback hell. However, I do not think it is always a good practice to manage all the resources in a same registry (DisposableBuilder), because:

  1. It is possible for some user to consume (including Dispose) resources in the same context where they are created, resulting in redundant Dispose operation. For example, if the constructor of Resource2 in the code above consumes the value of resource1, it is not necessary to dispose resource twice. Although the behaviour is well-defined, I do not think it reasonable for a component in the standard library to introduce such redundant operation.
  2. It becomes harder to reason about the ownership of resources if we are not familiar with the pattern. Therefore, this facility needs additional code guidelines to avoid abuse.
  3. If builder.Using() throws when expanding its space to store the reference, the behaviour is unclear to me.

Thinking more about @HaloFour 's solution, I think it could be simplified to concentrate on single resource management, resulting in less complexity, more usability and reliability. With the proposed facilities, the code above could be refactored into:

using Movable<Resource1> resource1 = new Resource1();
using Movable<Resource2> resource2 = new Resource2();
Configure(resource1.Value, resource2.Value);
return new WrappingResource(resource1.Move(), resource2.Move());

I think it is cleaner than the implementation with DisposableBuilder.

Technical Specifications

This section includes the proposed wording. Please let me know if there is anything missing.

Movable Struct

Namespace: System

public struct Movable<TResource> : IDisposable where TResource : class, IDisposable

A wrapper class making resources movable in an RAII (using) context.

Type Parameters TResource The underlying type of the associated resource.

Constructor

Initializes a new instance of the Movable<TResource> structure with the specific resource.

public Movable(TResource resource);

Parameters resource TResource The resource held by the new instance.

Throws ArgumentNullException if resource is null.

Properties

HasValue

Gets a value indicating whether this is associated with a valid resource.

public readonly bool HasValue { get; }

Value

Get the resource associated with this.

public readonly TResource Value { get; }

Throws InvalidOperationException if this is not associated with a valid resource.

Methods

Move

Move the resource out of this, and this will no longer associate with a valid resource during its lifetime.

public TResource Move();

Returns The resource associated with this.

Throws InvalidOperationException if this is not associated with a valid resource.

Dispose

If this is associated with an underlying resource, call Dispose() on it. this will no longer associate with a valid resource during its lifetime.

public void Dispose();

ToString

// Omitted

Operators

public static implicit operator Movable<TResource>(TResource resource);
public static explicit operator TResource(Movable<TResource> movable);

Static conversions between Movable<TResource> and corresponding TResource, similar to Nullable<T>.

Implementation

Here is the sample implementation for the proposed facility (exposition only):

namespace System
{
    public struct Movable<TResource> : IDisposable where TResource : class, IDisposable
    {
        private TResource resource;

        public Movable(TResource resource)
        {
            this.resource = resource ?? throw new ArgumentNullException(nameof(resource));
        }

        public readonly bool HasValue => this.resource != null;

        public readonly TResource Value => this.resource ?? throw new InvalidOperationException("The resource has been moved before, and could no longer be accessed.");

        public static implicit operator Movable<TResource>(TResource resource) => new Movable<TResource>(resource);

        public static explicit operator TResource(Movable<TResource> movable) => movable.Value;

        public TResource Move()
        {
            TResource result = this.resource ?? throw new InvalidOperationException("The resource has been moved before. Please pay attention to resource leak if there is any unmanaged one in the context.");
            this.resource = null;
            return result;
        }

        public void Dispose()
        {
            if (this.resource != null)
            {
                this.resource.Dispose();
                this.resource = null;
            }
        }

        public override string ToString() => $"{this.GetType().Name}[{this.resource}]";
    }
}
AaronRobinsonMSFT commented 4 years ago

/cc @terrajobst

stephentoub commented 3 years ago

The proposal is to create a type that lets you write:

using Movable<T> m = new Movable<T>(new T());
Use(m.Value);
return m.Move();

instead of:

var t = new T();
try
{
    Use(t);
}
catch
{
    t.Dispose();
    throw;
} 
return t;

so as to basically create a using that only disposes in the case of exception and not in success? This pattern comes up, but in my experience not nearly with a frequency that would warrant such a type with this name in the core libraries, especially since it’s frequently not as clean as this. For example, this is a reasonably common example: https://github.com/dotnet/runtime/blob/69f260eff523bc16c8a95c232e4816b3a1764e88/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs#L1276-L1327 Here, we only create the resource sometimes, and we don't want to let the raw exception propagate and instead translate it into something else, which necessitates a try/catch anyway. Eliding some of the code, it's basically:

Socket socket = null;
try
{
    if (something)
    {
        ...
        return ...
    }
    else
    {
        socket = new Socket(...);
        ...
        return new NetworkStream(socket);
    }
}
catch (Exception e)
{
    socket?.Dispose();
    throw TranslateException(e);
}

which with this proposal would become:

using Movable<Socket> m = default;
try
{
    if (something)
    {
        ...
        return ...
    }
    else
    {
        m.Assign(new Socket(...)); // this isn't in the proposal, but without it you couldn't use using in any meaningful way
        ...
        return new NetworkStream(m.Move());
    }
}
catch (Exception e)
{
    throw TranslateException(e);
}

To me, that's no more easier to understand nor maintain, plus it requires an extra level of try/catch.

On top of all that, it's trivially easy to accidentally use .Value when you actually meant to use .Move(), in which case you're going to be disposing of things you didn't intend to dispose of.

And potentially most impactfully, there's nothing here that really helps with ownership and transferring/borrowing of ownership, which would really need to be achieved at a language level. (And it's only usable with reference types, potentially as a nod to that?)

So, my $.02, as proposed this is not something I'd want to see added to the core libraries. If folks think it's useful in particular situations, I'd recommend creating a nuget package with it, or finding an existing open source library that publishes a nuget package, and add it there (or just copy the code into a project that needs it); if it really built up a meaningful following, potentially it could be reconsidered at that time.

mingxwa commented 3 years ago

so as to basically create a using that only disposes in the case of exception and not in success?

This proposal is intended to enhance RAII and avoid resource leak in C#. RAII in C# today is NOT complete because using is not cross context, and any code manages resources cross contexts has ambiguous ownership and is NOT exception-safe by definition.

This pattern comes up, but in my experience not nearly with a frequency that would warrant such a type with this name in the core libraries, especially since it’s frequently not as clean as this.

The issue does not seem to be frequent because in many cases people won't easily realize there is an exception-safety bug in their code, which is even more obscure than data race. In C++, RAII is more comprehensive where "move semantics" is a part of the language.

For example, this is a reasonably common example: https://github.com/dotnet/runtime/blob/69f260eff523bc16c8a95c232e4816b3a1764e88/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs#L1276-L1327

Here, we only create the resource sometimes, and we don't want to let the raw exception propagate and instead translate it into something else, which necessitates a try/catch anyway. Eliding some of the code, it's basically:

Socket socket = null;
try
{
    if (something)
    {
        ...
        return ...
    }
    else
    {
        socket = new Socket(...);
        ...
        return new NetworkStream(socket);
    }
}
catch (Exception e)
{
    socket?.Dispose();
    throw TranslateException(e);
}

which with this proposal would become:

using Movable<Socket> m = default;
try
{
    if (something)
    {
        ...
        return ...
    }
    else
    {
        m.Assign(new Socket(...)); // this isn't in the proposal, but without it you couldn't use using in any meaningful way
        ...
        return new NetworkStream(m.Move());
    }
}
catch (Exception e)
{
    throw TranslateException(e);
}

This is not the correct usage of this facility. If the scope of the "Socket" is just in the "else" block, we no longer need to make it "global". That I did not design "Assign" is because the declaration of "Movable" is only expected in a "using" expression with valid resources.

try
{
    if (something)
    {
        ...
        return ...
    }
    else
    {
        using Movable<Socket> m = new Socket(...);  // Ownership on the stack
        ...                                         // Ownership on the stack
        return new NetworkStream(m.Move());         // Ownership moved out of the stack
    }
}
catch (Exception e)
{
    throw TranslateException(e);
}

To me, that's no more easier to understand nor maintain,

People also tend to say this when "move" was added in C++11 10 years ago, but it turned to be a useful feature.

plus it requires an extra level of try/catch.

I did not get it.

On top of all that, it's trivially easy to accidentally use .Value when you actually meant to use .Move(), in which case you're going to be disposing of things you didn't intend to dispose of.

In C++, you can always use an rvalue reference (Movable here) as an lvalue reference (.Value) without the ownership. It is a user's (developer) responsibility to define the ownership of any resource at any time to avoid resource leak. "Movable" just provides a standard way to manage ownership cross context, otherwise one need to do extra things like socket?.Dispose() in your example and even a try/finally block to deal with this. Comparing to all these extra code, I think ".Move()" and ".Value" is much easier.

And potentially most impactfully, there's nothing here that really helps with ownership and transferring/borrowing of ownership

It helps users to transfer ownership of a resource from the stack to the caller (as return value) or other objects (including this). The ownership transfers on ".Move()" as a transaction.

which would really need to be achieved at a language level. (And it's only usable with reference types, potentially as a nod to that?)

Since it could be implemented at a library level, I do not think it necessary to be a new language feature.

So, my $.02, as proposed this is not something I'd want to see added to the core libraries.

Maybe you can try it first before trashing it. Microsoft Exchange has designed similar facility many years ago and it was widely used in the codebase.

(BTW, what is "$.02"?)

If folks think it's useful in particular situations, I'd recommend creating a nuget package with it, or finding an existing open source library that publishes a nuget package, and add it there (or just copy the code into a project that needs it); if it really built up a meaningful following, potentially it could be reconsidered at that time.

Any feedback would be appreciated. So far I think it should stay close with "IDisposable", which belongs to namespace System. Since it should be a part of the RAII implementation of the language, I also think it should be added in the official coding guidelines.

sharwell commented 3 years ago

❗ The example code has a serious bug.

using Movable<Resource> resource = new Resource();
return resource.Move();

The above code relies on Move() being a mutable method that assigns a value to a field of a structure (a flag indicating the resource has moved out). However, since the language defines the value of a using statement as a read-only location, the Move() call operates on a copy of resource, which means the original is unaltered and the resource is always disposed.

Under the current rules of the language, Movable<T> cannot appear in a using statement, but it does work with a try/catch block. The downside of this is it means Movable<T> offers almost no real-world benefit over what can be done without it. (This is the issue that prevented me from implementing an experimental type quite similar to this proposal a while back.)

stephentoub commented 3 years ago

Maybe you can try it first before trashing it.

I did try it (meaning I tried writing some code with it before responding), and I provided my honest feedback about why I don't believe it has a place in the core libraries; I'm sorry if that came across as "trashing" it. It's great to hear you've had success with it in your libraries, and you can obviously continue doing so. Without additional language support, though, this feels to me right now as a non-starter.

mingxwa commented 3 years ago

❗ The example code has a serious bug.

using Movable<Resource> resource = new Resource();
return resource.Move();

The above code relies on Move() being a mutable method that assigns a value to a field of a structure (a flag indicating the resource has moved out). However, since the language defines the value of a using statement as a read-only location, the Move() call operates on a copy of resource, which means the original is unaltered and the resource is always disposed.

I did not find any related section in the using statement page, and could not reproduce the bug: https://dotnetfiddle.net/S5bUA4

When will this happen? Could you provide any related document?

stephentoub commented 3 years ago

BTW, what is "$.02"?

https://en.wikipedia.org/wiki/My_two_cents

stephentoub commented 3 years ago

the Move() call operates on a copy of resource

@sharwell, are you sure? I see this:

        IL_0000: newobj instance void Resource::.ctor()
        IL_0005: call valuetype Movable`1<!0> valuetype Movable`1<class Resource>::op_Implicit(!0)
        IL_000a: stloc.0
        .try
        {
            IL_000b: ldloca.s 0
            IL_000d: call instance !0 valuetype Movable`1<class Resource>::Move()
            IL_0012: pop
            IL_0013: leave.s IL_0023
        } // end .try
        finally
        {
            // sequence point: hidden
            IL_0015: ldloca.s 0
            IL_0017: constrained. valuetype Movable`1<class Resource>
            IL_001d: callvirt instance void [System.Private.CoreLib]System.IDisposable::Dispose()
            IL_0022: endfinally
        } // end handler

That ldloca is loading the address of the original stored in location 0, the same one that's later disposed.

mingxwa commented 3 years ago

I did try it (meaning I tried writing some code with it before responding), and I provided my honest feedback about why I don't believe it has a place in the core libraries; I'm sorry if that came across as "trashing" it. It's great to hear you've had success with it in your libraries, and you can obviously continue doing so. Without additional language support, though, this feels to me right now as a non-starter.

I also considered to add it elsewhere in a Microsoft internal NuGet package since there would be less effort on my side, but it turned out the NuGet package only has one type, and it is actually a fix to using and IDisposable... I just could not think of a more appropriate place than the core library, and I also want to help others avoid such ownership hell in C#. I have seen enough incidents caused by resource leak in our services.

sharwell commented 3 years ago

@stephentoub You're right. I ran into multiple problems with this type, but I mixed it up here. Visual Basic is the language that causes the problem I described. For C#, the problem I ran into was an inability to initialize the holder via an out parameter, which is reasonably common in unmanaged interop scenarios.

@wmx16835 I've actually attempted to use a resource "holder" like you describe in a moderate sized code base. I ended up convincing myself that even if the semantics could me made to work, the instructions for using the type across a broad range of cases would not be simple enough to justify a new pattern. If it was easy to use the type and always know it was going to work for addressing stack-ownership scenarios (temporary or otherwise) involving disposable resources, then perhaps I would be more in favor of it. In practice I found that it did not achieve that goal.

mingxwa commented 3 years ago

@sharwell

For C#, the problem I ran into was an inability to initialize the holder via an out parameter, which is reasonably common in unmanaged interop scenarios.

Not only for out parameters, even the return types shall not be a "holder". It is the responsibility on the user side to manage the lifetime of a resource, rather than the library side.

the instructions for using the type across a broad range of cases would not be simple enough to justify a new pattern.

I think it is pretty simple to use, just add a "Movable<>" in your using statement whenever you want to transfer the ownership of a resource on the stack to an external place. There will be no side effect if the ownership is not transferred at runtime.

If it was easy to use the type and always know it was going to work for addressing stack-ownership scenarios (temporary or otherwise) involving disposable resources, then perhaps I would be more in favor of it.

I am not the first one invents the "move" semantics. It has been in C++ for 10 years and we rarely see ownership hell in C++ projects where std::move() and std::forward() is properly used, unless the std::shared_ptr is abused.

sharwell commented 3 years ago

The issue I have isn't with the attempt to implement move semantics, but rather the subtleties of the type under the current language constraints result in an implementation with "holes".

One thing to keep in mind is there is no significant functional benefit to defining this in the runtime, at least in terms of being able to use it. This type is designed for use on the stack and not in signatures, which means you can declare it today as internal and start using it immediately.

mingxwa commented 3 years ago

@sharwell

The issue I have isn't with the attempt to implement move semantics, but rather the subtleties of the type under the current language constraints result in an implementation with "holes".

I see. I think further discussion is needed to ensure no such "holes" are introduced if we think it is the right direction to go. At least I did not find any gap in my daily use of this facility.

One thing to keep in mind is there is no significant functional benefit to defining this in the runtime, at least in terms of being able to use it.

It is defined at compile-time but actually works at runtime because exception is a runtime thing.

This type is designed for use on the stack and not in signatures, which means you can declare it today as internal and start using it immediately.

These should be restricted in the official coding guidelines and maybe in the stylecop as well.

FrankHB commented 2 years ago

The issue does not seem to be frequent because in many cases people won't easily realize there is an exception-safety bug in their code, which is even more obscure than data race. In C++, RAII is more comprehensive where "move semantics" is a part of the language.

This is not quite true. In C++, with or without move (pre-C++11), RAII is always "complete". RAII can work because of the rules of the destructors of the C++. It needs quite low-level consideration (actually, being part of the definition of the term object lifetime) to enforce the global (runtime) invariant: every object being destroyed will finally call its destructor (if any) to make sure the effects of the destructor call will never be skipped by any well-defined means except the termination of the program. (This also includes the context of the control switches during an exception being thrown, where C++ requires stack unwinding, with the effects of necessary destructor calls.) Users can then enforce visible side effects in the body of the destructors to balance the resource creations for every sane objects, including any resource owners.

This is the whole story how RAII (plus RRID) works in C++. Move semantics has contribute nothing more; OTOH, the move-after states actually make the problems more complicated. In contrast, move semantics solves another problem here: mixture of copy and move. Consider C++'s auto_ptr, which is quite error-prone because it moves the owned resource in the copy constructor when users are not meaning to move. This is superseded by unique_ptr which does correct things to differentiate move from the copy by the move constructor and move assignment operator in the copy initialization of the object. This is not the case here because C# has no such copy initialization (by value) by default.

(The only one extra change concerned is the implicit overloading rules on move constructors, which effectively allows the omission of the explicit syntax "std::move(...)"equivalent to .Move() here, but this is not applicable in C#.)

That said, C# is surely defective here. Strictly speaking, C# is just not capable of RAII, in its original meaning. You cannot fix it without more core language magics to lift IDisposable as fundamental as the destructor in C++ (not in C#, which is technically "finalizer"; see ECMA-334 for the clarification of the term).