[API Proposal]: Non-cooperative abortion of code execution

AntonLapounov commented 2 years ago

Background and motivation

Update: Replaced TryThrowException with TryAbort as was in the original proposal.

There is a number of execution environments for .NET code that may benefit from having an option to perform non-cooperative abortion of code execution, e.g., C# and F# interactive compilers (csi and fsi) or .NET Interactive Notebooks (see dotnet/runtime#41291). While the legacy Framework provided the Thread.Abort method, it has never been supported by .NET Core, because aborting code execution at a random point is inherently unsafe and may leave the process in an inconsistent or a corrupted state. Thus, the only available option to interrupt stuck code execution is to terminate the process and lose its state, which is undesirable for interactive scenarios.

Below is the proposed API to allow controlled execution of given code. The Run method starts execution of the delegate and TryAbort method may be used to attempt aborting its execution by throwing an asynchronous ThreadAbortException on that thread. Aborting may be successful only when the code is executing managed code, e.g., running a CPU-intensive loop, on the original thread. This API would not help cases where the thread is stuck in unmanaged code or waiting for some other thread to finish its work.

The implementation will generally use the same mechanism as the one utilized by the debugger to abort threads doing function evaluation (Thread::UserAbort). That includes marking the thread for abortion, suspending it, checking whether it is running managed code and safe to redirect, redirecting the thread to the ThrowControlForThread runtime helper, walking the stack to verify the tread is in a safe spot, and raising the exception for thread abort.

A possible option to make the implementation simpler is to rely only on GC polls / polls for thread abort instead of redirecting the thread on Windows, similarly to what is done on Unixes.

This design was discussed in dotnet/runtime#66480.

API Proposal

namespace System.Runtime.CompilerServices;

public class ControlledExecution
{
    public ControlledExecution(Action action);

    // Runs the specified action.
    // Throws an InvalidOperationException if the action is already running.
    // Throws a ThreadAbortException if execution was aborted.
    public void Run();

    // Attempts to abort the execution of the action by injecting an asynchronous exception.
    // Returns true on success or false otherwise.
    // Throws an InvalidOperationException if the action is not running.
    public bool TryAbort(TimeSpan timeout);
}

API Usage

s_executor = new ControlledExecution(SomeLengthyAction);
s_executor.Run();

// On another thread:
s_executor.TryAbort(TimeSpan.FromMilliseconds(500));

Alternative Designs

No response

Risks

No response

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

teo-tsirpanis commented 2 years ago

Because this API can unpredictably corrupt the application's state by design and to avoid misuse, it might need to be gated by an opt-in runtime configuration setting. This means we also need a static bool IsSupported { get; } property to the ControlledExecution class.

My understanding is that it's impossible to implement this API safely, and this would strongly discourage using it as a replacement of Thread.Abort.

JanEggers commented 2 years ago

whats the differnce of this vs a method that takes a cancellation token?

teo-tsirpanis commented 2 years ago

A method with a cancellation token has to periodically check for it and pass it down the call stack. This API will allow terminating methods forcefully and immediately, even if they don't accept a cancellation token.

stephentoub commented 2 years ago

it might need to be gated by an opt-in runtime configuration setting. This means we also need a static bool IsSupported { get; } property to the ControlledExecution class.

This is a dangerous API, one very few folks should use, but I don't see how the suggestion helps; it just forces extra boilerplate that someone who needs to call this has to go through, and is on the path towards reintroducing Code Access Security. How are the dangers here different from a library, say, P/Invoking to TerminateThread and similarly leaving things in a corrupted state? That doesn't require additional opt-in today. Opting-in is the act of calling the method or using a library that does.

GrabYourPitchforks commented 2 years ago

This API is marked ready for review. Before we approve it, I'd like to get confirmation from the runtime + libraries architects that we're not committing to changing any BCL code to be resilient in the face of asynchronous exceptions being thrown. The original post from Don Syme said that the reliability for this API has to be "good but not perfect" - but I never saw a definition of how we would measure success here.

I really don't want this to be a pretext for reintroducing CERs back into our code after we spent so much effort removing them. See https://github.com/dotnet/docs/issues/29624#issuecomment-1137811322 for some additional discussion on how newer coding constructs (like the array pool) could result in additional pits of failure with regard to asynchronous exceptions.

stephentoub commented 2 years ago

I'd like to get confirmation from the runtime + libraries architects that we're not committing to changing any BCL code to be resilient in the face of asynchronous exceptions being thrown.

We're not going to add back CERs. As far as library code is concerned, thread aborts do not exist.

GrabYourPitchforks commented 2 years ago

I wonder if it could be useful to expose a "was aborted?" API from this to let the original caller know if an interrupt indeed took place. They could check exceptions coming from the Run method, and if this Boolean property also returns true, they could normalize it back to an OperationCanceledException (or whatever) for reporting purposes.

In the extreme, the caller might want the answers to the following questions:

Was this operation aborted?
- If so, what was the original exception? (It could have been lost if the delegate threw a different exception internally.)
- What was the call stack of the thread which called TryThrowException?
- What was the call stack of the Run method at the time the exception was thrown?
Was any operation ever aborted? (This would indicate the process is potentially unhealthy, and a REPL service might take action such as spinning up a new service instance in the background.)

jkotas commented 2 years ago

I wonder if it could be useful to expose a "was aborted?" API from this to let the original caller know if an interrupt indeed took place.

Is it different from the bool returned by TryThrowException?

GrabYourPitchforks commented 2 years ago

Is it different from the bool returned by TryThrowException?

My understanding is that TryThrowException is invoked by a thread other than the thread which called Run. The "was this operation aborted?" API would be for use by the thread which called Run.

ControlledExecution cex;
try
{
   cex.Run(some_action);
}
catch (Exception ex) when (cex.WasCanceled)
{
    throw new OperationCanceledException(); // normalize
}

jkotas commented 2 years ago

Ah ok, so this would be just for convenience. I am not sure whether it is worth it. The interactive window is going to have wrapper around this that can keep track of what is going on.

GrabYourPitchforks commented 2 years ago

Ah ok, so this would be just for convenience. I am not sure whether it is worth it. The interactive window is going to have wrapper around this that can keep track of what is going on.

Sure. So the suggestion is that since the code which kicks off the work and the code which aborts the work are assumed to have been written by the same author, they should already have some synchronization mechanism between themselves to communicate this information? I'm fine with that.

AntonLapounov commented 2 years ago

I wonder if it could be useful to expose a "was aborted?" API from this to let the original caller know if an interrupt indeed took place.

Maybe the Run method should return bool indicating a successful execution and not throw an exception? That would save on try/catch boilerplate code (as in https://github.com/dotnet/runtime/issues/69622#issuecomment-1139831319).

deeprobin commented 2 years ago

Because this API can unpredictably corrupt the application's state by design and to avoid mis

@teo-tsirpanis

How about marking this class unsafe?

teo-tsirpanis commented 2 years ago

@deeprobin it won't work, we need #31354. And it wouldn't solve the problem of a dependency using it. The goal of my proposal is to entirely disable it from any library in the app, like the BinaryFormatter switch.

bartonjs commented 2 years ago

Video

Is System.Runtime.CompilerServices the right namespace?
- Changed to just System.Runtime
We changed the instance ctor/Run()/Abort to a single static using CancellationToken
To discourage people from calling it, we should declare it Obsolete out of the box, with a custom diagnostic ID (and message)
ThreadAbortException should not leak out of Run, it should get normalized to OperationCancelledException.
- Other exceptions can go through without normalization

namespace System.Runtime;

[Obsolete(some diag code)]
public static class ControlledExecution
{
    public static void Run(Action action, CancellationToken cancellationToken);
}

AntonLapounov commented 2 years ago

Implemented in #71661.

dotnet / runtime