GorNishanov / coroutines-ts


Alternative design for final_suspend that eliminates undefined behaviour of exceptions propagating out of the coroutine internals #30

Open lewissbaker opened 6 years ago

lewissbaker commented 6 years ago

The current specification of the coroutines TS (N4680) defines the semantics of the coroutine as follows:

{
  P p;
  co_await p.initial_suspend();
  try {
    <body>
  } catch (...) {
    p.unhandled_exception();
  }
  co_await p.final_suspend();
}

As described in #17, this definition leads to some potential undefined-behaviour if any of the co_await p.initial_suspend(), co_await p.final_suspend() or p.unhandled_exception() statements throw an unhandled exception.

The current wording states that the coroutine frame is automatically destroyed if the execution of the coroutine runs to completion (ie. hits the closing curly brace above). However, it's unclear what the semantics should be were an exception to propagate out of the coroutine.
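
For a concrete illustration (the type name below is hypothetical, not from the TS): if final_suspend() returns an awaitable like the following, the co_await p.final_suspend() expression throws from await_resume() outside the try/catch above, and N4680 does not say whether the frame is destroyed or where the exception goes.

#include <experimental/coroutine>
#include <stdexcept>

struct throwing_final_awaitable
{
  bool await_ready() const noexcept { return true; }   // do not suspend at the final suspend point
  void await_suspend(std::experimental::coroutine_handle<>) const noexcept {}
  void await_resume() const { throw std::runtime_error("final_suspend failed"); } // escapes the coroutine internals
};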

What if, instead, we defined the coroutine to have the following semantics:

void run(coroutine_handle<promise_type> h)
{
  promise_type& p = h.promise();
  try {
    co_await p.initial_suspend();
    {
      <body>
    }
final_suspend: // goto target for co_return statement
    <final-suspend-point>
  }
  catch (...) {
    <final-suspend-point>
    p.unhandled_exception();
  }
  if constexpr (std::is_void_v<decltype(p.final_suspend())>) {
    // Could be optionally optimised to a tail-call.
    p.final_suspend();
  } else {
    // Guaranteed tail-call to .resume()
    p.final_suspend().resume();
  }
}

Points of interest here:

(1) There is no longer any implicit destruction of the coroutine frame when execution runs to completion. The library/promise_type author must explicitly call coroutine_handle::destroy() to destroy the coroutine. This puts more of a burden on the library author to remember to call .destroy() in some cases, but it is also a much simpler rule than having to consider whether or not the coroutine suspended at the final suspend point.

Another potential side-effect of forcing an explicit call to .destroy() on the coroutine handle is that it encourages usage patterns that allow the compiler to more easily deduce that coroutine frame heap allocations can be elided since the lifetime of the coroutine frame is always explicitly ended by a call to .destroy() on its handle.

One concern raised by @GorNishanov was that it could make it more difficult to write fire-and-forget coroutines that destroy themselves on completion, or to write coroutines that complete synchronously like regular functions (eg. like the std::optional monad example from @toby-allsopp).

This should still be fine, however, since we can just make a call to coroutine_handle<promise_type>::from_promise(*this).destroy() within the promise_type::final_suspend() method, or alternatively from within the destructor of an RAII object returned from get_return_object().
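
For example, here is a minimal fire-and-forget sketch under the proposed semantics (the oneway_task name is mine, not from the issue): since the frame is no longer destroyed implicitly, the promise destroys it explicitly from final_suspend().

#include <exception>
#include <experimental/coroutine>

struct oneway_task
{
  struct promise_type
  {
    oneway_task get_return_object() noexcept { return {}; }
    std::experimental::suspend_never initial_suspend() noexcept { return {}; }
    void return_void() noexcept {}
    void unhandled_exception() noexcept { std::terminate(); }
    void final_suspend() noexcept
    {
      // Explicitly end the coroutine frame's lifetime now that execution has completed.
      std::experimental::coroutine_handle<promise_type>::from_promise(*this).destroy();
    }
  };
};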

(2) The coroutine is now considered suspended at the final suspend point (ie. coroutine_handle::done() returns true) before the p.unhandled_exception() method is called. This allows implementations of p.unhandled_exception() to rethrow the exception and have that exception propagate out of the call to coroutine_handle::resume() while still allowing the caller to subsequently call .destroy() on the coroutine handle even if execution never reaches the call to p.final_suspend().

This greatly simplifies the implementation of generator-like classes and also allows more efficient implementations, since the exception no longer needs to be captured as a std::exception_ptr and later rethrown.

eg.

#include <experimental/coroutine>
#include <memory>
#include <utility>

template<typename T>
struct generator
{
  struct promise_type
  {
    generator<T> get_return_object() { return { *this }; }
    std::experimental::suspend_always initial_suspend() { return {}; }
    std::experimental::suspend_always yield_value(T& value)
    {
      m_value = std::addressof(value);
      return {};
    }
    std::experimental::suspend_always yield_value(T&& value) { return yield_value(value); }
    void return_void() {}
    void unhandled_exception() { throw; }
    void final_suspend() {}
    T* m_value;
  };
  using handle_t = std::experimental::coroutine_handle<promise_type>;
  struct sentinel {};
  struct iterator
  {
    iterator(handle_t h) : m_handle(h) {}
    T* operator->() { return m_handle.promise().m_value; }
    T& operator*() { return *m_handle.promise().m_value; }
    iterator& operator++() { m_handle.resume(); return *this; }
    bool operator==(sentinel) const noexcept { return m_handle.done(); }
    bool operator!=(sentinel s) const noexcept { return !operator==(s); }
    handle_t m_handle;
  };
  generator(promise_type& p) : m_handle(handle_t::from_promise(p)) {}
  generator(generator&& g) : m_handle(std::exchange(g.m_handle, {})) {}
  ~generator() { if (m_handle) m_handle.destroy(); }
  iterator begin() { m_handle.resume(); return { m_handle }; }
  sentinel end() { return {}; }
  handle_t m_handle;
};
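
As a usage sketch (iota and consume are my own names, not from the issue, and this assumes C++17's generalised range-for and the proposed semantics), the coroutine frame's lifetime ends explicitly in ~generator(), which is the pattern point (1) argues makes it easier for the compiler to elide the frame allocation:

generator<int> iota(int n)
{
  for (int i = 0; i < n; ++i)
    co_yield i;           // binds to yield_value(T&) and stores the address of i
}

int consume()
{
  int sum = 0;
  for (int v : iota(10))  // begin() resumes to the first co_yield
    sum += v;
  return sum;             // ~generator() calls m_handle.destroy() here
}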

(3) This potentially allows well-defined behaviour for the case where the unhandled_exception() method is not present on the promise_type. If the unhandled_exception method is not present then we could simply define it as equivalent to having an unhandled_exception() method that contained a single throw; statement.
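
In other words (a sketch of the equivalence described above; nothing here is in the current TS wording), a promise type that omits the method would behave as if it had been written with the following member:

struct promise_type_with_default
{
  void unhandled_exception() { throw; } // rethrow; the exception escapes via coroutine_handle::resume()
  // ... rest of the promise type ...
};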

(4) The call to p.final_suspend() is no longer a co_await expression, but rather just a regular method call. This eliminates the need to define semantics for the case where the await_ready() method returns true (or where await_suspend() returns false or another coroutine_handle) that we would otherwise need to specify if final_suspend() was defined using co_await. It also eliminates the redundant/useless call to await_resume() which will pretty much always just be an empty void-returning method.

The semantics for these cases as defined in N4680 were already somewhat inconsistent: on the one hand it says that a coroutine suspended at the final suspend point cannot be resumed, yet on the other hand the intention seems to have been to allow final_suspend awaitables with bool-returning await_suspend methods to return false to indicate that the coroutine should be immediately resumed, with execution continuing to run to completion and implicitly destroying the coroutine frame.

Technically, this 'immediate resumption' of the coroutine should not be allowed since a coroutine cannot be resumed once suspended at the final suspend point.

(5) The co_await p.initial_suspend() call has been moved inside the try/catch block. This eliminates some potential for undefined behaviour if the initial_suspend statement throws an exception. It would now be well-defined as being handled the same as if a statement in the coroutine body threw an exception rather than being a failure to initialise the coroutine frame.

(6) The pseudo-code used to define the semantics has been changed to avoid having the promise object look like it has the same scope as the call to p.final_suspend(). The promise object would not be destroyed when execution exits the end of the run() function, but rather when coroutine_handle::destroy() is called.

Outstanding questions:

lewissbaker commented 5 years ago

I had another thought about a potential benefit of this approach to handling exceptions and final_suspend. With a few tweaks, it can potentially be used to avoid the task<T> promise needing to capture an exception as an exception_ptr in order to rethrow to the awaiting coroutine.

If we can structure the coroutine body as the following:

void run(coroutine_handle<promise_type> h)
{
  try {
    co_await h.promise().initial_suspend();
    {
      <body>
    }
  }
  catch (...) {
    <final-suspend-point>
    return h.promise().unhandled_exception().resume(); // tail-call resumption?
  }
  return h.promise().done().resume(); // tail-call resumption
}

Here I've changed it so that either promise.unhandled_exception() is called (if the coroutine body exits with an exception) or promise.done() is called (if coroutine body exits normally). If there is any common logic that needs to be handled between the two then that can be easily factored out to a helper function in the promise type.

This would then allow a task promise type to be implemented like the following:

template<typename T>
class task
{
public:
  class promise_type
  {
  public:
    task<T> get_return_object() noexcept
    {
      return task<T>{ coroutine_handle<promise_type>::from_promise(*this) };
    }
    suspend_always initial_suspend() { return {}; }
    coroutine_handle<> unhandled_exception() noexcept { return continuation_; }
    coroutine_handle<> done() noexcept { return continuation_; }
    template<typename U>
    void return_value(U&& value) { result_.emplace(std::forward<U>(value)); }
  private:
    friend class task<T>;
    std::optional<T> result_;
    coroutine_handle<> continuation_;
  };

  task(task&& t) noexcept : coro_(std::exchange(t.coro_, {})) {}

  // The frame must be destroyed explicitly under this design.
  ~task() { if (coro_) coro_.destroy(); }

  bool await_ready() noexcept { return false; }

  coroutine_handle<> await_suspend(coroutine_handle<> continuation) noexcept
  {
    coro_.promise().continuation_ = continuation;
    return coro_;
  }

  T await_resume()
  {
    if (!coro_.promise().result_) {
      // We are being resumed from the unhandled_exception() code-path, which is
      // still inside the catch block, so we can just rethrow the exception without
      // needing to capture it as an exception_ptr.
      throw;
    }

    // Otherwise we are being resumed from the done() code-path.
    return std::move(coro_.promise().result_).value();
  }

private:
  friend class promise_type;
  task(coroutine_handle<promise_type> coro) noexcept : coro_(coro) {}

  coroutine_handle<promise_type> coro_;
};

This would mean that the awaiting coroutine would be resumed within the nested task's catch block, so it can rethrow the exception with a plain throw; instead of capturing it as an exception_ptr and calling std::rethrow_exception(). This could be much more efficient than capturing and rethrowing an exception_ptr at every coroutine boundary.

lewissbaker commented 5 years ago

I've been playing around with some benchmarks to try out resuming within the catch-block for both the throwing and non-throwing cases. See http://quick-bench.com/Hjbg929Ige6iBvfVGyOnSzYkVCM

On Clang, it's about 1.3x faster to resume inside the catch-block and use throw; to rethrow the exception compared with using std::current_exception() and std::rethrow_exception(). On GCC this is reversed: it's about 1.3x slower to use throw; to rethrow.

However, for the non-throwing code-paths it's about 1.3-1.5x faster with the resume-inside-catch-block code-path.
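
For reference, here is a minimal standalone sketch (function names are mine) of the two rethrow styles being compared in that benchmark, independent of any coroutine machinery:

#include <exception>
#include <stdexcept>

// Capture-and-rethrow style: capture the in-flight exception as an exception_ptr
// at the coroutine boundary and rethrow it later.
void capture_and_rethrow()
{
  std::exception_ptr eptr;
  try { throw std::runtime_error{"boom"}; }
  catch (...) { eptr = std::current_exception(); }
  std::rethrow_exception(eptr);
}

// Rethrow-in-flight style: stay inside the catch block and rethrow the still-active
// exception with a bare throw;, avoiding the exception_ptr round-trip.
void rethrow_in_flight()
{
  try { throw std::runtime_error{"boom"}; }
  catch (...) { throw; }
}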

GorNishanov commented 5 years ago

I think we discussed this in September 2018 in person, but just to double-check my recollection: resuming from the catch block will preclude tail calls. This would make this loop blow up the stack:

    while (some-cond) {
      try { co_await foo(); ++succeeded-count; }
      catch (...) { ++failed-count; }
    }

If enough operations fail immediately one after the other.

lewissbaker commented 5 years ago

Yes, I believe you are correct. Resuming from within the catch-block could be problematic if the stack frames are not guaranteed to have been unwound before the catch-block executes.

The original suggestion would not have this problem; only the version that was trying to optimise the exception rethrowing by using throw; instead of std::rethrow_exception().

As discussed, optimising the exception-handling paths within a single function may prove more effective than trying to optimise exception-rethrowing between coroutines, e.g. allowing the compiler to short-circuit code that looks like this:

a) try { ... std::rethrow_exception(e); ... } catch (...) { promise.exception = std::current_exception(); }
b) try { ... throw some_exception{}; ... } catch (...) { promise.exception = std::current_exception(); }
c) try { ... } catch (...) { throw; }

Optimising a) and b) would help the performance of exception propagation for task, and optimising c) would help with generator.