cplusplus / sender-receiver

Issues list for P2300
Apache License 2.0
20 stars 3 forks source link

`when_all`/`split`: operation state might get prematurely destroyed when child completes synchronously inside its stop callback #300

Open jiixyj opened 15 hours ago

jiixyj commented 15 hours ago

I may have stumbled across a nasty lifetime issue in the handling of stop callbacks in the when_all/split algorithms. But this would apply to any algorithm using a inplace_stop_source inside its operation state.

when_all has a inplace_stop_source inside its operation state, and then a stop callback, like this:

struct when_all_opstate {
    // ...
    inplace_stop_source stop_src{};
    // ...
    optional<stop_callback> on_stop{nullopt};
    // ...
};

In on_stop, a stop callback is registered, which calls stop_src.request_stop() when there is a stop request on the receiver's stop token. All child senders of the when_all are registered on the stop_src. This propagates the stop request from the receiver to all children of the when_all sender.

Now, what I observed is the following chain of events:

I've managed to "hack around" it by doing some checking of thread id's and deferring the completion to the stop callback if I detect that the completion is called synchronously from inside a stop callback. So something like this:

template <typename Opstate>
struct on_stop_request_with_thread_id {
public:
    void operator()() const noexcept
    {
        id_->store(std::this_thread::get_id(), std::memory_order::relaxed);
        op_->request_stop();
        if (id_->load(std::memory_order::relaxed) == std::thread::id{}) {
            op_->deferred_complete();
            return;
        }
        id_->store(std::thread::id{}, std::memory_order::relaxed);
    }

public: // NOLINT enable aggregate initialization
    Opstate* op_;
    std::atomic<std::thread::id>* id_;
};

export template <typename Opstate, typename Token>
struct stop_callback_with_thread_id {
public:
    bool should_defer_completion()
    {
        if (id_.load(std::memory_order::relaxed) != std::this_thread::get_id()) {
            return false;
        }

        id_.store(std::thread::id{}, std::memory_order::relaxed);
        return true;
    }

    void reset() { on_stop_.reset(); }

    void emplace(Token token, Opstate* op)
    {
        on_stop_.emplace(std::move(token), on_stop_request_with_thread_id<Opstate>{op, &id_});
    }

private:
    using stop_callback_t = ex::stop_callback_for_t<Token, on_stop_request_with_thread_id<Opstate>>;

    std::atomic<std::thread::id> id_{};
    std::optional<stop_callback_t> on_stop_ = {};
};

...and then using stop_callback_with_thread_id instead of optional<stop_callback> inside when_all's opstate, and having a complete() like this:

            void complete(Rcvr& rcvr) noexcept
            {
                // If we are completing synchronously from inside a stop
                // callback, defer completion.
                if (on_stop.should_defer_completion()) {
                    this->deferred_rcvr = &rcvr;
                    return;
                }

                complete_impl(rcvr);
            }

This all feels very hacky to me, though.

I haven't deeply investigated split yet, but I think the solution could be a bit simpler there, by using a stop callback like this instead of on-stop-request:

template <typename SharedState>
struct split_on_stop_request {
    void operator()() const noexcept
    {
        shared_state_->inc_ref();
        shared_state_->stop_src.request_stop();
        shared_state_->dec_ref();
    }

    SharedState* shared_state_;
};

...i.e. just wrapping the request_stop() between inc_ref/dec_ref to ensure the opstate object stays alive long enough.

I do wonder if there is a more elegant way to solve this issue. I don't think synchronous completions from stop callbacks should be outlawed -- it seems "natural" to me to do the set_stopped right inside the stop callback if possible. Or maybe synchronous destruction from inside the set_stopped completion of when_all is the problem? I've thought that you have to assume the lifetime of the opstate may end when calling the completion, though.

dietmarkuehl commented 14 hours ago

The scenario seems plausible. I think a nice way to work around the early destruction could be to increment the number of expected completions before iterating: the iteration over the children is conceptually an outstanding task. Once that is done the count is decremented and the appropriate completion is triggered if all outstanding work is completed.

lewissbaker commented 14 hours ago

This seems eerily similar to an issue reported in libunifex some time back: https://github.com/facebookexperimental/libunifex/issues/445

dietmarkuehl commented 13 hours ago

The standard doesn't have that problem [yet?]: it sets up a callback using on-stop-request (see [exec.when.all p12]) which isn't defined/described. However, only when_all's stop source is passed in, ie., there is no option to play tricks with the count.

jiixyj commented 13 hours ago

The scenario seems plausible. I think a nice way to work around the early destruction could be to increment the number of expected completions before iterating: the iteration over the children is conceptually an outstanding task. Once that is done the count is decremented and the appropriate completion is triggered if all outstanding work is completed.

This is an excellent suggestion, thanks! Now, looking at libunifex's implementation, this is how they fixed it as well.

This seems eerily similar to an issue reported in libunifex some time back: facebookexperimental/libunifex#445

Yep, seems to be the exact same issue. Thanks for the pointer! In the comments they mention stop_when as well, which is interesting, because this is where I originally stumbled across this (trying to implement the exposition-only stop_when sender algorithm from P3149R6). I was implementing it in a naive way, without stashing the original sender's result in a result_variant in the opstate. But I guess there's no way around it because of this lifetime issue.

Reading P3409R0 earlier, I had hopes that maybe single_inplace_stop_source could fix it, because it wouldn't need to loop around the list of stop callbacks as there is just a single one. But it needs to do some book-keeping after calling the stop callback, so this is sadly not a correctness fix:

  inline bool single_inplace_stop_source::request_stop() noexcept {
...
      callback->execute(callback);

      state_.store(stop_requested_callback_done_state(), memory_order_release); // <<< this might access the opstate after its lifetime ended
      state_.notify_one();
    }

    return true;
  }
jiixyj commented 13 hours ago

The standard doesn't have that problem [yet?]: it sets up a callback using on-stop-request (see [exec.when.all p12]) which isn't defined/described. However, only when_all's stop source is passed in, ie., there is no option to play tricks with the count.

It is mentioned in [exec.snd.expos]: https://eel.is/c++draft/exec#snd.expos-16

dietmarkuehl commented 12 hours ago

It is mentioned in [exec.snd.expos]: https://eel.is/c++draft/exec#snd.expos-16

I didn't look there! Thanks for pointing that out. The implication is, of course, that the standard does have the problem. It may be reasonable to factor out the counting behavior and the stop callback handling into a separate entity used in relevant places: on-stop-request is also used for split (and other similar algorithms would use the same approach).