containerd / ttrpc-rust

Rust implementation of ttrpc (GRPC for low-memory environments)
Apache License 2.0
195 stars 45 forks source link

Fix hang in wait_all_exit #227

Closed alex-matei closed 4 days ago

alex-matei commented 4 months ago

This patch fixes a hang that can occur in wait_all_exit().

self.wait() without await returns a Future, so the notified() object is not created until we await on the returned future. That means notify_waiters() can be called before notified() is. This leads to notified() waiting forever because notify_waiters is called only once, when the last waiter is dropped.

notify_waiters() and notified() form a happens-before relationship. There are two possible scenarios:

  1. If notified() comes before notify_waiters() this means we can safely await on notified().

  2. If notified() comes after notify_waiters() this means that what happened before it is visible in the notified() thread. Waiting on notified() at this point will block but we can check for waiters count, which is guaranteed to be 0 because it was set before notify_waiters() call.

Let's move notified() call before checking that the number of waiters is 0.

alex-matei commented 4 months ago

@wllenyj hi, any update on this?

alex-matei commented 2 months ago

@Tim-Zhang @lifupan Can you please take a look on this PR? I would like to get this merged and update the version used in kata-containers/kata-agent.