Tencent / flare

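Flare is a modern C++ development framework widely deployed in production in Tencent's advertising backend. It includes a base library, RPC, various clients, and more. Its main features are ease of use and low tail latency.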

Question about looping for read_events_ and write_events_ #56

Closed npuichigo closed 3 years ago

npuichigo commented 3 years ago

Could anyone help explain why a loop is used for read_events_ here?

if (read_events_.fetch_add(1, std::memory_order_acquire) == 0) {
      RefPtr self_ref(ref_ptr, this);

      do {
        auto rc = OnReadable();
        if (FLARE_LIKELY(rc == EventAction::Ready)) {
          continue;
        } else if (FLARE_UNLIKELY(rc == EventAction::Leaving)) {
          FLARE_CHECK(read_mostly_.seldomly_used->cleanup_reason.load(
                          std::memory_order_relaxed) != CleanupReason::None,
                      "Did you forget to call `Kill()`?");
          GetEventLoop()->AddTask([this] {
            read_events_.store(0, std::memory_order_relaxed);
            QueueCleanupCallbackCheck();
          });
          break;
        } else if (rc == EventAction::Suppress) {
          SuppressReadAndClearReadEventCount();
          break;
        }
      } while (read_events_.fetch_sub(1, std::memory_order_release) !=
               1);
      QueueCleanupCallbackCheck();
    });

Once the if condition passes, read_events_ can only be 1, so is there a special case where looping to decrement read_events_ is necessary?

0x804d8000 commented 3 years ago

You oversimplified the code.

Line 176 starts a fiber to run OnReadable asynchronously. This is done to parallelize the handling of I/O events. However, it also has a side effect: read_events_ can be incremented concurrently, after OnReadable has drained the read buffer but before Line 222 is reached.

npuichigo commented 3 years ago

Since a fiber is created and detached after read_events_ goes from 0 to 1, and L176 is the only line that increments read_events_, could you point out which line may cause a concurrent increment of read_events_? Do you mean that OnReadable running in a fiber may also trigger a parallel invocation of this function?

0x804d8000 commented 3 years ago

Note that an epoll event can fire at any time after the fd is drained (edge triggered), even before OnReadable has had an opportunity to return.

IOW, there is a time window between the read syscall returning EAGAIN (inside OnReadable) and read_events_ being decremented. If new data arrives in this window and we don't check the event count, we may lose the notification.

Consider this:

F1: Received EPOLLIN, starts F2 to handle it.
F2: Done reading data, is either calling user’s callback or about to return, and got paused (due to whatever reason, e.g., interrupted by system).
F1: Another EPOLLIN comes, now either:

Yes, you’re right that the only place that can increment the event count is L176, but it’s exactly that line that helps us avoid missing the notification.
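
To make the counting pattern concrete, here is a minimal sketch of it in isolation. It is not Flare's code: ReadEventCounter, OnEvent, pending and the drain callback are hypothetical names, the handler runs inline rather than in a fiber, and the atomics use the default sequentially consistent ordering instead of the acquire/release pair in the quoted snippet.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a descriptor; not Flare's actual class.
class ReadEventCounter {
 public:
  // `drain` plays the role of OnReadable(): read until EAGAIN.
  explicit ReadEventCounter(std::function<void()> drain)
      : drain_(std::move(drain)) {}

  // Called for every EPOLLIN. Returns true if this call became the handler.
  // Flare would start a fiber here; this sketch runs the handler inline.
  bool OnEvent() {
    // Only the 0 -> 1 transition starts a handler. Any other caller merely
    // records that one more event arrived and leaves.
    if (events_.fetch_add(1) != 0) {
      return false;
    }
    do {
      // While the handler runs it owns at least one count.
      assert(events_.load() > 0);
      drain_();
      // fetch_sub returns the previous value. Anything other than 1 means
      // an event was recorded while we were draining, so drain again.
    } while (events_.fetch_sub(1) != 1);
    return true;
  }

  // Number of events recorded but not yet accounted for by a drain.
  int pending() const { return events_.load(); }

 private:
  std::function<void()> drain_;
  std::atomic<int> events_{0};
};
```

If the drain callback itself calls OnEvent() once during the first drain (simulating an EPOLLIN that arrives inside the time window), that nested call returns false, and the outer handler drains a second time before the counter returns to 0. With a plain "busy" flag instead of a counter, that second event would have been dropped.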

npuichigo commented 3 years ago

@0x804d8000 Thanks for your detailed explanation. I learned a lot from reading the implementation.