PlatformLab / Homa

Low-Latency Data Center Network Transport
ISC License
192 stars 27 forks source link

Simplify core 2 #13

Open yilongli opened 3 years ago

yilongli commented 3 years ago

Made some more improvements atop https://github.com/PlatformLab/Homa/pull/11.

yilongli commented 3 years ago

The clean up, especially of the interfaces, looks pretty good; I don't have any issues there. My main concern is with the locking.

Overall, it seems like this PR reduces the size/scope of the bucket and queue/scheduler mutex critical sections in most places. Unfortunately, doing this seems to have introduced potential race conditions and TOCTOU bugs. Perhaps I missed some critical factors that rule out these issues, but the locking did not make it obvious that the code is race-free. I would argue that using a locking style more like monitors with the larger critical sections makes it easier to convince ourselves that the logic is thread-safe. What was the motivation for using fine-grain critical sections? Can we achieve your goals and still have obviously thread-safe code? Let's find a time to discuss.

There are three advantages with the fine-grain critical section approach. First, the code is more explicit about what need to be protected. Second, we can flip the locking order constraint to be queueMutex before bucket mutex and make the code in trySend() simpler. Third, it becomes easier to change the code to make sure no locks are held when calling into the driver. Obviously, I missed the point that the original code was also using the bucket mutex to achieve safe memory reclamation. I decided not to try the reference counting approach so I have now reverted to the original design (at least for Sender, the problems in Receiver seems rather easy to fix).