apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Zookeeper Transaction #9302

Open sajjad-moradi opened 2 years ago

sajjad-moradi commented 2 years ago

There are places in the code base where multiple write interactions with ZK are done as part of one operation. If any of these interactions fails, or if the Pinot component fails between them, we end up in an inconsistent state. For example, this happens at segment commit end, when a consuming segment gets committed. To clean up after such failures, we have set up a periodic task (the segment validation manager job) that looks for these inconsistencies and tries to fix them.

A better approach is to use the ZK Transaction API to prevent having these inconsistencies in the first place. At the beginning of the operation, we can create a ZK transaction object and then use the transaction object to interact with ZK by:

Once the ZK operations are done, we commit all of them at once. If the commit succeeds, all ZK operations have been applied; otherwise none of them is applied.
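To make the all-or-nothing behavior concrete, here is a minimal sketch in Java. It is a toy in-memory model, not ZooKeeper or Pinot code: `ToyStore`, `Txn`, and the `/segments/...` paths and state strings are all hypothetical names chosen for illustration. The real ZooKeeper client offers a similar shape through `ZooKeeper.transaction()`, which returns a `Transaction` with `create`, `setData`, `delete`, `check`, and `commit`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory stand-in for a ZK-like store (hypothetical, single-threaded). */
class ToyStore {
    private final Map<String, String> nodes = new HashMap<>();

    /** One buffered write; a null value means "delete this path". */
    private static final class Write {
        final String path;
        final String value;
        final boolean mustExist; // precondition checked at commit time

        Write(String path, String value, boolean mustExist) {
            this.path = path;
            this.value = value;
            this.mustExist = mustExist;
        }
    }

    /** Buffers writes and applies them all at once, or not at all. */
    final class Txn {
        private final List<Write> writes = new ArrayList<>();

        Txn create(String path, String value) {
            writes.add(new Write(path, value, false)); // path must not exist yet
            return this;
        }

        Txn setData(String path, String value) {
            writes.add(new Write(path, value, true)); // path must already exist
            return this;
        }

        Txn delete(String path) {
            writes.add(new Write(path, null, true)); // path must already exist
            return this;
        }

        /** Returns true if every write was applied, false if none was. */
        boolean commit() {
            // Replay the writes on a scratch copy so a failed precondition
            // leaves the real node map untouched.
            Map<String, String> scratch = new HashMap<>(nodes);
            for (Write w : writes) {
                if (scratch.containsKey(w.path) != w.mustExist) {
                    return false;
                }
                if (w.value == null) {
                    scratch.remove(w.path);
                } else {
                    scratch.put(w.path, w.value);
                }
            }
            nodes.clear();
            nodes.putAll(scratch);
            return true;
        }
    }

    Txn transaction() {
        return new Txn();
    }

    String get(String path) {
        return nodes.get(path);
    }
}
```

In this model, a segment commit could mark the committing segment as done and create the next consuming segment in one step, for example `store.transaction().setData("/segments/seg_0", "DONE").create("/segments/seg_1", "IN_PROGRESS").commit()`. If the second write's precondition fails, the first write is not applied either, so there is no half-finished state for a periodic repair job to clean up.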

From a brief look at the Helix APIs, it appears that Helix doesn't expose the ZK transaction APIs. Until Helix provides them, I think we should use the ZooKeeper client directly to leverage its transaction capabilities, which in turn reduces the chances of hitting the failure cases mentioned above. It will also help simplify the code that handles these edge cases, which keeps getting more complicated as new features are added.

mcvsubbu commented 2 years ago

+1 on using transaction APIs. We can also see if Helix intends to support them (we have moved to Helix 1.x, so hopefully it will be an easier extension).