ianvkoeppe opened this issue 4 years ago
Here is my suggested approach.
In this case, there is no new API, but for use cases that have multiple segments, the time boundary will be advanced when the first segment is pushed. This will lead to inconsistency.
Pinot currently has the inconsistency problem only if RT is not there, but we will be introducing a new one now.
We can, however, introduce an API as well, which can send out a message to the brokers to advance the time boundary. This API will be invoked by the push job when all segments are pushed. This will reduce the inconsistency window to some narrow cases such as:
(1) Three segments are being pushed. After the first one, a broker restarts, and looking at the table config and the most recent timestamp, advances the boundary. Since the other two segments are not pushed yet, the broker has no way of knowing that a message is going to come by later. Corner case, but maybe we can leave that for the day when we support "atomic" push of data.
(2) Three segments are being pushed. The push completes, but the external view is not yet updated (slow server, slow Helix, whatever). The broker meanwhile gets the message to move the time boundary.
The second one is more likely to happen, but either one is a bad experience for the application.
Some mitigation may be to include the segment names in the controller API and have the controller wait for the EV of these segments to be ONLINE before issuing a message, but that is also not complete: the broker may still not see the new EV.
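To make the mitigation concrete, here is a minimal sketch of the gating logic: the push job hands the controller the pushed segment names, and the controller only signals brokers once every one of them is ONLINE in the ExternalView. This is illustrative pseudo-logic, not Pinot's actual controller code; the class and method names are assumed.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: gate the time-boundary message on the ExternalView
// state of the segments named in the push job's controller API call.
public class TimeBoundaryAdvancer {
  // Snapshot of ExternalView states: segment name -> state (e.g. "ONLINE").
  private final Map<String, String> externalView;

  public TimeBoundaryAdvancer(Map<String, String> externalView) {
    this.externalView = externalView;
  }

  /** Returns true only when every pushed segment is ONLINE in the EV. */
  public boolean allSegmentsOnline(List<String> pushedSegments) {
    for (String segment : pushedSegments) {
      if (!"ONLINE".equals(externalView.get(segment))) {
        return false;
      }
    }
    return true;
  }

  /** Returns true if the broker message could be issued now. */
  public boolean maybeAdvanceTimeBoundary(List<String> pushedSegments) {
    // In the real system this would send a broker message; here we only
    // signal whether the precondition holds. Even when it does, a broker
    // may still be working off a stale EV, which is the incompleteness
    // noted above.
    return allSegmentsOnline(pushedSegments);
  }
}
```

As noted, this check is necessary but not sufficient: the controller seeing the EV as ONLINE does not guarantee every broker has seen it yet.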
Ideas?
@mcvsubbu the time boundary should be saved somewhere in ZK so that all brokers use it and it handles all the failure and restarts scenarios.
The user still has to wait for the servers to load the segment before invoking the time boundary.
@kishoreg I didn't follow. We don't store the time boundary now. Why do we need to keep it in ZK? Perhaps you are trying to support an arbitrary time boundary that a user can set. I don't see a use case for that now. My thought is that we can support one of two time boundaries -- either include the latest <day/hour> or exclude it. I think this will cover most of the use cases.
What is the use case to cover up to some arbitrary user-specified time?
The latest day will clearly not work unless there is exactly one segment for that day, and it is too limiting.
The use case I am thinking of is the same one listed here. If we allow this to be part of ZK, then the time boundary is max(max(segments) - 1, value in ZK).
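The proposed rule above can be sketched in a few lines: the effective boundary is the larger of (latest offline segment end time minus one day) and the explicit value stored in ZK. This is a hedged illustration of the formula, not Pinot code; the method and constant names are assumed.

```java
// Hypothetical sketch of: timeBoundary = max(max(segments) - 1, value in ZK).
// Times are epoch millis; the "- 1" is taken to mean one day here.
public class TimeBoundary {
  static final long ONE_DAY_MILLIS = 24L * 60 * 60 * 1000;

  /**
   * maxSegmentEndTimeMillis: max end time across offline segments.
   * zkTimeBoundaryMillis: explicit boundary stored in ZK, or Long.MIN_VALUE
   * if no value has been set (so the segment-based term wins).
   */
  public static long compute(long maxSegmentEndTimeMillis, long zkTimeBoundaryMillis) {
    return Math.max(maxSegmentEndTimeMillis - ONE_DAY_MILLIS, zkTimeBoundaryMillis);
  }
}
```

With no ZK value set, this degenerates to today's behavior (max segment end time minus one day); once the push job writes a boundary to ZK, that value can pull the boundary forward.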
Eventually, we want to avoid users having to manage offline jobs. Users are already asking for features where we provide a minion task that moves realtime segments to offline nodes automatically after a day. Having this API will allow us to control the time boundary.
Summarizing the offline discussion between @mcvsubbu @snleee @sajjad-moradi and @mayankshriv:
Let us avoid combining this with #5712. That was about replacing segments, and this is about adjusting the time boundary after segments are pushed. A single primitive to adjust the time boundary, invoked by the segment pusher, should be enough.
I also thought some more about znodes and watches. Introducing a new node may be fine, but having the broker set a watch on it needs some thought. I remember in the HLC case, the controller was setting too many watches, and that caused ZooKeeper re-connects to fail. A re-connect requires that the ZK client send the entire watch list in one message to the ZK server. We had to increase the message size setting in order to overcome this.
It seems discussing here may not be as effective as desired. My proposal is to have a design doc with the initial proposal detailed out. We can use that as a starting point to discuss the pros/cons. @sajjad-moradi @snleee is one of you still planning to start that design doc?
@mayankshriv sure, I can start the initial design doc.
Here is the design doc proposing three different solutions for this problem: https://docs.google.com/document/d/1KykbUIstFI7F4IOOlUVnXWzpgKazwRAJdok0RT1ae64 Please review and leave your comments.
Thanks, added to ticket.
@ianvkoeppe The agreement on the design doc is to go with approach 1 as the short-term solution. Since this approach is very similar to what you provided in PR #5745, could you please reopen the PR and apply the new changes? Basically, the boolean flag that you have defined in TableConfig needs to be changed to a number indicating how many days (or hours) need to be deducted from the time boundary. The default value should be one for backward compatibility. Follow PR #4156 to make the changes on the time boundary calculation.
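A minimal sketch of what that config change implies for the calculation, assuming the new numeric table config (the config name and method signature here are hypothetical, not the actual Pinot API):

```java
// Hypothetical sketch of the agreed short-term approach: the boolean flag
// becomes a numeric config, e.g. "timeBoundaryOffsetDays" (name assumed),
// giving how many days to deduct from the max offline end time.
public class TimeBoundaryCalculator {
  static final long ONE_DAY_MILLIS = 24L * 60 * 60 * 1000;

  /**
   * maxOfflineEndTimeMillis: max end time across offline segments.
   * offsetDaysConfig: value from TableConfig, or null if unset.
   */
  public static long compute(long maxOfflineEndTimeMillis, Integer offsetDaysConfig) {
    // Defaulting to 1 preserves today's "max - 1 day" behavior for
    // backward compatibility; 0 would serve the latest pushed day.
    int offsetDays = (offsetDaysConfig == null) ? 1 : offsetDaysConfig;
    return maxOfflineEndTimeMillis - offsetDays * ONE_DAY_MILLIS;
  }
}
```

Setting the config to 0 is what lets a once-a-day push serve its latest segment immediately, which is the use case in the issue body below.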
@sajjad-moradi we would prefer to discuss table config items in a design doc. If you can update the design doc with the proposed config, then @ianvkoeppe can go for it. thanks.
The doc is updated with config suggestion. Please review and leave your comments.
Thanks all! I appreciate the rapid iteration on the design. @sajjad-moradi I'm happy to iterate on that PR and make necessary changes.
@ianvkoeppe Just checking if you are still planning on iterating your PR as per the new design?
@mayankshriv, I would like to still get this done, but it's not something I can prioritize right now.
Overview
For hybrid setups, Pinot splits/filters broker queries to both the offline and realtime tables. It does so based on a time value which represents the latest available offline segments. Pinot assumes the consumer is uploading partial or intermediate segments throughout the current day, so it only serves time value - 1 from the offline table and the rest from realtime.
As an explicit example, if I upload a segment with the date 1/2/2000, the time value for the table will be 1/1/2000. This means a query for all time will be modified to query (, 1/1/2000] to the offline table and (1/1/2000, ) to the realtime table. In this scenario, the 1/2/2000 segment is not being served queries. Once another segment is uploaded for 1/3/2000, then the data from 1/2/2000 will be served.
Problem Statement
In hybrid table setups where offline segments are only uploaded once per day, it is desired that those offline segments immediately start serving over the realtime segments. Today, this has to be achieved via a "hack" where an empty segment is pushed for a future date to trick Pinot into serving the latest offline segment with actual data.
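The time-boundary behavior described in the Overview can be sketched as follows. This is an illustrative toy, not Pinot's broker code; the class and method names are assumptions.

```java
import java.time.LocalDate;

// Hypothetical sketch of the Overview's example: the boundary is the
// latest offline segment date minus one day, and an all-time query is
// split into (, boundary] for offline and (boundary, ) for realtime.
public class HybridQuerySplitter {
  /** Boundary = latest offline segment date minus one day. */
  public static LocalDate timeBoundary(LocalDate latestOfflineSegmentDate) {
    return latestOfflineSegmentDate.minusDays(1);
  }

  /** Predicate routed to the offline table: (, boundary]. */
  public static String offlinePredicate(LocalDate boundary) {
    return "time <= " + boundary;
  }

  /** Predicate routed to the realtime table: (boundary, ). */
  public static String realtimePredicate(LocalDate boundary) {
    return "time > " + boundary;
  }
}
```

For the Overview's example, a segment dated 2000-01-02 yields a boundary of 2000-01-01, so the 2000-01-02 data sits unserved by the offline table until a 2000-01-03 segment arrives, which is exactly the problem stated above.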
Requirements
Design Doc
https://docs.google.com/document/d/1KykbUIstFI7F4IOOlUVnXWzpgKazwRAJdok0RT1ae64