apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.2k stars 1.21k forks source link

Table truncate command #5559

Open npawar opened 3 years ago

npawar commented 3 years ago

Add truncate operation i.e. delete all contents of table, without removing table config and schema.

For offline, this simply means deleting all segments (everything a delete operation does, but keep the table/schema definition) For realtime, this means deleting all segments, both completed and consuming. Then, we should recreate the ideal state, just like we would've for a new realtime table.

Jackie-Jiang commented 3 years ago

@npawar What's the difference between this API and remove all segments API? I can see the truncate API also disables the table, but I don't think it should do so.

npawar commented 3 years ago

@Jackie-Jiang there's no difference. It's just packaging the Pinot operations into a command that users are familiar with (database terminology).

Yes, the table should not be permanently disabled. I noted disable so that we stop sending queries. I forgot to write re-enable after deleting all segments.

Jackie-Jiang commented 3 years ago

@npawar IMO we don't need to disable then re-enable the table. If the user sends queries while truncating the table, the result is expected to be indeterministic. If we disable the table, no Helix transition message will be sent, and all the transitions will happen when the table gets re-enabled. The behavior will be the same as not disabling the table.

Jackie-Jiang commented 3 years ago

Actually I'm not sure about the Helix behavior when you modify the IdealState of a disabled resource

mcvsubbu commented 3 years ago

We may need to do this carefully. Need to consider race conditions like:

Due to these reasons I prefer that we don't complicate pinot admin APIs on the controller. Instead, provide an admin command that can do more complex operations using the primitives in there, and taking care of other conditions and correcting for them.

kishoreg commented 3 years ago

@mcvsubbu Pinot admin commands are simply wrappers on pinot controller admin API's right.

We can disable a table first and then truncate it. This is a useful command and let's not worry too much about all the corner cases. when someone is doing a truncate, they don't really care about the data.

mcvsubbu commented 3 years ago

@kishoreg Maybe you forget, :) but I know how hard it was to detect (let alone debug) the phenomena of two consuming segments for the same partition in the same replica.

Yes, this is a corner case, but when an issue is reported, we need to debug it. If it was hard when we had all the machines and logs at hand and could look at zookpeer logs at will, imagine doing it remotely when some open source user reports a problem when we can't even GUESS that there are parallel consuming segments.

We added plenty of comments and put the logic in one class so that we don't stumble on such things again.

https://github.com/apache/incubator-pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java#L665

https://github.com/apache/incubator-pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java#L715

https://github.com/apache/incubator-pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java#L754

We make a strong assumption here that there are only two threads in the entire cluster that create segments. By starting a new validation manager, we violate that assumption.

kishoreg commented 3 years ago

whats the fix?

mcvsubbu commented 3 years ago

whats the fix?

Good question. I don't have an answer at the moment. As a reviewer, the best I can do (and sometimes it is not possible either) to look at problems it may bring about.

Keeping the truncate facility as a command line argument will help, because we will NOT be running validation manager, etc. We can check things all we want, report error to the user, user can run it again, etc. Why can't we do that ?

Providing primitive APIs, and command lines on top of that is a good practice for most things, especially this one.

mcvsubbu commented 3 years ago

whats the fix?

Good question. I don't have an answer at the moment. As a reviewer, the best I can do (and sometimes it is not possible either) to look at problems it may bring about.

Keeping the truncate facility as a command line argument will help, because we will NOT be running validation manager, etc. We can check things all we want, report error to the user, user can run it again, etc. Why can't we do that ?

Providing primitive APIs, and command lines on top of that is a good practice for most things, especially this one.

One way to alleviate such a problem will be to scan the ideal state for duplicate segments in the same partition/sequence pair. If there is an attempt to add this, then throw a permanent exception and not allow addition of the segment. This will, at least help us a bit in debugging, if we get controller logs and see this exception.

This self-checking code will appear to be a patch against a bug that we don't know about. I don't like it, but this is an alternative if we are having multiple threads (across controller hosts) update the idealstate like it is being suggested here. It is also not clear what the ideal state updater should do when this exception is encountered. Ideally, I would like some corrective action taken instead of just moving on, but we can think about it. At the minimum, set a metric and we can alert on the metric.

Another solution is to approach this the right (but complex) way of going through a znode that allows exactly one thread on any one machine to update the idealstate of a table. Then we can freely start validation manager whenever we feel like it, on whichever controller receives some complex request like this.

But I go back to my original question. Why provide more than primitives as API, and let the administrative comands (or GUI or whatever) do multiple operations to achieve what the end user wants? Advantages:

  1. There is a clean channel to return the error to the user, and have the user retry the operation or abort it, or whatever.
  2. We don't have to worry about controller restarting between the three sub-operations we are trying to perform. IMO that is a sound working architecture.
kishoreg commented 3 years ago

How is this truncate different from drop table.

my proposal is

    • disable the table
    • wait for all segments to go to offline state
    • delete all the segments
    • enable the table
    • call the setup table method

We already do the first 3 steps for delete/drop table.

mcvsubbu commented 3 years ago

@kishoreg what happens if the controller fails over to another one in between any of these steps?

kishoreg commented 3 years ago

Client gets an error response and tries it again. All the steps are idempotent.

This is same as any other call that involves multiple steps right?

For e.g addTable can fail before setting up real-time table consumption.

Sreemanth commented 3 years ago

@kishoreg Step 4 - Enable the table

Done

Step 5- call the setup table method

Can you give some idea or share some resource about this step?

kishoreg commented 3 years ago

@npawar can you point him to setup table code or the validation manager.

npawar commented 3 years ago

@Sreemanth take a look at PinotHelixResourceManager#addTable. In that method, if you trace path for realtime, you will see the call to ensureRealtimeClusterIsSetUp. That call will setup the realtime table again. Please ping me if you have any questions

Sreemanth commented 3 years ago

@npawar Ensure table setup logic added.

mcvsubbu commented 3 years ago

Please add me to the reviewer list when u have a PR.

And if possible, do not submit one huge PR. It will enable timely (and better) reviews.

mcvsubbu commented 3 years ago

As i see it, there are multiple parts in this that can go independently. You can add these before exposing the final API. But then definitely get the API started in a design document so that we can comment on the API. It is a good practice not to debate on API in PRs.

If it is not a design doc, we can debate on the API here in the issue.

diogobaeder commented 2 years ago

Hi folks! Are there any news on this feature? I'm starting to use Pinot with a system I develop, based on Python, and it's been quite a struggle for me to be able to work on integration tests because I haven't yet found an easy and reliable way to start each test from a clean slate. Having a "truncate table" functionality would be immensely useful for my project!