apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.45k stars 1.28k forks source link

Better Table config validation #5942

Open icefury71 opened 4 years ago

icefury71 commented 4 years ago

As mentioned in #5915 and #5899 we need a better way to validate table config. This includes following checks (not exhaustive):

Indexing config validation

Retention and pushType

Tenant validation

npawar commented 4 years ago

One more: https://github.com/apache/incubator-pinot/issues/5730

npawar commented 4 years ago

https://github.com/apache/incubator-pinot/issues/5900

icefury71 commented 4 years ago

One more check: Currently retention does not work if pushType is null (for real-time table). Check for a case where pushType is null and retention is configured for a table.

mcvsubbu commented 4 years ago

With all the changes proposed/added, it will be useful if a utility is also provided to validate an existing tableconfig. Many of them are tightening the (lax) rules from before, and are valid. It will be unfortunate if an existing installation suddenly stopped working because code elsewhere (not just in the table addition path) starts assuming things about the tableconfig.

kishoreg commented 4 years ago

+1 to a utility to validate, we can add a validate endpoint in the controller

icefury71 commented 4 years ago

Updated ticket description with all the suggestions.

buchireddy commented 4 years ago

Another use case: Validate that the bloomFilterColumns aren't BYTES type since that's not supported at this point.

cc @kishoreg @icefury71

icefury71 commented 4 years ago

Another use case: Validate that the bloomFilterColumns aren't BYTES type since that's not supported at this point.

cc @kishoreg @icefury71

Looks like this should be supported, skipping validation for this particular thing

buchireddy commented 4 years ago

Another use case: Validate that the bloomFilterColumns aren't BYTES type since that's not supported at this point. cc @kishoreg @icefury71

Looks like this should be supported, skipping validation for this particular thing

@icefury71 Do you mean bloomFilterColumns supports BYTES columns? From which version/

icefury71 commented 4 years ago

What I meant is - technically it should be supported (it's not today and we should probably fix that).

On Tue, Sep 29, 2020 at 12:31 AM Buchi Reddy Busi Reddy < notifications@github.com> wrote:

Another use case: Validate that the bloomFilterColumns aren't BYTES type since that's not supported at this point. cc @kishoreg https://github.com/kishoreg @icefury71 https://github.com/icefury71

Looks like this should be supported, skipping validation for this particular thing

@icefury71 https://github.com/icefury71 Do you mean bloomFilterColumns supports BYTES columns? From which version/

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apache/incubator-pinot/issues/5942#issuecomment-700507663, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFRXV5JMUFTZ7U66D3KOFTSIGEN7ANCNFSM4QOTJTBA .

-- Thanks and regards

Chinmay Soman

icefury71 commented 4 years ago

+1 to a utility to validate, we can add a validate endpoint in the controller

@kishoreg @mcvsubbu Seems like we already have a /tables/validate end-point in the Table API. This should be sufficient ? (Same thing with Schemas : /schemas/validate)

mcvsubbu commented 4 years ago

yes, that works as long as the same validations are performed. thanks

icefury71 commented 4 years ago

yes, that works as long as the same validations are performed. thanks

Yes, its calling the same validation function underneath.

Jackie-Jiang commented 4 years ago

We can also add checks on partition config where the table can be partitioned on at most one column

yupeng9 commented 4 years ago

+1

Another use case is the upcoming upsert. There are several expectations for upsert table like primary key in schema, replica group setup, routing by replicas group, partition config, ingestion via LLC etc