apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Is a Pinot hybrid table allowed to be created with different schemas? #11549

Open deemoliu opened 1 year ago

deemoliu commented 1 year ago

Context

We have successfully created a hybrid table whose realtime and offline parts use different schemas:

{
  "OFFLINE": {
    "tableName": "rta_temp_test_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "schemaName": "rta_temp_test1",
      ...
  "REALTIME": {
    "tableName": "rta_temp_test_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "schemaName": "rta_temp_test",
      ...
}

However, we also found code which implies that the realtime and offline tables of one hybrid table should share a single schema:

https://github.com/apache/pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/TableConfigsRestletResource.java#L451-L464

cc: @ankitsultana

Is a Pinot hybrid table allowed to be created with different schemas?
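For illustration, the constraint that the linked controller code implies (one schema per hybrid table) can be sketched as a plain-Java check. This is not the code in `TableConfigsRestletResource`; the class `HybridSchemaCheck` and the method `validateSameSchema` are hypothetical, and schema names are passed in as plain strings rather than Pinot table config objects.

```java
import java.util.Optional;

/**
 * Hypothetical sketch (not Pinot's actual code): reject a hybrid table whose
 * OFFLINE and REALTIME configs point at different schemas.
 */
final class HybridSchemaCheck {

    /** Returns an error message if the two sides disagree, or empty if the pair is valid. */
    static Optional<String> validateSameSchema(String offlineSchemaName, String realtimeSchemaName) {
        if (offlineSchemaName == null || realtimeSchemaName == null) {
            // Only one side exists, so this is not a hybrid table: nothing to compare.
            return Optional.empty();
        }
        if (!offlineSchemaName.equals(realtimeSchemaName)) {
            return Optional.of("Hybrid table must use one schema, but OFFLINE uses '"
                    + offlineSchemaName + "' and REALTIME uses '" + realtimeSchemaName + "'");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Schema names taken from the configs in this issue: this pair should be rejected.
        System.out.println(validateSameSchema("rta_temp_test1", "rta_temp_test"));
    }
}
```

Returning `Optional<String>` keeps the check side-effect free, so a caller such as a REST handler can decide whether to turn the message into an error response.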

Jackie-Jiang commented 1 year ago

No. I think we didn't enforce that before, but such a configuration will no longer be allowed going forward.

deemoliu commented 1 year ago

Thanks @Jackie-Jiang for the clarification.

ankitsultana commented 1 year ago

If this is not enforced already, can we start enforcing it? @deemoliu, were you able to create the hybrid table with different schemas with 0.11?

deemoliu commented 1 year ago

> If this is not enforced already can we start enforcing it? @deemoliu you were able to create the hybrid table with different schemas with 0.11?

There is an example attached in the description of this issue.

ankitsultana commented 1 year ago

The question was whether we hit this issue with 0.11 or a later version. Anyway, it looks like there's already a PR for this: #11591.

Jackie-Jiang commented 1 year ago

FYI, #11591 will enforce that the schema name is the same as the raw table name for each table, which also implies that a hybrid table must use the same schema (both the offline and real-time sides have the same raw table name).
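Why a per-table rule is enough to force a shared schema can be seen in a minimal sketch, assuming (as in the configs in this issue) that a table name is the raw name plus an `_OFFLINE` or `_REALTIME` suffix. The class `RawTableNameCheck` and its methods are hypothetical; this is not the code in #11591 or a Pinot API.

```java
import java.util.Optional;

/**
 * Hypothetical sketch: require schemaName == raw table name for each table.
 * Because both sides of a hybrid table share one raw name, they must share one schema.
 */
final class RawTableNameCheck {
    private static final String OFFLINE_SUFFIX = "_OFFLINE";
    private static final String REALTIME_SUFFIX = "_REALTIME";

    /** Strips a trailing _OFFLINE or _REALTIME type suffix, if present. */
    static String rawTableName(String tableNameWithType) {
        if (tableNameWithType.endsWith(OFFLINE_SUFFIX)) {
            return tableNameWithType.substring(0, tableNameWithType.length() - OFFLINE_SUFFIX.length());
        }
        if (tableNameWithType.endsWith(REALTIME_SUFFIX)) {
            return tableNameWithType.substring(0, tableNameWithType.length() - REALTIME_SUFFIX.length());
        }
        return tableNameWithType;
    }

    /** Returns an error message if the schema name differs from the raw table name, else empty. */
    static Optional<String> validate(String tableNameWithType, String schemaName) {
        String raw = rawTableName(tableNameWithType);
        if (raw.equals(schemaName)) {
            return Optional.empty();
        }
        return Optional.of("Schema name '" + schemaName + "' must match raw table name '" + raw + "'");
    }

    public static void main(String[] args) {
        // The OFFLINE side from this issue is rejected; the REALTIME side is accepted.
        System.out.println(validate("rta_temp_test_OFFLINE", "rta_temp_test1"));
        System.out.println(validate("rta_temp_test_REALTIME", "rta_temp_test"));
    }
}
```

Under this rule, the configuration from the issue description fails on the OFFLINE side, and the only passing setup points both sides at a schema named `rta_temp_test`.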

Why are you still running 0.11? Are you planning to upgrade directly to 1.0? That would violate the upgrade policy of not skipping release versions.

ankitsultana commented 1 year ago

@Jackie-Jiang: we upgrade incrementally (one minor version at a time). Some clusters are running 0.11 and some are running 0.12. We have quite a large deployment, and it's not easy to keep rolling out new versions.