apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Add support for skipping/defaulting columns during managed offline flow rollup/dedup process #8886

Open bdstuart opened 2 years ago

bdstuart commented 2 years ago

Here is what I said in the Pinot troubleshooting channel: If this works as I think it might, I could have the best of both worlds. A certain amount of my data stays in the realtime table with the event_id for potential auditing; then, as it moves to the offline table, I default event_id to 0 and get good rollup.

To which @Jackie-Jiang responded: It is absolutely reasonable. We don't support it currently, but it is doable. Essentially we need to add a new task config to skip some columns when running the task in ROLLUP or DEDUP mode. Internally we will fill these columns with default values so that they are not considered when rows are merged.
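To make the intended effect concrete, here is a minimal Python sketch of a rollup that overwrites skipped columns with a default before grouping. It is not Pinot code: the `rollup` function, the `SKIPPED_DEFAULTS` mapping, and the `country` and `clicks` columns are hypothetical names chosen for illustration, and the real task config key has not been defined in this thread.

```python
from collections import defaultdict

# Hypothetical mapping of skipped columns to their defaults; not a Pinot config.
SKIPPED_DEFAULTS = {"event_id": 0}


def rollup(rows, dimensions, metrics, skipped_defaults):
    """Sum metrics over rows that share the same dimension values,
    after overwriting the skipped columns with their defaults."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # Defaulting the skipped columns makes them identical across rows,
        # so they no longer split otherwise-equal rows into separate groups.
        row = {**row, **skipped_defaults}
        key = tuple(row[d] for d in dimensions)
        for m in metrics:
            totals[key][m] += row[m]
    return [
        {**dict(zip(dimensions, key)), **dict(sums)}
        for key, sums in totals.items()
    ]


rows = [
    {"event_id": 101, "country": "US", "clicks": 3},
    {"event_id": 102, "country": "US", "clicks": 4},
    {"event_id": 103, "country": "CA", "clicks": 1},
]

result = rollup(rows, ["event_id", "country"], ["clicks"], SKIPPED_DEFAULTS)
```

With `event_id` defaulted to 0, the two US rows collapse into one row; with an empty defaults mapping, every row keeps its unique `event_id` and nothing is merged.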

snleee commented 2 years ago

@jtao15 @Jackie-Jiang

Jackie-Jiang commented 2 years ago

For this feature request, another solution is to simply not read the values of the skipped columns. The existing transformer can then handle filling in the default values. @snleee Do you see other use cases where we would want a custom transform other than the one in the ingestion config?
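The alternative described here can be sketched as two steps: a reader that leaves out the skipped columns, followed by a default-filling step. This is an illustrative Python sketch only; `read_record`, `fill_defaults`, and `SCHEMA_DEFAULTS` are hypothetical names and do not correspond to Pinot's reader or transformer classes.

```python
# Hypothetical per-column defaults standing in for the table schema.
SCHEMA_DEFAULTS = {"event_id": 0, "country": "null", "clicks": 0}


def read_record(raw, skip_columns):
    """Copy a raw record, leaving out the skipped columns entirely."""
    return {k: v for k, v in raw.items() if k not in skip_columns}


def fill_defaults(record, schema_defaults):
    """Fill every schema column that is missing from the record
    with its default value."""
    return {
        col: record.get(col, default)
        for col, default in schema_defaults.items()
    }


raw = {"event_id": 101, "country": "US", "clicks": 3}
record = fill_defaults(read_record(raw, {"event_id"}), SCHEMA_DEFAULTS)
```

Because the skipped column is never read, no separate defaulting logic is needed in the task itself; the generic fill step treats it like any other missing column.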

npawar commented 2 years ago

@snleee are you planning to pick this up?