
apache / pinot

Apache Pinot - A realtime distributed OLAP datastore

https://pinot.apache.org/

Apache License 2.0


Enable consistentDataPush for Spark execution framework #12941

Open lrao-stripe opened 2 weeks ago

lrao-stripe commented 2 weeks ago

https://github.com/apache/pinot/pull/9295 enabled consistent data push for the standalone execution framework. It would be a great feature to extend to Spark-based ingestion as well.

This would help our users in scenarios where each run of a batch job may produce a different number of partition files, and therefore a different set of segments. Atomically replacing the previous run's set of segments with the new one would prevent the table from serving duplicate data while both sets are present.
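To make the atomic replace concrete, here is a minimal Python sketch under stated assumptions. `ToyTable`, `start_replace`, `end_replace`, `naive_push` and `consistent_push` are hypothetical names for an in-memory model, not Pinot's classes or controller API, and the example assumes segment names differ between runs (for instance, they embed a run identifier). It only models the idea of the swap described above and does not reproduce how PR 9295 implements it: new segments are uploaded while hidden from queries, and one final step retires the old set and exposes the new one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory model of a table's segments (not Pinot's API)."""

    segments: set[str] = field(default_factory=set)
    # Pending replacements: entry id -> (segments being replaced, new segments)
    pending: dict[int, tuple[frozenset[str], frozenset[str]]] = field(default_factory=dict)
    next_id: int = 0

    def upload(self, name: str) -> None:
        self.segments.add(name)

    def start_replace(self, old: set[str], new: set[str]) -> int:
        """Record the intent to swap `old` for `new`; returns an entry id."""
        entry_id = self.next_id
        self.next_id += 1
        self.pending[entry_id] = (frozenset(old), frozenset(new))
        return entry_id

    def end_replace(self, entry_id: int) -> None:
        """Switch queries from the old set to the new set in one step."""
        old, new = self.pending[entry_id]
        if not new <= self.segments:
            raise RuntimeError(f"not all new segments uploaded: {sorted(new - self.segments)}")
        del self.pending[entry_id]
        self.segments -= old

    def queryable(self) -> set[str]:
        """Segments a query would see: new segments stay hidden until end_replace."""
        hidden: set[str] = set()
        for _old, new in self.pending.values():
            hidden |= new
        return self.segments - hidden


def naive_push(table: ToyTable, new: set[str]) -> None:
    # Plain append: segments from earlier runs are left in place.
    for name in new:
        table.upload(name)


def consistent_push(table: ToyTable, old: set[str], new: set[str]) -> None:
    entry_id = table.start_replace(old, new)
    for name in new:
        table.upload(name)  # uploaded, but not yet visible to queries
    table.end_replace(entry_id)


run1 = {"run1_part0", "run1_part1", "run1_part2"}  # first run: 3 partition files
run2 = {"run2_part0", "run2_part1"}  # second run: only 2 partition files

naive = ToyTable()
naive_push(naive, run1)
naive_push(naive, run2)
print(sorted(naive.queryable()))  # both runs visible

consistent = ToyTable()
consistent_push(consistent, set(), run1)
consistent_push(consistent, run1, run2)
print(sorted(consistent.queryable()))  # only the second run
```

With the naive push, all five segments stay queryable, so the first run's data is served alongside the second's. With the consistent push, queries see only the first run's segments until every segment of the second run has been uploaded, and only the second run's segments afterwards.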

swaminathanmanish commented 2 weeks ago

@Jackie-Jiang - Please assign this to me.