apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Pinot batch ingestion duplicate row issue #13984

Closed: rahuldeshmukh81 closed this issue 1 month ago

rahuldeshmukh81 commented 1 month ago

We are using Pinot batch ingestion, and we receive duplicate data across multiple files. For example, file1 has 5 records, and a few of them arrive again in the next file along with new data.

We have tried setting primaryKeyColumns, but duplicate rows are still inserted.

Jackie-Jiang commented 1 month ago

Trying to understand the problem. Is the record duplicated in the input files? Do you want Pinot to help deduplicate the records? cc @swaminathanmanish

rahuldeshmukh81 commented 1 month ago

Yes, we have duplicate records within a file. The other use case is that a record has already been persisted and we then receive the same record again in a later file along with new data. Here the expectation is that only the new records are persisted.
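
To make the expected behavior concrete, here is a minimal sketch of the deduplication we have in mind, written as a pre-processing step in plain Python. It is not Pinot code and uses no Pinot API. The function name `new_records`, the `id` and `value` columns, and the idea of tracking already persisted keys in a set are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Record = Dict[str, object]
Key = Tuple[object, ...]


def new_records(
    incoming: Iterable[Record],
    primary_key_columns: Sequence[str],
    persisted_keys: Set[Key],
) -> List[Record]:
    """Return only records whose primary key has not been seen before.

    Hypothetical helper: drops duplicates inside the incoming file and
    records whose key is already in persisted_keys. The first occurrence
    of each key wins. persisted_keys is not modified.
    """
    seen = set(persisted_keys)  # copy so the caller's set stays untouched
    fresh: List[Record] = []
    for record in incoming:
        key = tuple(record[column] for column in primary_key_columns)
        if key in seen:
            continue  # duplicate within the file or already persisted
        seen.add(key)
        fresh.append(record)
    return fresh


# Made-up sample data: file1 contains an in-file duplicate (id 2),
# file2 repeats id 2 and adds a new record (id 3).
file1 = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 2, "value": "b"},
]
file2 = [
    {"id": 2, "value": "b"},
    {"id": 3, "value": "c"},
]

first_batch = new_records(file1, ["id"], set())
persisted = {(record["id"],) for record in first_batch}
second_batch = new_records(file2, ["id"], persisted)
```

With this sample data, `first_batch` keeps ids 1 and 2 once each, and `second_batch` contains only id 3. That is the result we expect after ingesting file1 and then file2.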