apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

SegmentDeletionManager assumes segment is directly under table prefix in deep store #14122

Open dd-willgan opened 1 month ago

dd-willgan commented 1 month ago

Hi Pinot team, my company recently found that expired segments were not being deleted from the deep store. The cause, as far as we can tell, is that SegmentDeletionManager assumes each segment sits directly under the table's directory in the deep store, but in our case the segments were uploaded to subdirectories within the table directory, e.g. <dataDir>/<rawTableName>/<partition>/<segment>. Would it be possible to fall back to deleting the URI stored in the segment ZK metadata?

Jackie-Jiang commented 1 month ago

Trying to get more context here. Do you use metadata push to upload segments? I think the underlying implication is that if the data is purposely put in a separate directory, Pinot doesn't delete it in case the user wants to keep it around. But I guess we could introduce a config that controls whether Pinot deletes such files in the deep store (false by default).

dd-willgan commented 1 month ago

Hey @Jackie-Jiang, yes, we use SegmentMetadataPushJobRunner. I see. Yes, I would be okay with adding a flag to control this behavior, maybe something like controller.segment.delete.useStoredUri.
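
To make the proposal concrete, here is a minimal sketch of how such a flag might gate the fallback. It is not Pinot code: the class and method names are hypothetical, the flag is only the one suggested above (it does not exist yet), and the sketch assumes the URI stored in the segment ZK metadata is available as a plain string. It only computes which deep store locations would be candidates for deletion; the actual delete calls are left out.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names are Pinot APIs.
class SegmentDeletionSketch {

    // Location assumed today: <dataDir>/<rawTableName>/<segmentName>.
    // Assumes the segment name needs no URL encoding.
    static URI defaultSegmentUri(String dataDir, String rawTableName, String segmentName) {
        String base = dataDir.endsWith("/") ? dataDir : dataDir + "/";
        return URI.create(base + rawTableName + "/" + segmentName);
    }

    // Returns the locations to try deleting, in order.
    // storedUri stands in for the URI kept in the segment ZK metadata.
    // useStoredUri stands in for the proposed flag (false by default).
    static List<URI> deletionCandidates(String dataDir, String rawTableName,
            String segmentName, String storedUri, boolean useStoredUri) {
        Set<URI> candidates = new LinkedHashSet<>();
        candidates.add(defaultSegmentUri(dataDir, rawTableName, segmentName));
        if (useStoredUri && storedUri != null && !storedUri.isEmpty()) {
            // The set drops the stored URI if it equals the default location.
            candidates.add(URI.create(storedUri));
        }
        return new ArrayList<>(candidates);
    }

    public static void main(String[] args) {
        List<URI> candidates = deletionCandidates(
                "s3://bucket/data", "myTable", "seg_0",
                "s3://bucket/data/myTable/p1/seg_0", true);
        for (URI uri : candidates) {
            System.out.println(uri);
        }
    }
}
```

With the flag off, only the default location is returned, which matches the current behavior described in this issue. With it on, the stored URI is tried as well, so segments pushed to subdirectories would also be covered.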

Jackie-Jiang commented 1 month ago

cc @swaminathanmanish