apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

S3 segments in Deleted_Segments do not get retentioned #10956

Open dang-stripe opened 1 year ago

dang-stripe commented 1 year ago

We've noticed that segments in S3 are not being deleted from the Deleted_Segments directory. This appears to be because SegmentDeletionManager calls S3PinotFS.listFiles, which returns only files and not directories. The Deleted_Segments directory contains only one subdirectory per table with deleted segments, so the non-recursive listing always returns an empty list and removeAgedDeletedSegments exits early without deleting anything.
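To make the failure mode concrete, here is a minimal sketch that models S3's flat key space in memory. The class and methods (`DeletedSegmentsListingSketch`, `listChildren`, `countSegmentsVisited`) are hypothetical and are not Pinot's `S3PinotFS` or `SegmentDeletionManager` code. The sketch assumes only what is described above: a non-recursive listing that drops directory entries returns nothing for a `Deleted_Segments` prefix that holds only per-table subdirectories, so a retention pass that bails out on an empty listing never reaches any segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory model of a flat S3 key space; not Pinot's actual code.
public class DeletedSegmentsListingSketch {

    // Non-recursive listing of the direct children of `prefix`.
    // Keys directly under the prefix are "files"; deeper keys collapse into a
    // single "directory" entry ending in '/', similar to S3 common prefixes.
    static List<String> listChildren(List<String> keys, String prefix, boolean includeDirectories) {
        TreeSet<String> result = new TreeSet<>();
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue;
            }
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            if (slash < 0) {
                result.add(prefix + rest); // direct child file
            } else if (includeDirectories) {
                result.add(prefix + rest.substring(0, slash + 1)); // child "directory"
            }
        }
        return new ArrayList<>(result);
    }

    // Hypothetical stand-in for a retention pass: counts the segment files it would inspect.
    static int countSegmentsVisited(List<String> keys, String root, boolean includeDirectories) {
        List<String> tableDirs = listChildren(keys, root, includeDirectories);
        if (tableDirs.isEmpty()) {
            return 0; // empty listing: the pass ends before looking at any table
        }
        int visited = 0;
        for (String dir : tableDirs) {
            if (!dir.endsWith("/")) {
                continue; // only per-table directories hold segments in this model
            }
            visited += listChildren(keys, dir, false).size();
        }
        return visited;
    }

    public static void main(String[] args) {
        String root = "pinot/prod-1/Deleted_Segments/";
        List<String> keys = List.of(
                root + "tableA/segment_0",
                root + "tableA/segment_1",
                root + "tableB/segment_7");
        System.out.println("files only: " + countSegmentsVisited(keys, root, false));
        System.out.println("with dirs:  " + countSegmentsVisited(keys, root, true));
    }
}
```

With the files-only listing the pass visits 0 segments; once directory entries are included it reaches all 3 segments across both table directories.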

We found the following log line, which confirms this by reporting that 0 files were listed:

[2023-06-21 17:02:32.807244] INFO [S3PinotFS] [pool-17-thread-1:169] Listed 0 files from URI: s3://some-bucket/pinot/pinot-cluster-1/prod-1/Deleted_Segments, is recursive: false

https://github.com/apache/pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/SegmentDeletionManager.java#L290

https://github.com/apache/pinot/blob/master/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java#L486-L499

Jackie-Jiang commented 1 year ago

@snleee Can you help take a look?