StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

Support iceberg table compaction #48736

Open rohankrao opened 1 month ago

rohankrao commented 1 month ago

StarRocks supports writing to Iceberg tables. It would be nice if StarRocks also supported Iceberg table housekeeping, such as compaction.


alvin-celerdata commented 1 month ago

@rohankrao Thanks for the suggestion. Are you running StarRocks over Iceberg in production?

rohankrao commented 1 month ago

No, not in production. This feature would help in production.


Dshadowzh commented 1 month ago

@rohankrao Which would you prefer: a fully managed compaction service, or just a compaction interface that you can integrate into your own scheduling system?

rohankrao commented 1 month ago

I would prefer a manual compaction command, like the one you have for native tables. I write to Iceberg from Kafka externally and can trigger compaction when needed. Instead of using Spark for housekeeping, I want to use StarRocks (SR) for that.

nqvuong1998 commented 1 week ago

It would be beneficial if StarRocks supported Iceberg table maintenance features such as optimization, expiring snapshots, and removing orphan files.
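
For reference, the three maintenance tasks mentioned above correspond to stored procedures that Iceberg already exposes through Spark SQL (`rewrite_data_files`, `expire_snapshots`, `remove_orphan_files`). The following is a minimal sketch of the Spark-based housekeeping this request hopes to replace, not a StarRocks API. The helper name `iceberg_maintenance_sql`, the catalog name, and the table name are hypothetical, chosen only for illustration.

```python
from datetime import datetime


def iceberg_maintenance_sql(catalog: str, table: str, expire_before: datetime) -> list[str]:
    """Build Spark SQL CALL statements for routine Iceberg housekeeping.

    Hypothetical helper: it only assembles the statements. The procedure
    names are Iceberg's Spark procedures; the catalog and table names are
    whatever the caller passes in.
    """
    ts = expire_before.strftime("%Y-%m-%d %H:%M:%S")
    return [
        # Compaction: rewrite many small data files into fewer, larger ones.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Expire snapshots older than the given timestamp.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{ts}')",
        # Delete files that no table metadata references any more.
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
    ]


if __name__ == "__main__":
    # Assumed names: catalog "my_catalog", table "db.events".
    for stmt in iceberg_maintenance_sql("my_catalog", "db.events", datetime(2024, 7, 1)):
        print(stmt)
```

With a Spark session configured for the Iceberg catalog, each statement could be run with `spark.sql(stmt)` after Kafka ingestion, on whatever schedule the user chooses. A manual command in StarRocks, as requested in this thread, would cover the same steps without a Spark cluster.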