StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

Add SQS (and other cloud queue) support for Pipe #50013

Open: Samrose-Ahmed opened this issue 2 months ago

Samrose-Ahmed commented 2 months ago

Feature request

Load files directly in response to cloud object storage notifications delivered through a queue, such as S3 event notifications sent to Amazon SQS (and eventually the GCP, Azure, and other cloud equivalents).

Is your feature request related to a problem? Please describe.

Upstream sources that publish object notifications to a queue are quite common nowadays, and Pipe cannot currently load their data by consuming those notifications.

Describe the solution you'd like

Consume notifications from the queue and run the Pipe job as usual, instead of listing files.
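To make the proposal concrete, here is a minimal sketch of turning one SQS message into the list of files a Pipe job would load. It assumes S3 publishes event notifications directly to SQS (no SNS envelope) using the standard S3 event JSON layout; the function name `object_uris_from_s3_notification` is hypothetical and not part of StarRocks.

```python
import json
from urllib.parse import unquote_plus


def object_uris_from_s3_notification(message_body: str) -> list[str]:
    """Extract s3:// URIs of newly created objects from one SQS message body.

    Hypothetical helper. Assumes S3 -> SQS delivery with the standard S3
    event notification JSON (a top-level "Records" array).
    """
    event = json.loads(message_body)
    uris = []
    # The s3:TestEvent message S3 sends when a notification is configured
    # has no "Records" key, so it yields no files.
    for record in event.get("Records", []):
        # Only newly created objects are candidates for loading; skip deletes etc.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 notifications are URL-encoded (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


body = json.dumps({"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "logs"}, "object": {"key": "2024/01/part+0.parquet"}},
}]})
print(object_uris_from_s3_notification(body))  # ['s3://logs/2024/01/part 0.parquet']
```

The resulting URIs would replace the output of the directory listing step; everything downstream of file discovery could stay as it is.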

Describe alternatives you've considered

Pipe currently lists files in the source location to discover new ones, which is inefficient.

Additional context

For reference, Snowflake's Snowpipe supports this.

murphyatwork commented 2 months ago

Hey @Samrose-Ahmed, thanks for the suggestion. This enhancement is on the roadmap but isn't scheduled right now. If you're willing to contribute, we can discuss it.

Samrose-Ahmed commented 2 months ago

Yes, I plan to work on this a bit later. I'll sync with you then.

Samrose-Ahmed commented 1 month ago

Do you think it makes sense to just poll SQS on the FE, or do we need a CN? An SQS poll isn't very heavy. I think the current file listing is done via the broker.
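To illustrate how little work one polling round involves, here is a sketch of a single receive/load/acknowledge cycle. The `NotificationQueue` interface, `poll_once`, and the `InMemoryQueue` test double are all hypothetical; with SQS, `receive` and `delete` would map to the ReceiveMessage (long polling, at most 10 messages and 20 seconds per call) and DeleteMessage operations. This is not how StarRocks implements Pipe, only an assumption about how a FE-side poller could be structured.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Message:
    body: str
    receipt_handle: str


class NotificationQueue(Protocol):
    """Hypothetical minimal queue interface (SQS ReceiveMessage / DeleteMessage)."""
    def receive(self, max_messages: int, wait_seconds: int) -> list[Message]: ...
    def delete(self, receipt_handle: str) -> None: ...


def poll_once(queue: NotificationQueue,
              parse: Callable[[str], list[str]],
              load_files: Callable[[list[str]], None]) -> int:
    """One polling round: receive a batch, load the files it names, then ack.

    Messages are deleted only after load_files succeeds, so on failure they
    stay in the queue and are redelivered later (at-least-once delivery).
    """
    messages = queue.receive(max_messages=10, wait_seconds=20)
    files = [uri for m in messages for uri in parse(m.body)]
    if files:
        load_files(files)
    for m in messages:
        queue.delete(m.receipt_handle)
    return len(files)


class InMemoryQueue:
    """Test double: every pending message is visible until it is deleted."""
    def __init__(self, bodies: list[str]):
        self.pending = [Message(b, str(i)) for i, b in enumerate(bodies)]
        self.deleted: list[str] = []

    def receive(self, max_messages: int, wait_seconds: int) -> list[Message]:
        return self.pending[:max_messages]

    def delete(self, receipt_handle: str) -> None:
        self.deleted.append(receipt_handle)
        self.pending = [m for m in self.pending if m.receipt_handle != receipt_handle]


loaded: list[str] = []
queue = InMemoryQueue(["s3://b/a.csv", "s3://b/c.csv"])
# Here each message body is simply one file URI.
poll_once(queue, lambda body: [body], loaded.extend)
print(loaded)  # ['s3://b/a.csv', 's3://b/c.csv']
```

Deleting only after a successful load means duplicate deliveries are possible, so whatever consumes the file list would need to tolerate seeing the same file twice.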