StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
8.92k stars 1.79k forks

Support for Apache Arrow Flight SQL #22944

Open kesavkolla opened 1 year ago

kesavkolla commented 1 year ago

Currently the only protocol StarRocks supports is MySQL. That works well for interacting with row-oriented data. For large data transfers and zero-copy semantics, it would be good to support Arrow as an alternative. An Arrow Flight server or ADBC would help in getting data out of StarRocks faster and more efficiently.

imay commented 1 year ago

That's an excellent suggestion! This is on our roadmap, but it will take some time to finish.

rupurt commented 1 year ago

Would love to have this!

c-thiel commented 1 year ago

+1 for this! Aside from faster transfer speeds on a connection to a single FE, Flight even allows tickets to be served from multiple endpoints / servers. This eliminates the inevitable bottleneck of a single server's network card. StarRocks has an MPP architecture. It would be great to serve clients in an MPP fashion as well! Plus, of course, there is the benefit of skipping deserialization on the client side for any of the Arrow-supported languages.
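To illustrate the multi-endpoint point: in Arrow Flight, the response describing a result set lists one or more endpoints, each pairing a ticket with the location(s) that can serve it, so a client can redeem the tickets in parallel. The sketch below only simulates that pattern with the Python standard library. `Endpoint`, `fetch_endpoint`, `FAKE_PARTITIONS`, and the node addresses are hypothetical stand-ins for illustration, not StarRocks or Arrow APIs, and no network I/O takes place.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for a Flight endpoint: a ticket plus the node serving it."""
    location: str
    ticket: bytes


# Simulated per-node result partitions. A real server would stream Arrow record
# batches; plain integers are used here to keep the sketch dependency-free.
FAKE_PARTITIONS = {
    b"part-0": [1, 2, 3],
    b"part-1": [4, 5],
    b"part-2": [6, 7, 8, 9],
}


def fetch_endpoint(endpoint: Endpoint) -> list:
    """Pretend to redeem one ticket at one node (hypothetical, no network I/O)."""
    return FAKE_PARTITIONS[endpoint.ticket]


def fetch_all(endpoints: list) -> list:
    """Fetch every endpoint concurrently and concatenate results in endpoint order."""
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        # Executor.map returns results in the order of its inputs.
        parts = list(pool.map(fetch_endpoint, endpoints))
    return [value for part in parts for value in part]


# Made-up node addresses, one per simulated partition.
endpoints = [
    Endpoint("grpc://node-0.example", b"part-0"),
    Endpoint("grpc://node-1.example", b"part-1"),
    Endpoint("grpc://node-2.example", b"part-2"),
]
rows = fetch_all(endpoints)
```

With a real Flight client, each worker would open its own connection to the endpoint's location, so the transfer would be spread across several servers' network links instead of one.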

v-kessler commented 1 year ago

Is there a timeline for this feature?

YuriyGavrilov commented 11 months ago

+1

Just to add some context here:

There is a new, modern way to retrieve SQL query results faster: the Arrow Flight SQL driver.
https://arrow.apache.org/docs/java/flight_sql_jdbc_driver.html#
https://arrow.apache.org/blog/2022/02/16/introducing-arrow-flight-sql/
https://www.postgresql.org/about/news/apache-arrow-flight-sql-adapter-for-postgresql-010-2716/

Some explanations and research: https://www.vldb.org/pvldb/vol10/p1022-muehleisen.pdf

Some server examples:
https://github.com/voltrondata/flight-sql-server-example
https://www.kamu.dev/blog/2023-09-datafusion-flightsql/
https://github.com/kamu-data/kamu-cli (Flight SQL working with Tableau in Postgres dialect)

macroguo-ghy commented 7 months ago

Any progress?

derekperkins commented 4 months ago

This would be incredibly useful for efficient data analysis

derekperkins commented 4 months ago

Our use case is streaming Apache Arrow data to the web browser for use with DuckDB WASM (together with mosaic) in real-time analytics dashboards. As of now, we can do that when querying either BigQuery or DuckDB, but not StarRocks.

We could get the data from StarRocks and convert it ourselves in app code on the server before sending it to the client, but it would be awesome if that could just be a passthrough.
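For reference, the server-side conversion described above amounts to pivoting row-oriented results, as a MySQL-protocol client returns them, into columns, which is the layout Arrow uses. Below is a minimal stdlib-only sketch of that pivot. The column names and rows are made up, and `rows_to_columns` is a hypothetical helper, not part of any StarRocks or Arrow library.

```python
def rows_to_columns(column_names, rows):
    """Pivot a sequence of row tuples into a dict mapping column name to a list of values."""
    columns = {name: [] for name in column_names}
    for row in rows:
        if len(row) != len(column_names):
            raise ValueError("row width does not match column count")
        # Every value is touched and copied once; this is the per-row work
        # that a native columnar result would make unnecessary.
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


# Made-up result set, shaped like what a row-oriented driver hands back.
names = ["user_id", "country", "revenue"]
rows = [(1, "US", 12.5), (2, "DE", 7.0), (3, "US", 3.25)]
columns = rows_to_columns(names, rows)
```

In a real application the resulting dict would presumably be handed to an Arrow library (for example `pyarrow.table(columns)`) and serialized before being sent to the browser.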