StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

Support for Apache Arrow Flight SQL #22944

Open kesavkolla opened 1 year ago

kesavkolla commented 1 year ago

Currently the only protocol StarRocks supports is MySQL. It works well for interacting with row-oriented data. For large data transfers and zero-copy semantics, it would be good to support Arrow as an alternative. An Arrow Flight server or ADBC would help in getting data out of StarRocks faster and more efficiently.

imay commented 1 year ago

That's an excellent suggestion! This is on our roadmap, but it will take some time to finish.

rupurt commented 9 months ago

Would love to have this!

c-thiel commented 8 months ago

+1 for this! Aside from faster transfer speeds on a connection to a single FE, Flight even allows tickets to be served from multiple endpoints / servers. This eliminates the inevitable bottleneck of a single server's network card. StarRocks has an MPP architecture. It would be great to serve clients in an MPP fashion as well! Plus, of course, there is the benefit of skipping deserialization on the client side for any of the Arrow-supported languages.
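
To make the multi-endpoint point concrete, here is a minimal client-side sketch in Python. It assumes a server that returns a FlightInfo whose endpoints each carry a ticket and at least one location; StarRocks did not offer this at the time of the comment, so nothing here describes an actual StarRocks API. `fetch_all`, `fetch_endpoint`, and `default_connect` are hypothetical helper names; the calls on the client (`pyarrow.flight.connect`, `do_get`, `read_all`, `close`) are from pyarrow's Flight bindings.

```python
from concurrent.futures import ThreadPoolExecutor


def default_connect(location):
    # Imported lazily so this module loads even where pyarrow is not installed.
    import pyarrow.flight as flight
    return flight.connect(location)


def fetch_endpoint(endpoint, connect):
    """Fetch one endpoint's partial result from the first location it advertises."""
    # Assumption: every endpoint lists at least one location. An empty list
    # means "use the server that issued the FlightInfo", which is not handled here.
    client = connect(endpoint.locations[0])
    try:
        return client.do_get(endpoint.ticket).read_all()
    finally:
        client.close()


def fetch_all(info, connect=default_connect, max_workers=4):
    """Fetch all endpoints of a FlightInfo in parallel, one result per endpoint."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the results in the same order as info.endpoints.
        return list(pool.map(lambda ep: fetch_endpoint(ep, connect), info.endpoints))
```

With a real server, `info` would come from `client.get_flight_info(descriptor)`, and the returned tables could be combined with `pyarrow.concat_tables`. Each endpoint is read over its own connection, so the transfer is spread across servers and no longer limited by a single node's network card.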

v-kessler commented 8 months ago

Is there maybe a timeline for this feature?

YuriyGavrilov commented 6 months ago

+1

Just to add some additional context here:

There is a new, modern way to retrieve SQL query results faster: the Arrow Flight SQL driver. https://arrow.apache.org/docs/java/flight_sql_jdbc_driver.html# https://arrow.apache.org/blog/2022/02/16/introducing-arrow-flight-sql/ https://www.postgresql.org/about/news/apache-arrow-flight-sql-adapter-for-postgresql-010-2716/

Some explanations and research: https://www.vldb.org/pvldb/vol10/p1022-muehleisen.pdf

Some server examples: https://github.com/voltrondata/flight-sql-server-example https://www.kamu.dev/blog/2023-09-datafusion-flightsql/ https://github.com/kamu-data/kamu-cli (Flight SQL working with Tableau using the Postgres dialect)

macroguo-ghy commented 3 months ago

Any progress?

derekperkins commented 2 weeks ago

This would be incredibly useful for efficient data analysis

derekperkins commented 2 weeks ago

Our use case is streaming Apache Arrow data to the web browser for use with DuckDB WASM (used with Mosaic) for real-time analytics dashboards. As of now, we can do that when querying either BigQuery or DuckDB, but not StarRocks.

We could get the data from StarRocks and convert it ourselves in app code on the server before sending it to the client, but it would be awesome if that could just be passthrough.
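
For reference, the server-side conversion described above might look roughly like the following Python sketch. It assumes the rows arrive as tuples from a MySQL-protocol cursor and that pyarrow is available in the app server; `rows_to_columns` and `columns_to_arrow_ipc` are hypothetical helper names, not part of StarRocks or any driver.

```python
def rows_to_columns(names, rows):
    """Pivot row tuples (as a MySQL cursor returns them) into a dict of column lists."""
    columns = {name: [] for name in names}
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


def columns_to_arrow_ipc(columns):
    """Serialize column lists as an Arrow IPC stream (requires pyarrow)."""
    import pyarrow as pa  # lazy import: only needed for the serialization step

    table = pa.table(columns)  # Arrow types are inferred from the Python values
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, table.schema) as writer:
        writer.write_table(table)
    return sink.getvalue().to_pybytes()
```

The resulting bytes could be sent to the browser as the response body. The row-to-column pivot and the re-serialization are the extra work that native Arrow output from the database would make unnecessary.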