StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
Enhancement
The HTTP SQL API is currently well optimized for short-running queries, but it doesn't work well for longer queries. Those should run asynchronously and be polled, rather than requiring the client to hold an HTTP connection open while it waits for results. Snowflake and Databricks, for example, both handle this well. This would essentially involve adding an option to execute a query asynchronously, plus endpoints to retrieve the status of a query execution and to retrieve its results. I also think that, especially in shared-data mode, we should stream the results to object storage. This would allow the query results to be retrieved after the query has finished (and would probably also reduce memory pressure on the CN nodes). We are adding this to our local StarRocks build but would love to collaborate/contribute.
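To make the proposed submit / poll / fetch flow concrete, here is a minimal client-side sketch. It runs against an in-memory fake server, so everything in it is an assumption: the endpoint paths in the comments, the state names (`PENDING`, `RUNNING`, `FINISHED`, `FAILED`), `FakeAsyncSqlServer`, and `run_async` are all hypothetical and are not part of the existing StarRocks HTTP SQL API.

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical query states; a real API would define its own set.
PENDING, RUNNING, FINISHED, FAILED = "PENDING", "RUNNING", "FINISHED", "FAILED"


@dataclass
class FakeAsyncSqlServer:
    """In-memory stand-in for the proposed endpoints (paths are hypothetical):
    POST /statements?async=true -> submit()
    GET  /statements/{id}        -> status()
    GET  /statements/{id}/result -> result()
    """
    polls_until_done: int = 2  # report RUNNING this many times, then FINISHED
    _ids: itertools.count = field(default_factory=itertools.count)
    _jobs: dict = field(default_factory=dict)

    def submit(self, sql: str) -> str:
        # Returns a query id immediately; no connection is held open.
        query_id = f"q-{next(self._ids)}"
        self._jobs[query_id] = {"sql": sql, "polls": 0}
        return query_id

    def status(self, query_id: str) -> str:
        job = self._jobs[query_id]
        job["polls"] += 1
        return FINISHED if job["polls"] > self.polls_until_done else RUNNING

    def result(self, query_id: str) -> list:
        # In the proposal, rows could instead be read back from object storage.
        return [("row", i) for i in range(3)]


def run_async(server, sql: str, poll_interval: float = 0.0, timeout: float = 5.0,
              sleep: Callable[[float], None] = time.sleep) -> list:
    """Submit a query, poll its status until it finishes, then fetch results."""
    query_id = server.submit(sql)
    deadline = time.monotonic() + timeout
    while True:
        state = server.status(query_id)
        if state == FINISHED:
            return server.result(query_id)
        if state == FAILED:
            raise RuntimeError(f"query {query_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {query_id} still {state}")
        sleep(poll_interval)  # each poll is a separate, short request


server = FakeAsyncSqlServer()
rows = run_async(server, "SELECT 1")
```

With the defaults, the fake server reports `RUNNING` twice and `FINISHED` on the third poll, after which `run_async` fetches the three rows. The `sleep` parameter is injectable so the fixed polling interval could be swapped for exponential backoff without touching the loop.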
I am starting a draft Google Doc for discussion; please leave comments: https://docs.google.com/document/d/1_PGTcjnSwtgHOA5i2uWnXeHerRYlPH23IXT6z2XJPFk/edit?usp=sharing