Improve SQL HTTP API for asynchronous execution

Samrose-Ahmed commented 6 days ago

Enhancement

The HTTP SQL API is currently well optimized for short-running queries, but it doesn't work well for longer queries, which should run asynchronously and be polled rather than requiring an open HTTP connection while waiting for results. Snowflake and Databricks, for example, both handle this well. It would essentially involve adding an option to execute a query asynchronously, plus endpoints to retrieve the status of a query execution and to retrieve the results of a query execution. I also think, especially in shared-data mode, we should stream the results to object storage; this will allow retrieving the query results (and probably also reduce memory pressure on the CN). We are adding this to our local StarRocks but would love to collaborate/contribute.
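To make the submit / poll / fetch flow concrete, here is a minimal in-memory sketch. It is not StarRocks code and does not reflect any existing or agreed API: `AsyncQueryService`, `submit`, `status`, `results`, `poll_until_done`, and the state names `RUNNING` / `FINISHED` / `FAILED` are all hypothetical, and a plain dict stands in for object storage.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncQueryService:
    """Hypothetical stand-in for the three proposed operations:
    submit a query, check its status, fetch its results."""

    def __init__(self, run_query):
        self._run_query = run_query      # callable: sql -> list of rows
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._lock = threading.Lock()
        self._status = {}                # query_id -> state string
        self._result_store = {}          # dict standing in for object storage

    def submit(self, sql):
        """Start the query in the background and return its id immediately."""
        query_id = uuid.uuid4().hex
        with self._lock:
            self._status[query_id] = "RUNNING"
        self._pool.submit(self._execute, query_id, sql)
        return query_id

    def _execute(self, query_id, sql):
        try:
            rows = self._run_query(sql)
        except Exception:
            with self._lock:
                self._status[query_id] = "FAILED"
            return
        with self._lock:
            # Persist the rows before marking the query as finished.
            self._result_store[query_id] = rows
            self._status[query_id] = "FINISHED"

    def status(self, query_id):
        with self._lock:
            return self._status[query_id]

    def results(self, query_id):
        with self._lock:
            if self._status[query_id] != "FINISHED":
                raise RuntimeError("results are not available for this query")
            return self._result_store[query_id]


def poll_until_done(service, query_id, interval=0.05, timeout=5.0):
    """Client-side polling loop: no connection is held open between checks."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = service.status(query_id)
        if state != "RUNNING":
            return state
        time.sleep(interval)
    raise TimeoutError(f"query {query_id} still running after {timeout}s")


if __name__ == "__main__":
    def slow_query(sql):
        time.sleep(0.2)                  # simulate a long-running query
        return [(1,)]

    service = AsyncQueryService(slow_query)
    qid = service.submit("SELECT 1")
    print(poll_until_done(service, qid), service.results(qid))
```

In a real implementation each method would presumably map to its own HTTP endpoint, and `_result_store` would be replaced by writes to object storage so results outlive the executing node.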

I am starting a draft Google Doc to discuss, please leave comments: https://docs.google.com/document/d/1_PGTcjnSwtgHOA5i2uWnXeHerRYlPH23IXT6z2XJPFk/edit?usp=sharing

kevincai commented 6 days ago

@Samrose-Ahmed thanks for writing up the proposal. Would you please enable commenter permission so we can leave feedback directly on the doc?

Samrose-Ahmed commented 6 days ago

Apologies, I've opened up the doc.

StarRocks / starrocks

Improve SQL HTTP API for asynchronous execution #47691
