Open costin opened 1 year ago
Congrats on creating issue #100000 😄
A note for later: now that we have support for Arrow, its streaming format could "naturally" support, well, streaming results instead of pagination. The two are not the same, and we'd have to flip a few things around to make streaming result sets work, but that's something we could do too.
Now that we have support for streaming HTTP responses in general, I'd suggest considering using it for all[^1] response formats, not just Arrow. It is so much simpler for clients than having to issue multiple requests to use pagination, and it should be easier to implement efficiently too.
[^1]: Maybe not text/plain, since computing column widths ahead of time seems tricky. But I think we can stream everything else, right?
The trick with non-arrow, though, is that we have to use a format that is stream-able. Arrow is naturally a paged list of columns. So it works. Row-by-row output would work too. But our columnar response won't. Because it wants the whole column at a time.
But, yeah. We should stream it all.
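To illustrate the point about stream-able formats, here is a minimal sketch (not Elasticsearch code) of turning a row source into a sequence of small columnar pages. The function name `columnar_pages` and the `page_size` parameter are assumptions made up for this example. Each page can be emitted as soon as it is full, whereas a single columnar response would need every row before the first column could be written.

```python
from itertools import islice
from typing import Iterable, Iterator


def columnar_pages(rows: Iterable[tuple], page_size: int) -> Iterator[list[list]]:
    """Yield pages; each page is a list of columns covering at most page_size rows.

    Hypothetical helper for illustration only, not an ES|QL API.
    """
    it = iter(rows)
    while True:
        # Pull the next batch of rows without materializing the whole input.
        chunk = list(islice(it, page_size))
        if not chunk:
            return
        # Transpose the batch of rows into per-column lists.
        yield [list(column) for column in zip(*chunk)]


if __name__ == "__main__":
    for page in columnar_pages([(1, "a"), (2, "b"), (3, "c")], page_size=2):
        print(page)
```

Only one page of rows is held in memory at a time, which is the property that makes a paged columnar layout workable for streaming.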
(Sent as an email reply to David Turner's comment above, dated Mon, Aug 5, 2024, 5:28 PM.)
Even with the other columnar formats, a client that has to use pagination is going to receive the data as a sequence of columnar-format pages anyway. So we could achieve basically the same thing with less client-side complexity by streaming a sequence of those pages as a response to a single request.
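As a rough sketch of what that could look like from the client side, the example below reads a single response body made of columnar pages. The newline-delimited JSON framing and the `values` key are assumptions chosen for illustration; they are not a wire format that ES|QL is known to use.

```python
import io
import json
from typing import IO, Iterator


def read_pages(stream: IO[str]) -> Iterator[dict]:
    """Yield one decoded page per non-empty line (hypothetical NDJSON framing)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def total_rows(stream: IO[str]) -> int:
    """Count rows across all pages while holding only one page in memory."""
    count = 0
    for page in read_pages(stream):
        columns = page["values"]  # assumed layout: a list of columns
        count += len(columns[0]) if columns else 0
    return count


if __name__ == "__main__":
    body = io.StringIO('{"values": [[1, 2], ["a", "b"]]}\n{"values": [[3], ["c"]]}\n')
    print(total_rows(body))
```

The client issues one request and loops over pages as they arrive, with no cursor bookkeeping between requests.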
Another thing about pagination - the clients can produce an error like:
Unable to retrieve search results
[esql] > Unexpected error from Elasticsearch: The content length (536885793) is bigger than the
maximum allowed string (536870888)
If we had pagination, we would have a natural way to keep each response smaller than 500 MB. That would defend against this client problem. And 500 MB is an absolutely huge page.
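For scale, the response in the error above was 536885793 - 536870888 = 14905 bytes over the client's limit. A minimal sketch of capping page size by serialized bytes follows; `pages_under_budget` and `max_bytes` are hypothetical names, and the byte accounting assumes the default `json.dumps` separators with ASCII-escaped output.

```python
import json
from typing import Iterable, Iterator


def pages_under_budget(rows: Iterable[list], max_bytes: int) -> Iterator[list]:
    """Group rows into pages whose JSON encoding stays within max_bytes.

    A single row larger than the budget is still emitted, alone in its page.
    """
    page: list = []
    size = 2  # the enclosing "[" and "]"
    for row in rows:
        encoded = len(json.dumps(row).encode("utf-8"))
        extra = encoded + (2 if page else 0)  # ", " between rows
        if page and size + extra > max_bytes:
            yield page
            page, size = [], 2
            extra = encoded
        page.append(row)
        size += extra
    if page:
        yield page


if __name__ == "__main__":
    rows = [[i, "x" * 10] for i in range(10)]
    for page in pages_under_budget(rows, max_bytes=40):
        print(len(json.dumps(page)), page)
```

A real implementation would more likely budget on the in-memory block size than on JSON text, so treat the accounting here as illustrative only.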
Description
Currently ESQL returns all results in one page. This doesn't work in cases where a lot of data needs to be returned (such as streaming from the storage) or where the client needs to consume the response in small pages. Similar to the search_after/scroll APIs, the ESQL endpoint should be able to 'stream' the results back to the client through a pagination mechanism.
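One possible shape for such a mechanism, sketched from the client's side: each response carries a page of rows plus an opaque cursor, and the client keeps asking until no cursor comes back. The `fetch_page` callable and the `rows` and `cursor` keys are hypothetical and stand in for whatever the endpoint would actually expose.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"rows": [...], "cursor": "<opaque token>" or None}
Page = dict


def iterate_results(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[list]:
    """Yield every row by following cursors until the server stops returning one."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["rows"]
        cursor = page.get("cursor")
        if cursor is None:
            return


if __name__ == "__main__":
    # In-memory stand-in for the server, keyed by cursor.
    fake_server = {
        None: {"rows": [[1], [2]], "cursor": "c1"},
        "c1": {"rows": [[3]], "cursor": None},
    }
    print(list(iterate_results(lambda cursor: fake_server[cursor])))
```

This is the multi-request pattern that the comments in this thread propose replacing with a single streamed response.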