elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ESQL: Support pagination #100000

Open costin opened 11 months ago

costin commented 11 months ago

Description

Currently ESQL returns all results in one page. This doesn't work in cases where a lot of data needs to be returned (such as streaming from the storage) or where the client needs to consume the response in small pages. Similar to the search_after/scroll APIs, the ESQL endpoint should be able to 'stream' the results back to the client through a pagination mechanism.
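
For illustration only, here is a minimal sketch of the cursor-style loop a client would run against a paginated endpoint. Nothing here is an existing ESQL API: `fetch_page`, `fetch_all`, and the integer cursor are hypothetical names, and an in-memory list stands in for the server-side result set.

```python
from typing import Any, Optional

Row = dict[str, Any]


def fetch_page(rows: list[Row], cursor: Optional[int], page_size: int):
    """Hypothetical server side: return one page plus the cursor for the next page.

    The cursor is None once the last page has been returned.
    """
    start = cursor or 0
    end = start + page_size
    next_cursor = end if end < len(rows) else None
    return rows[start:end], next_cursor


def fetch_all(rows: list[Row], page_size: int) -> list[list[Row]]:
    """Hypothetical client side: request pages until no cursor comes back."""
    pages = []
    cursor = None
    while True:
        page, cursor = fetch_page(rows, cursor, page_size)
        pages.append(page)
        if cursor is None:
            return pages
```

The point of the sketch is the shape of the protocol: every page costs the client one more request, and the client has to carry the cursor between requests.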

elasticsearchmachine commented 11 months ago

Pinging @elastic/es-ql (Team:QL)

elasticsearchmachine commented 11 months ago

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

felixbarny commented 11 months ago

Congrats on creating issue #100000 😄

elasticsearchmachine commented 8 months ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

nik9000 commented 1 month ago

A note for later - now that we have support for arrow its streaming format could "naturally" support, well, streaming results instead of pagination. Not the same. And we'd have to flip a few things around to make streaming result sets work, but that's something we could do too.

DaveCTurner commented 1 month ago

Now that we have support for streaming HTTP responses in general I'd suggest considering using it for all[^1] response formats, not just arrow. It is so much simpler for clients than having to issue multiple requests to use pagination, and it should be easier to implement efficiently too.

[^1]: Maybe not text/plain since computing column widths ahead of time seems tricky. But I think we can stream everything else, right?

nik9000 commented 1 month ago

The trick with non-arrow, though, is that we have to use a format that is stream-able. Arrow is naturally a paged list of columns. So it works. Row-by-row output would work too. But our columnar response won't. Because it wants the whole column at a time.

But, yeah. We should stream it all.
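
To make the streamability point concrete, here is a rough sketch rather than the actual ESQL serializers. `stream_rows` and `columnar` are hypothetical helpers and JSON is used only for illustration. Row-by-row output can be emitted as soon as the first row is available, while a single columnar document has to consume every row before it can write even one complete column.

```python
import json
from typing import Any, Iterable, Iterator


def stream_rows(rows: Iterable[list[Any]]) -> Iterator[str]:
    """Row-by-row output: each row is serialized and yielded as soon as it arrives."""
    for row in rows:
        yield json.dumps(row)


def columnar(names: list[str], rows: Iterable[list[Any]]) -> str:
    """Single columnar document: needs all rows before any column is complete."""
    columns: dict[str, list[Any]] = {name: [] for name in names}
    for row in rows:  # the whole input is consumed before anything is emitted
        for name, value in zip(names, row):
            columns[name].append(value)
    return json.dumps(columns)
```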

DaveCTurner commented 1 month ago

Even when using the other columnar formats, from the client's point of view if it has to use pagination then it's going to receive the data as a sequence of columnar-format pages anyway, so we could achieve basically the same thing with less client-side complexity by streaming a sequence of those pages as a response to a single request.
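
A minimal sketch of that idea, assuming newline-delimited JSON as the framing (an assumption for illustration, not something decided in this thread): the server cuts the result into fixed-size columnar pages and writes them one after another into a single response body. `stream_columnar_pages` is a hypothetical name.

```python
import json
from itertools import islice
from typing import Any, Iterable, Iterator


def stream_columnar_pages(
    names: list[str], rows: Iterable[list[Any]], page_size: int
) -> Iterator[str]:
    """Yield one columnar page per line, reading only page_size rows at a time."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, page_size))
        if not chunk:
            return
        # Pivot just this chunk into columns; earlier chunks are already sent.
        page = {name: [row[i] for row in chunk] for i, name in enumerate(names)}
        yield json.dumps(page) + "\n"
```

The client reads the body line by line and sees the same sequence of columnar pages it would get from pagination, without issuing follow-up requests or tracking a cursor.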

nik9000 commented 1 week ago

Another thing about pagination - the clients can produce an error like:

Unable to retrieve search results
[esql] > Unexpected error from Elasticsearch: The content length (536885793) is bigger than the 
maximum allowed string (536870888)

If we had pagination we should have some way to naturally cut responses to smaller than 500 MB. That'd defend against this client problem. And 500 MB is an absolutely huge page.
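
One way such a cut could work, sketched under assumptions: pages are closed by serialized size rather than by row count, so no single page exceeds a byte budget. `pages_by_size` and `max_bytes` are hypothetical, and the size accounting assumes Python's default `json.dumps` separators. For scale, the limit in the error above (536870888) is roughly 512 MiB.

```python
import json
from typing import Any, Iterable, Iterator


def pages_by_size(rows: Iterable[list[Any]], max_bytes: int) -> Iterator[list[list[Any]]]:
    """Group rows into pages whose JSON encoding stays within max_bytes.

    A single row larger than max_bytes is still emitted, alone in its own page.
    """
    page: list[list[Any]] = []
    size = 2  # the enclosing "[" and "]"
    for row in rows:
        encoded = len(json.dumps(row).encode("utf-8"))
        extra = encoded + (2 if page else 0)  # ", " separator between rows
        if page and size + extra > max_bytes:
            yield page
            page, size = [], 2
            extra = encoded
        page.append(row)
        size += extra
    if page:
        yield page
```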