[BUG] Scroll API fails when query hits too many shards

elastic / elasticsearch-rs

Official Elasticsearch Rust Client

Apache License 2.0

695 stars 71 forks source link

Describe the bug

The size of the scroll_id token is proportional to the number of shards hit by the query.

We found info on scroll_id's generating large strings https://github.com/elastic/elasticsearch-py/issues/971 . And looking over the Scroll code in rust, it always puts the scroll_id in to a GET request and in to the query params. So large queries will cause the Elasticsearch cluster's to report An HTTP line is larger than 4096 bytes..

To Reproduce Steps to reproduce the behavior:

Hit a lot of shards in a query
Watch ES reject the second page of your scroll logic

Expected behavior

The Scroll code in rust should always use a POST so it can have long scroll_ids.

Screenshots If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

OS: [e.g. Windows 10 Pro]
rustc version [e.g. rustc --version]

Additional context Add any other context about the problem here.

Hi @xrl

And looking over the Scroll code in rust, it always puts the scroll_id in to a GET request and in to the query params. So large queries will cause the Elasticsearch cluster's to report An HTTP line is larger than 4096 bytes.

Scroll will use a GET request if there is no request body, a POST otherwise

https://github.com/elastic/elasticsearch-rs/blob/1e619d82b1d70c068ff9cabdd86bfbda04e1e500/elasticsearch/src/generated/root.rs#L5893-L5899

This is generated from the REST API spec scroll.json which has deprecated scroll_id as part of the query string parameters in 7.0.0, with a view to remove in the future major version. The API itself will continue to support both GET and POST methods, as different HTTP libraries across the language eco systems (rightly or wrongly 🙂 ) support sending request bodies with a GET request.

In light of the change happening in the spec, I'm going to leave it as is in the client, as the change will make its way into the client. An example of sending the scroll id in the body is in https://github.com/elastic/elasticsearch-rs/blob/master/elasticsearch/docs/Elasticsearch.fn.scroll.md, which will make its way into the docs on docs.rs on the next release.

elastic / elasticsearch-rs

[BUG] Scroll API fails when query hits too many shards #86