clojurewerkz / elastisch

A minimalistic Clojure client for ElasticSearch, supports both HTTP and native transports
http://clojureelasticsearch.info

scroll-seq requires help when working with a "scan" search type. #72

Open tedgin opened 10 years ago

tedgin commented 10 years ago

I'm using elasticsearch 0.90.7, so this may be a temporary problem.

A scroll search with a scan search type does not return results with the initial call. A call to the _search/scroll endpoint needs to be made to retrieve the first set of results. This isn't consistent with the behavior of the other search types.

The otherwise very useful scroll-seq function isn't aware of this inconsistency. When called naively, as in the following, it assumes that the absence of hits in the initial search response means there are no matches, and it returns an empty list.

(scroll-seq (search "index" "mapping" :query (match-all) :search_type "scan"))

There is a workaround: make a single scroll call and pass that result set to scroll-seq. Still, it would be neat if scroll-seq could hide the need to do that for scan search types.
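
A minimal sketch of that workaround, run against an in-memory stand-in rather than a real cluster. Everything here is an assumption made for illustration: `pages`, `initial-scan-response`, `fake-scroll` and `naive-scroll-seq` are hypothetical names, not Elastisch functions, and `naive-scroll-seq` only mimics the stop-on-empty-hits behavior described above.

```clojure
;; Hypothetical in-memory stand-in for Elasticsearch: maps a scroll id to the
;; response that a scroll call with that id would return.
(def pages
  {"s0" {:_scroll_id "s1" :hits {:total 3 :hits [{:_id 1} {:_id 2}]}}
   "s1" {:_scroll_id "s2" :hits {:total 3 :hits [{:_id 3}]}}
   "s2" {:_scroll_id "s3" :hits {:total 3 :hits []}}})

;; What the initial scan search returns: a scroll id but no hits.
(def initial-scan-response
  {:_scroll_id "s0" :hits {:total 3 :hits []}})

(defn fake-scroll
  "Hypothetical stand-in for a request to the _search/scroll endpoint."
  [scroll-id]
  (get pages scroll-id))

(defn naive-scroll-seq
  "Mimics the behavior described above: stops as soon as a response has no hits."
  [resp]
  (let [hits (get-in resp [:hits :hits])]
    (if (seq hits)
      (concat hits
              (lazy-seq (naive-scroll-seq (fake-scroll (:_scroll_id resp)))))
      hits)))

;; Naive call: the initial scan response has no hits, so nothing is returned.
(def naive-result
  (vec (naive-scroll-seq initial-scan-response)))

;; Workaround: make one explicit scroll call first, then build the sequence
;; from that response instead of from the initial one.
(def workaround-result
  (vec (naive-scroll-seq (fake-scroll (:_scroll_id initial-scan-response)))))
```

With this stand-in, `naive-result` is empty while `workaround-result` contains all three documents.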

lorthos commented 10 years ago

Thanks for the heads up, I was dealing with this earlier today.

emidln commented 10 years ago

Ran into this today. Do we actually need the seq here? https://github.com/clojurewerkz/elastisch/blob/master/src/clojurewerkz/elastisch/rest/document.clj#L221

michaelklishin commented 10 years ago

@emidln if there are no hits, should we continue scrolling?

emidln commented 10 years ago

Yes. ES's scan/scroll API is a special case: the first call returns no hits. We could probably use a multi-arity function: the lower arity calls the higher arity with true for the first call, the lazy-seq then calls the function with false, and the seq check is modified accordingly.
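
One way the multi-arity idea might look, as a rough sketch rather than the library's implementation. The names `scan-aware-scroll-seq`, `scroll-fn`, `first-call?`, `sample-pages` and `scan-start` are hypothetical. In particular, passing the scroll function in as an argument is an assumption made so the example runs without a cluster.

```clojure
(defn scan-aware-scroll-seq
  "Hypothetical sketch: lazily concatenates hits across scroll responses.
  scroll-fn takes a scroll id and returns the next response. The lower arity
  marks the call as the first one, so an empty initial response (as returned
  by a scan search) does not end the sequence."
  ([scroll-fn resp]
   (scan-aware-scroll-seq scroll-fn resp true))
  ([scroll-fn resp first-call?]
   (let [hits (get-in resp [:hits :hits])]
     ;; Modified seq check: keep scrolling on the first call even without hits.
     (if (or first-call? (seq hits))
       (concat hits
               (lazy-seq
                 (scan-aware-scroll-seq scroll-fn
                                        (scroll-fn (:_scroll_id resp))
                                        false)))
       hits))))

;; In-memory stand-in for the scroll endpoint. A Clojure map is a function of
;; its keys, so it can be passed directly as scroll-fn.
(def sample-pages
  {"a" {:_scroll_id "b" :hits {:hits [{:_id 1} {:_id 2}]}}
   "b" {:_scroll_id "c" :hits {:hits [{:_id 3}]}}
   "c" {:_scroll_id "d" :hits {:hits []}}})

;; Initial scan response: a scroll id and no hits.
(def scan-start {:_scroll_id "a" :hits {:hits []}})

(def all-hits
  (vec (scan-aware-scroll-seq sample-pages scan-start)))
;; => [{:_id 1} {:_id 2} {:_id 3}]
```

With the other search types the first response already carries hits, so the flag changes nothing. The cost is one extra scroll call when a search genuinely has no matches.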

michaelklishin commented 10 years ago

@emidln feel free to submit a pull request :)

emidln commented 9 years ago

I haven't had time to build up a full pull request, but for anyone who is affected, this gist provides a solution and a higher-level scan/scroll interface: https://gist.github.com/54e3e66715f38befa6da