clojurewerkz / elastisch

A minimalistic Clojure client for ElasticSearch, supports both HTTP and native transports
http://clojureelasticsearch.info

rest.bulk.bulk-index requires unnesting of :_source field. #138

Closed (gtrak closed this issue 8 years ago)

gtrak commented 9 years ago

I had to write this function to work around the output of bulk-index containing :_source fields.

(defn unsourceify
  [bulk-ops]
  (for [op bulk-ops]
    (if-let [source (:_source op)]
      source
      op)))

The seq transforms I use on the source data to generate chunks of bulk index commands look like this:

(->> (scroll endpoint index)
     (partition-all 500)
     (map bulk/bulk-index)
     (map unsourceify))

Where scroll returns the output of scroll-seq.

michaelklishin commented 9 years ago

@gtrak there are some functions in document (IIRC) that filter out response keys. Please submit a PR that does the same in bulk. Cheers.

jqmtor commented 8 years ago

Hey @michaelklishin and @gtrak,

I've been looking into this and I think there is a slight misunderstanding. Looking at the code posted by @gtrak, it seems to me that he simply passed the input operations to bulk-index in an unexpected format. It is true that scroll-seq does not unwrap the _source field and simply creates a lazy sequence from the hits field in the response, but this is consistent with the other responses in Elastisch. This means that if the result of scroll-seq is passed directly to bulk-index (without first unwrapping _source), the operations will indeed contain the _source field and will not contain the document fields at the top level.
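
For illustration, here is a minimal sketch of that unwrapping step, assuming each hit from scroll-seq is a map with :_index, :_type, :_id and a nested :_source. The names hit->doc and sample-hits are hypothetical and not part of Elastisch.

```clojure
;; Hypothetical helper (not part of Elastisch): lift the fields of :_source
;; to the top level and keep only the metadata keys needed for indexing.
;; On a key clash, the metadata value from the hit wins over the source field.
(defn hit->doc
  [hit]
  (merge (:_source hit)
         (select-keys hit [:_index :_type :_id])))

;; Made-up sample data shaped like a search hit.
(def sample-hits
  [{:_index "people" :_type "person" :_id "1" :_score 1.0
    :_source {:name "Ada" :age 36}}])

;; Each resulting map has the document fields at the top level,
;; with :_score and the :_source wrapper dropped.
(def docs (map hit->doc sample-hits))
```

In the pipeline from the original report, (map hit->doc) would sit between the scroll call and partition-all. That should make a post-processing step like unsourceify unnecessary, assuming bulk-index reads its metadata from the top-level :_index, :_type and :_id keys of each document.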

@michaelklishin, I am not sure I understood your suggestion correctly. Can you clarify what you think is missing? I'd gladly submit a PR.

michaelklishin commented 8 years ago

@quimrstorres there are functions in Elastisch that already filter out response keys such as _source, so there's some reuse potential.