dainiusjocas / lazy-elasticsearch-scroll

Exposes data in Elasticsearch as a Clojure lazy sequence
Apache License 2.0

Deal with Aggregations #5

Closed: dainiusjocas closed this issue 4 years ago

dainiusjocas commented 4 years ago

Dissoc the aggregations from the query, if that is allowed at all, for performance.

dainiusjocas commented 4 years ago

For example:

(take 2 (scroll/hits
   {:es-host    "http://localhost:9200"
    :query      {:size 1
                 :query {:match_all {}}
                 :aggs {:NAME {:terms {:field "value"
                                        :size 10}}}}}))
;; =>
({:_id "space:default",
  :_type "_doc",
  :_score 1.0,
  :_index ".kibana_1",
  :_source {:space {:description "This is your default space!",
                    :color "#00bfb3",
                    :name "Default",
                    :_reserved true,
                    :disabledFeatures []},
            :migrationVersion {:space "6.6.0"},
            :type "space",
            :references [],
            :updated_at "2020-02-12T14:16:18.621Z"}}
 {:_id "config:7.6.0",
  :_type "_doc",
  :_score 1.0,
  :_index ".kibana_1",
  :_source {:config {:buildNum 29000}, :type "config", :references [], :updated_at "2020-02-12T14:16:20.526Z"}})

The intermediate responses look like this:

;; 1st
{:took 5,
 :hits
 {:hits
  [{:_id "space:default",
    :_type "_doc",
    :_score 1.0,
    :_index ".kibana_1",
    :_source
    {:space
     {:description "This is your default space!",
      :color "#00bfb3",
      :name "Default",
      :_reserved true,
      :disabledFeatures []},
     :migrationVersion {:space "6.6.0"},
     :type "space",
     :references [],
     :updated_at "2020-02-12T14:16:18.621Z"}}],
  :total {:value 43, :relation "eq"},
  :max_score 1.0},
 :_shards {:successful 4, :skipped 0, :total 4, :failed 0},
 :timed_out false,
 :_scroll_id
 "DnF1ZXJ5VGhlbkZldGNoBAAAAAAAAGJAFjd0VzBGU2ZRUzFxU05zaVZzRExqY3cAAAAAAABiQRY3dFcwRlNmUVMxcVNOc2lWc0RMamN3AAAAAAAAYkIWN3RXMEZTZlFTMXFTTnNpVnNETGpjdwAAAAAAAGJDFjd0VzBGU2ZRUzFxU05zaVZzRExqY3c=",
 :aggregations
 {:NAME
  {:doc_count_error_upper_bound 1,
   :buckets
   [{:key 0, :doc_count 1}
    {:key 1, :doc_count 1}
    {:key 2, :doc_count 1}
    {:key 3, :doc_count 1}
    {:key 4, :doc_count 1}
    {:key 5, :doc_count 1}
    {:key 6, :doc_count 1}
    {:key 7, :doc_count 1}
    {:key 8, :doc_count 1}
    {:key 9, :doc_count 1}],
   :sum_other_doc_count 23}}}
;; 2nd
{:took 1,
 :hits
 {:hits
  [{:_id "config:7.6.0",
    :_type "_doc",
    :_score 1.0,
    :_index ".kibana_1",
    :_source
    {:config {:buildNum 29000},
     :type "config",
     :references [],
     :updated_at "2020-02-12T14:16:20.526Z"}}],
  :total {:value 43, :relation "eq"},
  :max_score 1.0},
 :_shards {:successful 4, :skipped 0, :total 4, :failed 0},
 :timed_out false,
 :terminated_early true,
 :_scroll_id
 "DnF1ZXJ5VGhlbkZldGNoBAAAAAAAAGJAFjd0VzBGU2ZRUzFxU05zaVZzRExqY3cAAAAAAABiQRY3dFcwRlNmUVMxcVNOc2lWc0RMamN3AAAAAAAAYkIWN3RXMEZTZlFTMXFTTnNpVnNETGpjdwAAAAAAAGJDFjd0VzBGU2ZRUzFxU05zaVZzRExqY3c="}
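To illustrate why the aggregation results never reach the caller, here is a minimal sketch that assumes the lazy sequence is built by concatenating the documents under [:hits :hits] from each scroll page. The helper name page-hits and the trimmed-down response maps are hypothetical and are not the library's actual implementation:

```clojure
;; Hypothetical helper, not part of lazy-elasticsearch-scroll's API:
;; lazily concatenate the documents found under [:hits :hits] in each
;; scroll response. Any :aggregations key in a response is ignored.
(defn page-hits [responses]
  (mapcat #(get-in % [:hits :hits]) responses))

;; Trimmed-down versions of the two responses above.
(def responses
  [{:hits {:hits [{:_id "space:default"}]}
    :aggregations {:NAME {:buckets [{:key 0 :doc_count 1}]}}}
   {:hits {:hits [{:_id "config:7.6.0"}]}}])

(map :_id (page-hits responses))
;; => ("space:default" "config:7.6.0")
```

Only the hits survive. The buckets computed for the first page are discarded.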

This means that the aggregations are actually computed (the first response contains an :aggregations key), so if they are expensive to compute, performance suffers. Since we do not return anything under the :aggregations key, we can safely dissoc :aggs from the query.
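A minimal sketch of that dissoc, applied to the query from the example above before the scroll starts. The function name strip-aggs is hypothetical. Removing :aggregations as well is an addition here: Elasticsearch accepts both "aggs" and "aggregations" as the key in a search request body.

```clojure
;; Hypothetical helper (not part of the library): drop aggregation
;; requests from a search body, since the lazy sequence only exposes hits.
;; Elasticsearch accepts both "aggs" and "aggregations", so remove both.
(defn strip-aggs [query]
  (dissoc query :aggs :aggregations))

(strip-aggs {:size 1
             :query {:match_all {}}
             :aggs {:NAME {:terms {:field "value" :size 10}}}})
;; => {:size 1, :query {:match_all {}}}
```

Because dissoc on a missing key is a no-op, this can be applied unconditionally to every query, whether or not it contains aggregations.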