TL;DR

Change the third parameter of the `curl_es()` function from the search type (i.e. `_search` vs. `_count`) to a `size` value, and update the `get_count_from_json()` function accordingly:
    # $1 - es hostname
    # $2 - project name (e.g. logging, test, .operations, etc.)
    # $3 - size
    # $4 - field to search
    # $5 - search string
    # $6 - extra params e.g. '&fields=message&size=1000'
    # stdout is the JSON output from Elasticsearch
    # stderr is curl errors
    curl_es() {
        # Quote the whole URL so the shell does not treat & as the background
        # operator or expand * and ? as glob patterns.
        curl --connect-timeout 1 -s \
            "http://${1}:9200/${2}*/_search?size=${3}&q=${4}:${5}${6:-}"
    }

    # We shall also change this to read the total hit count, which the
    # _search JSON response reports under hits.total
    get_count_from_json() {
        python -c 'import json, sys; print(json.load(sys.stdin)["hits"]["total"])'
    }
Then update the script where these functions are called.
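
As a rough sketch of what an updated call site might look like: the variable names `es_host`, `project`, and `search`, the `message` field, and the sample response below are hypothetical placeholders, and the response is piped in from a string rather than fetched from a live Elasticsearch. It also assumes the total is reported under `hits.total` in the `_search` response, and it picks `python3` or `python`, whichever is installed.

```shell
#!/bin/bash
# Assumption: use whichever Python interpreter is available on this machine.
PYTHON=$(command -v python3 || command -v python)

# Read the total hit count from a _search response on stdin.
get_count_from_json() {
    "$PYTHON" -c 'import json, sys; print(json.load(sys.stdin)["hits"]["total"])'
}

# Old call site (third param was the endpoint):
#   curl_es "$es_host" "$project" _count message "$search" | get_count_from_json
# New call site (third param is the size; 0 returns totals without documents):
#   curl_es "$es_host" "$project" 0 message "$search" | get_count_from_json

# Made-up _search?size=0 response standing in for a live cluster.
sample_response='{"took":1,"timed_out":false,"hits":{"total":42,"max_score":0.0,"hits":[]}}'
count=$(printf '%s' "$sample_response" | get_count_from_json)
echo "count=$count"
```

Passing `0` as the size keeps the response small when only the count is needed, while a caller that wants the matching documents can pass a larger value through the same parameter.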
Details
The function `curl_es()` allows passing in either the `_search` or the `_count` REST endpoint (or any other value, which leads to an error). Elasticsearch 5.x removed the count API, see https://github.com/elastic/elasticsearch/pull/14166

In fact, they removed `_count` from all APIs except the HTTP REST API, where it was kept for compatibility reasons. `_count` requests are now translated to `_search?size=0` requests (https://github.com/elastic/elasticsearch/issues/13928#issuecomment-148672186, see the code change here: https://github.com/elastic/elasticsearch/commit/a6e7a5f30793dd36046cf752a0232947788f052d#diff-527e9a00a054e58e4ccf96dbea78ff9bR67). For the HTTP layer they also translate the JSON response back to the old (`_count`) format, so it still contains the top-level `count` field (its value is taken from the total hit count of the search response, see https://github.com/elastic/elasticsearch/commit/a6e7a5f30793dd36046cf752a0232947788f052d#diff-527e9a00a054e58e4ccf96dbea78ff9bR101).

Although clients can still use the `_count` API via HTTP, it would be better to always use `_search` and allow passing in the `size` value. This makes the expected result explicit. We can also expect a consistent response JSON format and read the total document count from `hits.total`.
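
To make the difference between the two response shapes concrete, here is a small sketch with made-up sample payloads. The helper `get_count_either` is hypothetical and exists only to show the branching a reader needs when both formats are in play. It assumes the `_count` format carries a top-level `count` field and the `_search` format carries the total under `hits.total`.

```shell
#!/bin/bash
# Assumption: use whichever Python interpreter is available on this machine.
PYTHON=$(command -v python3 || command -v python)

# Hypothetical reader that accepts either shape:
# a top-level "count" (_count format) or hits.total (_search format).
get_count_either() {
    "$PYTHON" -c 'import json, sys
d = json.load(sys.stdin)
print(d["count"] if "count" in d else d["hits"]["total"])'
}

# Made-up sample payloads, one per format.
count_style='{"count":7,"_shards":{"total":5,"successful":5,"failed":0}}'
search_style='{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":7,"max_score":0.0,"hits":[]}}'

from_count=$(printf '%s' "$count_style" | get_count_either)
from_search=$(printf '%s' "$search_style" | get_count_either)
echo "_count style:  $from_count"
echo "_search style: $from_search"
```

If every call goes through `_search`, the branch disappears and the reader only ever has to look in one place.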