ViaQ / integration-tests

Do not use HTTP _count API in openshift-test.sh #8

Closed lukas-vlcek closed 7 years ago

lukas-vlcek commented 7 years ago

TL;DR

Change the third parameter of the curl_es() function from the search type (i.e. _search vs. _count) to a size value, and update the get_count_from_json() function accordingly:

# $1 - es hostname
# $2 - project name (e.g. logging, test, .operations, etc.)
# $3 - size
# $4 - field to search
# $5 - search string
# $6 - extra params e.g. '&fields=message' (size is now passed as $3)
# stdout is the JSON output from Elasticsearch
# stderr is curl errors
curl_es() {
    # Quote the whole URL so the shell does not treat '&' as "run in background"
    curl --connect-timeout 1 -s \
       "http://${1}:9200/${2}*/_search?size=${3}&q=${4}:${5}${6:-}"
}

# We shall also change this to read the total hit count, which the _search
# response JSON nests as "hits" -> "total"
get_count_from_json() {
    python -c 'import json, sys; print(json.load(sys.stdin)["hits"]["total"])'
}

Then update the script where these functions are called.
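
As a rough illustration of what an updated call site could look like, here is a minimal offline sketch. It assumes the _search response carries the total under hits.total (an integer in the Elasticsearch versions discussed here). The helper es_search_url, the sample JSON, and the host, project, and field values are made up for illustration and are not taken from openshift-test.sh.

```shell
# Pick whichever Python interpreter is available (assumption: one of them exists).
PYTHON=$(command -v python3 || command -v python)

# Read the total hit count from a _search response on stdin.
get_count_from_json() {
    "$PYTHON" -c 'import json, sys; print(json.load(sys.stdin)["hits"]["total"])'
}

# Hypothetical helper: build the URL the same way the revised curl_es() would,
# with the size value as the third argument.
es_search_url() {
    printf 'http://%s:9200/%s*/_search?size=%s&q=%s:%s%s\n' \
        "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# Hand-written sample standing in for a real Elasticsearch reply.
sample_response='{"took":2,"hits":{"total":3,"max_score":0.0,"hits":[]}}'

url=$(es_search_url localhost logging 0 message hello)
count=$(printf '%s' "$sample_response" | get_count_from_json)
echo "would query: $url"
echo "count=$count"
```

In the real script the sample would be replaced by the output of curl_es piped into get_count_from_json, with size 0 wherever only the count is needed.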

Details

The function curl_es() lets the caller pass in either the _search or the _count REST endpoint (any other value leads to an error). Elasticsearch 5.x removed the count API, see https://github.com/elastic/elasticsearch/pull/14166

In fact, _count was removed from all APIs except the HTTP REST API, where it is kept for compatibility reasons. _count requests are now translated to _search?size=0 requests (https://github.com/elastic/elasticsearch/issues/13928#issuecomment-148672186, see the code change here https://github.com/elastic/elasticsearch/commit/a6e7a5f30793dd36046cf752a0232947788f052d#diff-527e9a00a054e58e4ccf96dbea78ff9bR67). In the HTTP layer the JSON response is also translated back to the old (_count) format, so it still contains a top-level count field (the value is taken from the total hit count of the underlying search response, see https://github.com/elastic/elasticsearch/commit/a6e7a5f30793dd36046cf752a0232947788f052d#diff-527e9a00a054e58e4ccf96dbea78ff9bR101).
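
To make the difference between the two response shapes concrete, here is a small offline sketch. Both JSON samples are written by hand to resemble a _count reply and a _search?size=0 reply; they are not captured from a real cluster, and json_get is a hypothetical helper, not part of the test script.

```shell
# Pick whichever Python interpreter is available (assumption: one of them exists).
PYTHON=$(command -v python3 || command -v python)

# Hand-written samples of the two response shapes.
count_style='{"count":42,"_shards":{"total":5,"successful":5,"failed":0}}'
search_style='{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":42,"max_score":0.0,"hits":[]}}'

# Hypothetical helper: print the value found by following the given keys
# through the JSON document read from stdin.
json_get() {
    "$PYTHON" -c '
import json, sys
doc = json.load(sys.stdin)
for key in sys.argv[1:]:
    doc = doc[key]
print(doc)
' "$@"
}

from_count=$(printf '%s' "$count_style" | json_get count)
from_search=$(printf '%s' "$search_style" | json_get hits total)
echo "_count: $from_count, _search?size=0: $from_search"
```

Both paths yield the same number, which is why a script that only ever reads the nested total from _search does not need the _count format at all.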

Although clients can still use the _count API via HTTP, it would be better to always use _search and let the caller pass in the size value. This makes it very clear what the expected result should be. We can also expect a consistent response JSON format and read the total document count from hits.total.

lukas-vlcek commented 7 years ago

As per the comment here, the count API has been removed from the Java API but is kept in the HTTP API. This is probably the commit with the changes. Closing now.