elastic / elasticsearch-py

Official Python client for Elasticsearch
https://ela.st/es-python
Apache License 2.0

sort param in elastic.search is ignoring sorting #2592

Closed: ArtemIsmagilov closed this issue 2 weeks ago

ArtemIsmagilov commented 2 weeks ago

Describe the feature:

Elasticsearch version (bin/elasticsearch --version): 8.14.1

elasticsearch-py version (elasticsearch.__versionstr__): 8.14.0

# sort is a field name, optionally prefixed with '-' for descending order
order, row = ('desc', sort[1:]) if sort[0] == '-' else ('asc', sort)
sort = [{row: {'order': order}}]
query = {'match_all': {}}

docs = await self.elastic.search(
    index='movies',
    query=query,
    sort=sort,
    from_=page_number,
    size=page_size,
)

However, when I send the equivalent request below through the console, I get correctly sorted results. In the Python client, the sort is simply ignored.

{
  "query": {"match_all": {}},
  "from": 0,
  "size": 1,
  "sort": [{"id": {"order": "asc"}}]
}
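To compare the two requests side by side, the Python-side inputs can be turned back into the body the console request used. This is a minimal sketch with hypothetical helper names (`build_sort`, `kwargs_to_body`); it only models the `from_` to `from` rename, not the client's full parameter handling:

```python
def build_sort(sort_param: str) -> list[dict]:
    """Turn 'field' or '-field' into an Elasticsearch sort clause."""
    if not sort_param:
        raise ValueError("sort_param must be a non-empty field name")
    # A leading '-' selects descending order, as in the snippet above
    if sort_param.startswith("-"):
        return [{sort_param[1:]: {"order": "desc"}}]
    return [{sort_param: {"order": "asc"}}]


def kwargs_to_body(**kwargs) -> dict:
    """Rename Python-safe keyword arguments to their JSON body keys."""
    # `from` is a reserved word in Python, so the client spells it `from_`
    renames = {"from_": "from"}
    return {renames.get(key, key): value for key, value in kwargs.items()}


body = kwargs_to_body(
    query={"match_all": {}},
    from_=0,
    size=1,
    sort=build_sort("id"),
)
print(body)
```

If the printed dict matches the console body, the request content is the same on both sides, and any difference must lie in how the response is read.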
ArtemIsmagilov commented 2 weeks ago

According to the logs, the client is sending POST requests, but the request that works is a GET, which is what the documentation indicates.

ArtemIsmagilov commented 2 weeks ago

You seem to be using the wrong HTTP method in your code. See the documentation: the search API uses GET, but the client sends POST.

ArtemIsmagilov commented 2 weeks ago

I tried sending a POST request from the console, and it works. There doesn't seem to be a difference between POST and GET: both work against the database, just not through the Python client. Still, the HTTP method should be changed to match the official documentation.

ArtemIsmagilov commented 2 weeks ago

I'm sorry. The documentation seems to allow both POST and GET, but that does not resolve the problem with the Python client.

GET //_search

GET /_search

POST //_search

POST /_search

miguelgrinberg commented 2 weeks ago

Works fine for me.

Here is descending sort:

>>> from pprint import pprint
>>> from elasticsearch import Elasticsearch
>>> client = Elasticsearch(hosts=['http://localhost:9200'])
>>> r = client.search(index="workplace_documents", sort=[{"created":{"order":"desc"}}], source=['name', 'created'], size=5)
>>> pprint(r['hits'])
{'hits': [{'_id': '-Q0ONpABqFWd0zBiKwOE',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2023-05-01T00:00:00',
                       'name': 'Wfh Policy Update May 2023'},
           'sort': [1682899200000]},
          {'_id': '-g0ONpABqFWd0zBiKwOJ',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2023-04-15T00:00:00',
                       'name': 'Fy2024 Company Sales Strategy'},
           'sort': [1681516800000]},
          {'_id': 'BA0ONpABqFWd0zBiKwS1',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2022-12-20T00:00:00',
                       'name': 'Updating Your Tax Elections Forms'},
           'sort': [1671494400000]},
          {'_id': '-A0ONpABqFWd0zBiKwOC',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2022-04-29T00:00:00',
                       'name': 'April Work From Home Update'},
           'sort': [1651190400000]},
          {'_id': '_g0ONpABqFWd0zBiKwOc',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2021-06-15T00:00:00',
                       'name': 'Intellectual Property Policy'},
           'sort': [1623715200000]}],
 'max_score': None,
 'total': {'relation': 'eq', 'value': 15}}
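For a date field, the number in each hit's `sort` array is the timestamp in epoch milliseconds, which is why `created: '2023-05-01T00:00:00'` appears as `1682899200000`. A small sketch for converting it back, assuming the stored dates are UTC (the helper name is hypothetical):

```python
from datetime import datetime, timezone


def sort_value_to_datetime(millis: int) -> datetime:
    """Convert an epoch-milliseconds sort value into a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


print(sort_value_to_datetime(1682899200000).isoformat())  # 2023-05-01T00:00:00+00:00
```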

And here is ascending sort on the same index and field:

>>> r = client.search(index="workplace_documents", sort=[{"created":{"order":"asc"}}], source=['name', 'created'], size=5)
>>> pprint(r['hits'])
{'hits': [{'_id': '_w0ONpABqFWd0zBiKwOg',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2018-01-12T00:00:00',
                       'name': 'Code Of Conduct'},
           'sort': [1515715200000]},
          {'_id': 'AA0ONpABqFWd0zBiKwSl',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2018-01-12T00:00:00',
                       'name': 'Office Pet Policy'},
           'sort': [1515715200000]},
          {'_id': 'AQ0ONpABqFWd0zBiKwSp',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2018-01-12T00:00:00',
                       'name': 'Performance Management Policy'},
           'sort': [1515715200000]},
          {'_id': 'Aw0ONpABqFWd0zBiKwSx',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2018-01-12T00:00:00',
                       'name': 'Compensation Framework For It Teams'},
           'sort': [1515715200000]},
          {'_id': 'BQ0ONpABqFWd0zBiKwS6',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2018-01-12T00:00:00',
                       'name': 'New Employee Onboarding Guide'},
           'sort': [1515715200000]}],
 'max_score': None,
 'total': {'relation': 'eq', 'value': 15}}
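In the ascending run all five hits share the same `created` value, so their relative order comes from Elasticsearch's tie-breaking rather than from the field itself. When verifying sort order programmatically, it is safer to compare the per-hit `sort` values and allow ties. A minimal sketch; `hits_are_sorted` is a hypothetical helper:

```python
def hits_are_sorted(hits: list[dict], order: str = "asc") -> bool:
    """Check hits against the first value of each hit's `sort` array."""
    keys = [hit["sort"][0] for hit in hits]
    pairs = list(zip(keys, keys[1:]))
    if order == "asc":
        # Non-decreasing: ties such as the five equal dates above are allowed
        return all(a <= b for a, b in pairs)
    if order == "desc":
        # Non-increasing, again allowing ties
        return all(a >= b for a, b in pairs)
    raise ValueError(f"unknown order: {order!r}")


# Sort values copied from the two responses above
desc_hits = [{"sort": [v]} for v in (1682899200000, 1681516800000, 1671494400000,
                                      1651190400000, 1623715200000)]
asc_hits = [{"sort": [1515715200000]} for _ in range(5)]
print(hits_are_sorted(desc_hits, "desc"), hits_are_sorted(asc_hits, "asc"))
```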
ArtemIsmagilov commented 2 weeks ago

The body option doesn't work correctly either:

# sort is a field name, optionally prefixed with '-' for descending order
order, row = ('desc', sort[1:]) if sort[0] == '-' else ('asc', sort)
body = {
    'query': {'match_all': {}},
    'sort': [{row: {'order': order}}],
    'from': page_number,
    'size': page_size,
}
docs = await self.elastic.search(index='movies', body=body)
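One thing worth double-checking in this snippet: Elasticsearch's `from` is the number of hits to skip, not a page index. If `page_number` holds a page number, passing it straight through makes consecutive pages overlap whenever `page_size` is greater than 1. A sketch of the conversion, assuming 1-based page numbers (the helper is hypothetical):

```python
def page_to_offset(page_number: int, page_size: int) -> int:
    """Convert a 1-based page number into a `from` hit offset."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    # Every earlier page contributes page_size hits to skip
    return (page_number - 1) * page_size


page_size = 2
for page_number in (1, 2):
    print(page_number, page_to_offset(page_number, page_size))
```

With a page size of 2 this yields offsets 0 and 2 for pages 1 and 2, the same `from_` values used in the pagination example later in the thread.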
ArtemIsmagilov commented 2 weeks ago

@miguelgrinberg Hi, thanks for answering. The first difference I notice is that you are using the synchronous client, while I am using the asynchronous one. Thanks for helping narrow down the cause of the problem.

miguelgrinberg commented 2 weeks ago

Both clients are generated from the same code, so sync vs. async does not matter:

>>> from elasticsearch import AsyncElasticsearch
>>> client = AsyncElasticsearch(hosts=['http://localhost:9200'])
>>> r = await client.search(index="workplace_documents", sort=[{"created":{"order":"desc"}}], source=['name', 'created'], size=5)
>>> pprint(r['hits'])
{'hits': [{'_id': '-Q0ONpABqFWd0zBiKwOE',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2023-05-01T00:00:00',
                       'name': 'Wfh Policy Update May 2023'},
           'sort': [1682899200000]},
          {'_id': '-g0ONpABqFWd0zBiKwOJ',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2023-04-15T00:00:00',
                       'name': 'Fy2024 Company Sales Strategy'},
           'sort': [1681516800000]},
          {'_id': 'BA0ONpABqFWd0zBiKwS1',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2022-12-20T00:00:00',
                       'name': 'Updating Your Tax Elections Forms'},
           'sort': [1671494400000]},
          {'_id': '-A0ONpABqFWd0zBiKwOC',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2022-04-29T00:00:00',
                       'name': 'April Work From Home Update'},
           'sort': [1651190400000]},
          {'_id': '_g0ONpABqFWd0zBiKwOc',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2021-06-15T00:00:00',
                       'name': 'Intellectual Property Policy'},
           'sort': [1623715200000]}],
 'max_score': None,
 'total': {'relation': 'eq', 'value': 15}}

Passing body with a query in it also works the same for me:

>>> r = await client.search(index="workplace_documents", body={"query":{"match_all":{}}, "sort": [{"created":{"order":"desc"}}]}, source=['name', 'created'], size=5)
ArtemIsmagilov commented 2 weeks ago

Thanks for trying sorting with the asynchronous client. Could you try a search that combines pagination and sorting?

miguelgrinberg commented 2 weeks ago

That works too. Here are pages 1 and 2 with a page size of 2:

>>> r = await client.search(index="workplace_documents", body={"query": {"match_all": {}}, "sort": [{"created":{"order":"desc"}}]}, source=['name', 'created'], from_=0, size=2)
>>> pprint(r['hits'])
{'hits': [{'_id': '-Q0ONpABqFWd0zBiKwOE',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2023-05-01T00:00:00',
                       'name': 'Wfh Policy Update May 2023'},
           'sort': [1682899200000]},
          {'_id': '-g0ONpABqFWd0zBiKwOJ',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2023-04-15T00:00:00',
                       'name': 'Fy2024 Company Sales Strategy'},
           'sort': [1681516800000]}],
 'max_score': None,
 'total': {'relation': 'eq', 'value': 15}}
>>> r = await client.search(index="workplace_documents", body={"query": {"match_all": {}}, "sort": [{"created":{"order":"desc"}}]}, source=['name', 'created'], from_=2, size=2)
>>> pprint(r['hits'])
{'hits': [{'_id': 'BA0ONpABqFWd0zBiKwS1',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2022-12-20T00:00:00',
                       'name': 'Updating Your Tax Elections Forms'},
           'sort': [1671494400000]},
          {'_id': '-A0ONpABqFWd0zBiKwOC',
           '_index': 'workplace_documents',
           '_score': None,
           '_source': {'created': '2022-04-29T00:00:00',
                       'name': 'April Work From Home Update'},
           'sort': [1651190400000]}],
 'max_score': None,
 'total': {'relation': 'eq', 'value': 15}}
ArtemIsmagilov commented 2 weeks ago

What version of Elasticsearch are you using? I am running the 8.14.1 Docker image: https://hub.docker.com/_/elasticsearch

ArtemIsmagilov commented 2 weeks ago

To be honest, I don't think it's a version issue. I will investigate further and post the results.

ArtemIsmagilov commented 2 weeks ago

@miguelgrinberg, I want to apologize. You've been right all along: your versions of my queries worked perfectly when I repeated them. Looking back at my code, I realized I simply hadn't seen the sort values, because I was looking for them in '_source', and they are not there at all.
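The resolution, for anyone landing here with the same symptom: the values Elasticsearch sorted on are returned in a `sort` array on each hit, next to `_source`, while `_source` holds only the stored document fields. A sketch of reading both, using the `name` field from the `workplace_documents` example above (the helper name is hypothetical, not part of the client API):

```python
def names_and_sort_values(response: dict) -> list[tuple]:
    """Pair each hit's `name` field with the values it was sorted by."""
    pairs = []
    for hit in response["hits"]["hits"]:
        # Document fields live in `_source`; the sort keys sit beside it on the hit
        pairs.append((hit["_source"].get("name"), hit.get("sort")))
    return pairs


# Trimmed-down response in the shape printed earlier in this thread
response = {
    "hits": {
        "hits": [
            {"_source": {"created": "2023-05-01T00:00:00",
                         "name": "Wfh Policy Update May 2023"},
             "sort": [1682899200000]},
        ]
    }
}
print(names_and_sort_values(response))
```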

ArtemIsmagilov commented 2 weeks ago

Thank you very much for your help; I hope I didn't take up too much of your time.