elastic / elasticsearch-py

Official Python client for Elasticsearch
https://ela.st/es-python
Apache License 2.0
4.21k stars 1.18k forks source link

helpers.scan won't accept string as query/body #1296

Closed EmilBode closed 4 years ago

EmilBode commented 4 years ago

Elasticsearch version : 7.4.2 elasticsearch-py version: 7.0.2

Description of the problem including expected versus actual behavior:

When I want to use the helpers.scan-function with directly specifying the body/query as a string, it fails.

Steps to reproduce: next( elasticsearch.helpers.scan(my_es_instance, index='my-index', query='{"query": {"match_all": {}}}')) (Note that the query is surrounded by single quotes, making it a str)

Expected behaviour: Results, just as when I'd have specified the query as a dict. Actual behaviour: it fails with

File "C:\....\elasticsearch\helpers\action.py" line 430, in scan
  query = query.copy() if query else {}

AttributeErrorL 'str' has no attribute 'copy'

Note that if instead I specify a body parameter, it fails in a different way:

next(elasticsearch.helpers.scan(my_es_instance, index='my-index', body='{"query": {"match_all": {}}}'))

File "C:\....\elasticsearch\helpers\action.py" line 435, in scan
  body=query, scroll=scroll, size=size, request_timeout=request_timeout, **kwargs

TypeError: search() got multiple values for keyword argument 'body'

Also note that elasticsearch.Elasticsearch().search() accepts both strings and dicts as body-parameter

This problem can be circumvented by specifying preserve_order=True, but this is not a good idea performance-wise. Alternatively, it can be solved by next(elasticsearch.helpers.scan(my_es_instance, index='my-index', query=eval('{"query": {"match_all": {}}}')))

Proposed solution I think this problem could be solved if the string-evalution is done within scan. Line 430 in actions.py could be changed into: query = query.copy() if isinstance(query, dict) else eval(query) if query else {}

sethmlarson commented 4 years ago

This sounds like a fine improvement to make, would you be willing to create a PR with your suggested change along with a test case?

Also could you let me know why you'd want to use a string over a dictionary? Thanks!

EmilBode commented 4 years ago

I've been busy, so it took me a while to answer.

I'll make a PR and test, coming soon.

In my use case, I simply get query-objects from some older code. They used to pass that directly to Elasticsearch(...).search(), manually keeping track of scroll_ids. With the disadvantage that the scroll wasn't closed on exhaustion but only after timeout. For performance, we decided to switch to helper.scan(), except than we encountered this issue.

sethmlarson commented 4 years ago

Closing this, handing the helper function a dictionary over a string is up to users.