Closed meninoebom closed 9 years ago
The elasticsearch-dsl-py package handles it itself actually. Just call .filter
on a search object (like any Bungiesearch instance), and you'll have a filtered query.
For example:
In [2]: RawArticle.objects.search.to_dict()
Out[2]: {'query': {'match_all': {}}}
In [3]: RawArticle.objects.search.filter('term', name='TEST').to_dict()
Out[3]:
{'query': {'filtered': {'filter': {'term': {'name': 'TEST'}},
'query': {'match_all': {}}}}}
Note: the to_dict() function is part of elasticsearch-dsl-py's Search class.
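For reference, the dictionaries printed above have the shape sketched below. This is a hand-built illustration in plain Python, not elasticsearch-dsl-py's implementation: `filtered_body` is a hypothetical helper name, and the `filtered` wrapper is simply copied from the Out[3] output above.

```python
def filtered_body(flt=None, query=None):
    """Build a search body shaped like the to_dict() output above.

    Hypothetical helper for illustration only; the real library builds
    this structure internally.
    """
    # With no explicit query, everything matches (as in Out[2]).
    query = query or {"match_all": {}}
    if flt is None:
        return {"query": query}
    # With a filter, the query is wrapped in a "filtered" clause (as in Out[3]).
    return {"query": {"filtered": {"filter": flt, "query": query}}}


body = filtered_body(flt={"term": {"name": "TEST"}})
print(body)
```

Reading the structure this way makes it easier to check what a chained call produced before sending anything to the cluster.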
Thanks again for the reply. Still missing something though. I'm trying to either chain the search objects filter method with the query method or somehow run a query with the generated filtered query like this
s = Song.objects.search
s.filter('term', title='Test')
results = s.query('match', _all='foo')
My goal is to search through the lyrics of songs with the query but limit the searchable dataset with filters such as term filters for the 'title' field and range filters for the 'release_date' field.
I cannot figure out how to A. execute filtered queries after they have been generated by the 'filter' method B. combine one or more filters with a search query to generate a result set in Django view to be then serialized and returned
Do you have examples of either of these two things, that you could paste in?
Thanks again for all your help.
To execute the query, the easiest way is to take a slice of the Bungiesearch object using brackets, e.g. [2:10] tells Elasticsearch to start from position 2 and retrieve eight results. The returned results are automatically mapped to the Django objects, so Bungiesearch also executes an SQL query to fetch the latest info from the database. If you want the raw result objects from elasticsearch-dsl-py instead, use a slice whose third item is True, e.g. [2:10:True] (an unusual use of the step slot, but Bungiesearch overrides the slicing function so it works).
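A toy sketch of the two ideas at play, under stated assumptions: `ToySearch` is a hypothetical stand-in written for this example, not Bungiesearch's class. It assumes that chained calls return a modified copy (as elasticsearch-dsl-py's Search does, as far as I know), which would explain why calling `s.filter(...)` without keeping the return value has no effect, and it shows how a slice can be turned into `from`/`size` values with the step slot used as a "raw" flag.

```python
import copy


class ToySearch:
    """Hypothetical stand-in for a search object; not Bungiesearch's real class."""

    def __init__(self):
        self.filters = []
        self.queries = []

    def filter(self, kind, **fields):
        # Return a modified copy and leave the original untouched.
        new = copy.deepcopy(self)
        new.filters.append({kind: fields})
        return new

    def query(self, kind, **fields):
        new = copy.deepcopy(self)
        new.queries.append({kind: fields})
        return new

    def __getitem__(self, key):
        # Only slices are handled in this sketch: [start:stop] or [start:stop:True].
        start = key.start or 0
        return {
            "from": start,
            "size": key.stop - start,
            "raw": key.step is True,
            "filters": self.filters,
            "queries": self.queries,
        }


s = ToySearch()
s.filter("term", title="Test")        # return value discarded: s is unchanged
lost = s.query("match", _all="foo")   # carries the query but no filter

# Chaining (or reassigning) keeps both the filter and the query.
kept = s.filter("term", title="Test").query("match", _all="foo")

print(lost[2:10])
print(kept[2:10:True])
```

In the real library the slice triggers the request; here it only returns the parameters that would be sent.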
Sorry for the partial example only, I'm answering from my phone.
Best regards, Christopher Rabotin.
Thank you again for responding. I am now able to create filtered queries.
The reason I struggled with this is that in the ES DSL a 'terms' filter must take an array: http://stackoverflow.com/questions/30363647/elasticsearch-the-terms-filter-raise-filter-does-not-support-mediatest
For instance this query body raised an exception:
[{"query": {"filtered": {"filter": {"terms": {"title":
When I changed my code to something like this, it worked:
title = params.get('title').split(',')
songs = Song.objects.search.filter("terms", title=title).query("match", title=q)
return songs
This should be added to the documentation of both BungieSearch and elasticsearch-dsl-py.
I will try to send a pull request as soon as I can.
What are you trying to achieve with [{"query": {"filtered": {"filter": {"terms": {"title": }}, "query": {"match": {"title": }}}}}]? At first glance it doesn't look like a valid JSON object for searching. It looks like two distinct queries to me, which would require the multi query API (or something like that), which isn't supported by elasticsearch-dsl-py.
Do you want to filter on the title and query the title? If so, this may lead to odd results if the title is a string, depending on the analyzer enabled on Elasticsearch. In fact, if there is any analyzer, the filter part won't filter anything, and the query will do some kind of fuzzy matching (depending again on the analyzer used). If there isn't any analyzer, I expect this to return results only if the title you search for matches strictly, in which case the "query": {"match": {"title": }} part doesn't add anything to your request (just more resource usage on the cluster). I may be wrong in my analysis here, so check out the difference between queries and filters.
With the solution you wrote, Song.objects.search.filter("terms", title=title).query("match", title=q), you'll filter the title field on title and query that same field on q, which leads to what I wrote right above. In itself it's a valid query, but I think you may want to use a term filter instead of a terms filter: the former will match exactly the variable you pass, whereas the latter will match on the list of items you provide. For example, in Song.objects.search.filter("terms", title=title1), if title1="some title", the executed query with a terms filter is ['s', 'o', 'm', 'e', ' ', 't', 'i', 't', 'l', 'e']; if using a term filter, it'll filter exactly on "some title".
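The character-by-character list above is what Python produces when a string is treated as a sequence. A minimal sketch of the distinction, assuming the filter bodies are built by hand (`title_filter` is a hypothetical helper, not part of Bungiesearch or elasticsearch-dsl-py):

```python
def title_filter(value):
    """Pick `term` for a single string and `terms` for a list of strings."""
    if isinstance(value, str):
        # One exact value: use a term filter.
        return {"term": {"title": value}}
    # Several candidate values: use a terms filter with a real list.
    return {"terms": {"title": list(value)}}


# Iterating a string yields its characters, which is why passing a bare
# string where a list is expected gives a list of single letters.
print(list("some title"))

print(title_filter("some title"))
print(title_filter(["some title", "other title"]))
```

Checking the type up front avoids silently turning one title into ten single-character values.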
My goal is to build out an endpoint for searching for terms in the lyrics field of a model called Song.
Filtering and querying the same field was a mistake. I meant to query the lyrics field and filter by the title field. I also see that the terms filter requires an array but the term filter does not. So now this endpoint is working:
127.0.0.1:8000/api/lyrics?q=girlfriend&title=love
Forgive me for throwing this all in a single comment, but I need help and don't want to pollute your repo with issues.
My query is simply this:
songs = s.query("match", lyrics=str(q))
Single word searches work.
But phrase searches like this...
127.0.0.1:8000/api/lyrics?q=I%20love%20you
return too many results.
I think this means the lyrics field was indexed as a full-text analyzed field, so that when I search for 'I love you' the search term is passed through an analyzer to produce a list of terms ['I', 'love', 'you'].
Is there another way to write this query so that I can search for phrases? Or should I be researching how to map the lyrics field in the index as a string?
A term filter for a single word works, but a request that passes a phrase into the term filter returns an empty result set.
For example:
.../api/lyrics?q=girlfriend&title=love%20me
generates the term filter
{'filter': {'term': {'title': 'love me'}}}
but returns an empty result ([]), not even the song with
"title": "You Must Love Me"
I assume the problem again is that the title field was mapped as an analyzed field. Is that correct? Or, is it that my filter was constructed incorrectly?
I don't see any examples of creating filters on related fields in the BungieSearch or the elasticsearch-dsl-py docs. This I assume is just a matter of syntax, but perhaps it is also a matter of mapping.
How can I add a field to the JSON that gets serialized that will contain the count from the query results?
As always, thank you so much for taking the time to read and reply to these comments. Even just pointing me in the right direction in terms of docs to read would be a huge help. There are not a lot of resources for utilizing the wrappers written for ES.
Indeed, the situation you're facing is one we've had at Sparrho. You need to reindex at least one field which can be filtered, and at worst, the whole database of articles. Depending on your dataset and ES cluster, it may take a few days.
Effectively, a filter will filter exactly (think of a coffee filter) on the words you type, without any kind of wildcard. It's great for filtering by ID, for example. If you're filtering by a string, you have to make sure it is exactly what you're looking for. So in the case of "I love you", it'll only keep titles that exactly match that, with no trailing and no leading characters. If you need to allow leading or trailing characters, you should check the prefix or regexp filters. They should do the trick.
From the whole issue you wrote, I reckon the best strategy is to add another field to the ModelIndex class. In fact, add a StringField which has, as a parameter, 'analyzer': None (I think, double check with the Readme file), and then reindex your dataset. Then you'll be able to filter exactly on the keywords you provide.
I hope this makes sense. I'll double check in the morning (UK time, where I'm currently based).
Best regards, Christopher Rabotin.
To query for phrases, the best option is the match_phrase query. If I'm not mistaken, Elasticsearch will then search for all of the words appearing together. As you correctly pointed out, a match query will search for all the terms independently of each other. In the case of "I love you", the returned items will contain the words, but possibly not next to each other, e.g. "I code. Github loves git. You were right".
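To make the difference concrete, here is a toy model in plain Python. It is only a sketch: `tokens`, `toy_match` and `toy_match_phrase` are hypothetical helpers that mimic the behaviour described above with a crude tokenizer (no stemming), and the two request bodies at the end use the `match` and `match_phrase` query names from the Elasticsearch query DSL with the lyrics field from this thread.

```python
import re


def tokens(text):
    """Lowercase and split into words; a crude stand-in for an analyzer."""
    return re.findall(r"[a-z']+", text.lower())


def toy_match(text, query):
    """True if any query term occurs anywhere in the text."""
    doc = set(tokens(text))
    return any(term in doc for term in tokens(query))


def toy_match_phrase(text, query):
    """True only if the query terms occur consecutively and in order."""
    doc, want = tokens(text), tokens(query)
    return any(doc[i:i + len(want)] == want
               for i in range(len(doc) - len(want) + 1))


scattered = "I code. Github loves git. You were right"
together = "Baby, I love you so"  # made-up lyric for illustration

print(toy_match(scattered, "I love you"))         # terms found independently
print(toy_match_phrase(scattered, "I love you"))  # terms are not adjacent
print(toy_match_phrase(together, "I love you"))   # terms are adjacent

# Request bodies for the two query types (lyrics is the field from this thread).
match_body = {"query": {"match": {"lyrics": "I love you"}}}
phrase_body = {"query": {"match_phrase": {"lyrics": "I love you"}}}
```

With the search object, the phrase variant would presumably be `s.query("match_phrase", lyrics=q)`, following the same keyword pattern as the `match` call used earlier in the thread.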
Filters only work on fields which are not analyzed: is your title field analyzed? By default, StringFields in ModelIndex subclasses (which Bungiesearch uses to create the mapping on ES) have the snowball analyzer enabled. Hence, you can't apply string filters on them.
If you want to filter on string titles, you can either drop the index and re-index your content after specifying in the ModelIndex that you want the title field to be non-analyzed, or you can add a field in your ModelIndex which will be mapped to the title but not be analyzed, e.g. title_filter = StringField(model_attr='title', analyzer=None). Then you have to update the mapping with the search_index command and the --update-mapping flag, and then reindex your dataset. After these steps, you'll be able to apply any kind of filter on that field, e.g. .filter('term', title_filter='I love you').
Filters do exact matching. For example, term-filtering on "lover" will not return any document which has the word "love" or "lovers".
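A toy illustration of why the 'love me' term filter came back empty. This is a simplified model, not how Elasticsearch is implemented: `analyzed_terms` is a crude word splitter without the stemming a snowball analyzer would apply, and all three function names are hypothetical.

```python
import re


def analyzed_terms(value):
    """Crude analyzer: the field is indexed as separate lowercase words."""
    return set(re.findall(r"[a-z0-9']+", value.lower()))


def not_analyzed_terms(value):
    """A non-analyzed field is indexed as one exact term."""
    return {value}


def term_filter(indexed_terms, wanted):
    """A term filter looks up `wanted` as-is among the indexed terms."""
    return wanted in indexed_terms


title = "You Must Love Me"

# Analyzed field: single words are found, a two-word phrase is not a term.
print(term_filter(analyzed_terms(title), "love"))
print(term_filter(analyzed_terms(title), "love me"))

# Non-analyzed field: only the complete, exact title matches.
print(term_filter(not_analyzed_terms(title), "love me"))
print(term_filter(not_analyzed_terms(title), "You Must Love Me"))
```

Note that even on a non-analyzed field, 'love me' would not match "You Must Love Me"; a partial match needs something like the prefix or regexp filters, or a query on the analyzed field.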
Do you mean related fields in Django, like foreign keys? The way to do this is by adding a field to your ModelIndex with an eval_as parameter that you then code up to retrieve and format what you'd like indexed from the relation, cf. the meta_data field in https://github.com/Sparrho/bungiesearch#modelindex.
Note: I'm not 100% sure I understood the question, so let me know if this actually does answer your question.
If you want the number of results found on ES for your query/filter, you can call len(search_object), where search_object is a Bungiesearch object with your query, filter, and all that, cf. this example. Note that this will send the query to ES on the _count API endpoint, which requires ES to execute your search (or at least partially, since that endpoint doesn't take into consideration stuff like min_score, nor does it fetch any documents).
Anyhow, if you want to spare ES from doing a count after you just executed that same query (which can be useful if you run very resource-intensive queries, for example), you'll want to get the raw results from ES, which contain the hits, and then map the results. For example:
srch = Article.objects.search.query('match', _all='Description')
srch_rslts = srch[0:20:True] # The True will skip automatic mapping.
print('Found {} results.'.format(srch_rslts.hits.total))
mapped = Bungiesearch.map_raw_results(srch_rslts) # Contains a list of Article objects.
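To return that count alongside the results from a view, one option is to wrap both in a single dictionary before serializing. The sketch below is only an illustration: `build_payload` and the key names `count` and `results` are assumptions, and the numbers and song entry are placeholders standing in for `srch_rslts.hits.total` and the serialized mapped objects.

```python
import json


def build_payload(total, items):
    """Wrap the hit count and the serialized items in one response body."""
    return {"count": total, "results": items}


# Placeholder values: in a real view these would come from
# srch_rslts.hits.total and from serializing the mapped objects.
hits_total = 42
page = [{"title": "You Must Love Me"}]

print(json.dumps(build_payload(hits_total, page), sort_keys=True))
```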
I hope this helps more than my succinct answer of yesterday.
Handled in #91 by @meninoebom . Thanks =)
I am trying to filter a query by date range and related field values. I assume that the syntax for BungieSearch filtered queries is derived from elasticsearch-dsl-py. However, from reading the elasticsearch-dsl-py docs I am still not able to figure out how to construct filtered queries. Any advice or nudge toward the right documentation is welcome. Thanks in advance.
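For the date-range part of the question, a hand-built sketch of what such a request body could look like, assuming the `filtered` query format seen elsewhere in this thread plus the `range` and `bool` filters from the Elasticsearch DSL. `range_filter` and `combine_filters` are hypothetical helpers, the field names come from the thread, and the dates are made up for illustration.

```python
def range_filter(field, gte=None, lte=None):
    """Build a range filter; only the bounds that are given are included."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {field: bounds}}


def combine_filters(*filters):
    """Require every filter to match by wrapping them in a bool/must clause."""
    if len(filters) == 1:
        return filters[0]
    return {"bool": {"must": list(filters)}}


# Illustrative values only: a term filter on title, a date range on
# release_date, and a full-text query on lyrics.
body = {
    "query": {
        "filtered": {
            "filter": combine_filters(
                {"term": {"title": "love"}},
                range_filter("release_date", gte="1990-01-01", lte="1999-12-31"),
            ),
            "query": {"match": {"lyrics": "girlfriend"}},
        }
    }
}
print(body)
```

With the search object, the equivalent is presumably a chained call along the lines of `.filter('range', release_date={'gte': ..., 'lte': ...})`, following the keyword pattern of the term filter shown in this thread; comparing its to_dict() output with the body above is a quick way to check.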