django-haystack / django-haystack

Modular search for Django
http://haystacksearch.org/

How to use JSON queries with Haystack using Elasticsearch? #927

Open rogaha opened 10 years ago

rogaha commented 10 years ago

I would like to use the following JSON query with Haystack, but I cannot find a way to run a JSON query instead of going through SearchQuerySet (which does not seem able to express it).

"query": {
    "filtered": {
        "filter": {
            "bool": {
                "must": [
                    {
                        "term": {
                            "django_ct": "repositories.repository"
                        }
                    },
                    {
                        "term": {
                            "status": 1
                        }
                    },
                    {
                        "term": {
                            "is_private": false
                        }
                    },
                    {
                        "term": {
                            "is_library": false
                        }
                    }
                ]
            }
        },
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "should": [
                            {
                                "match": {
                                    "name": {
                                        "query": "hipache"
                                    }
                                }
                            },
                            {
                                "custom_boost_factor": {
                                    "boost_factor": 0.005,
                                    "query": {
                                        "multi_match": {
                                            "fields": [
                                                "name_auto^3",
                                                "name_auto.partial^2",
                                                "name_auto.partial_back",
                                                "name_auto.partial_middle"
                                            ],
                                            "query": "hipache"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                },
                "script_score": {
                    "script": "_score * (1 + 10.0*log(1 + doc['pull_count'].value))"
                }
            }
        }
    }
}
honzakral commented 10 years ago

Currently there is no clean and direct way to do it. I'd suggest either extending the backend to allow for such queries, or dropping to the low-level client (by accessing the conn attribute on the backend) and then calling backend._process_results to hook back into Haystack's result parsing. See the search method on the backend for exactly how to hook into it.

I will leave this open to serve as a placeholder for the plug-in functionality I'd like to have: the ability to provide a raw query and have Haystack's query set functionality wrap it.

rogaha commented 10 years ago

Ok, cool! Thanks for all the description. Let's see if I can do what you suggested.

rogaha commented 10 years ago

Hi @HonzaKral, now it's working! I created custom search and build_search_kwargs methods to perform the custom queries! :)

Thank you very much for your help!

honi commented 10 years ago

@rogaha could you please share your modifications? I'm after the same thing.

Did you manage to use the JSON query from a SearchQuerySet?

What I've managed to do so far is something like this:

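from haystack import connections

# query is the JSON query (a dict in Elasticsearch's query DSL)
backend = connections.all()[0].get_backend()
# Drop to the low-level client, then reuse Haystack's result parsing
raw_results = backend.conn.search(query, index=backend.index_name, doc_type='modelresult')
results = backend._process_results(raw_results)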

This actually works, but I feel like a better approach would be something like:

sqs = SearchQuerySet().raw_query(query)

And this way I suppose it would be possible to use other methods provided by the SearchQuerySet, like using(), load_all(), etc.

rogaha commented 10 years ago

Hi @honi,

I extended the ElasticsearchSearchBackend class with a custom_search() method that takes a query term and builds the JSON query in the Elasticsearch format. Then I overrode the SearchQuerySet() method I was using, which was autocomplete(). So now my autocomplete looks like this:

def autocomplete(self, is_private=None, is_library=None, status=None, **kwargs):
    """
    A shortcut method to perform an autocomplete search.
    Must be run against fields that are either ``NgramField`` or
    ``EdgeNgramField``.
    """
    # Requires ``import re`` at module level.
    clone = self._clone()
    query_bits = []
    for field_name, query in kwargs.items():
        if not query:
            continue
        for word in query.split(' '):
            bit = clone.query.clean(word.strip())
            # Escape '/' so the Elasticsearch query parser accepts it (issue: #2980)
            bit = bit.replace("/", "\\/")
            # Validate the term before adding it to the Elasticsearch request
            if re.match(r'[\w\d_-]+', word):
                if '.' in field_name:
                    # A sub-field was named explicitly, so search only that field
                    fields = [field_name]
                else:
                    fields = [
                        field_name + '^3',
                        field_name + '.partial^2',
                        field_name + '.partial_back',
                        field_name + '.partial_middle'
                    ]
                query_bits.append({bit: fields})
    if query_bits:
        results = clone.query.backend.custom_search(query_bits,
                                                    status=status,
                                                    is_private=is_private,
                                                    is_library=is_library)
        clone.query._results = results.get('results', [])
        clone.query._hit_count = results.get('hits', 0)
        return clone.query.get_results()
    return []
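
The custom_search() method itself is not shown in the thread. As a rough sketch only, the query-building half of such a method might look like the function below. The name build_query_body and its django_ct default are hypothetical, the function_score / script_score wrapper from the original query is left out for brevity, and the "filtered" query simply mirrors the JSON in the first post (newer Elasticsearch versions may not accept it).

```python
def build_query_body(query_bits, status=None, is_private=None, is_library=None,
                     django_ct="repositories.repository"):
    """Build an Elasticsearch request body from autocomplete query bits.

    ``query_bits`` is a list of ``{term: [field, ...]}`` dicts, in the shape
    that the autocomplete() method above appends to its list.
    (Hypothetical helper; not part of Haystack.)
    """
    # Exact-match filters; options left as None are skipped.
    must = [{"term": {"django_ct": django_ct}}]
    for name, value in (("status", status),
                        ("is_private", is_private),
                        ("is_library", is_library)):
        if value is not None:
            must.append({"term": {name: value}})

    # One multi_match clause per cleaned search term.
    should = []
    for bit in query_bits:
        for term, fields in bit.items():
            should.append({"multi_match": {"query": term, "fields": fields}})

    return {
        "query": {
            "filtered": {
                "filter": {"bool": {"must": must}},
                "query": {"bool": {"should": should}},
            }
        }
    }
```

The returned dict could then be passed to the low-level client and the raw response handed to backend._process_results, as suggested earlier in the thread.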
maltem-za commented 8 years ago

Have a look at https://github.com/Jiydam/haystack-elasticsearch-raw-query

rogaha commented 8 years ago

Awesome, thanks for sharing @maltem-za

gamesbrainiac commented 7 years ago

Has this been fixed yet?

barseghyanartur commented 7 years ago

@gamesbrainiac:

I doubt it will ever be fixed. Haystack is very slow in adopting new features. Even crucial must-haves, like Elasticsearch 2.0 integration, are taking very long to implement.

It has been a good library in the past and is something most of us know how to use, but at the moment I would think twice before using Haystack in new projects at all.

acdha commented 7 years ago

Remember that Haystack is an entirely volunteer project. If something hasn't been implemented it usually means that nobody has volunteered to do it or, as in the case of ES2, get a pull request up to mergeable quality.

In this case, it hits one of the greatest challenges Haystack has, which is that search engines are not as similar to one another as SQL databases are. If I needed this, I'd make a raw query against the backend connection as Honza suggested, since you're already tying your app tightly to ES.

gamesbrainiac commented 7 years ago

@acdha I understand that, but what is currently the best way to do that? Is it to use backend.conn.search, or would it be better to use the elasticsearch package and query with the DSL, using Haystack only as a means to push data to ES and not to search it?

acdha commented 7 years ago

@gamesbrainiac Using backend.conn.search avoids the need to maintain configuration in multiple places if you're using more than one backend server. The other question I'd ask is about response formats: Haystack is designed to return ORM-like results, so you might reasonably set a policy that anything you can easily do using the SearchQuerySet interface stays there, but once you start needing more complexity you move to the full native query interface.
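
One way to keep that split tidy is a small helper that reuses Haystack's configured connection for the raw query and its parser for the response. This is only a sketch: run_raw_query is a hypothetical name, and it assumes a client whose search() accepts body and index keyword arguments (as elasticsearch-py does in the versions Haystack targets). Older clients may need the positional form and doc_type shown earlier in the thread.

```python
def run_raw_query(backend, body):
    """Run a raw Elasticsearch request body through a Haystack backend.

    ``backend`` is expected to expose ``conn``, ``index_name`` and
    ``_process_results``, as Haystack's Elasticsearch backend does.
    (Hypothetical helper; not part of Haystack.)
    """
    # Reuse the connection Haystack already configured for this backend.
    raw_results = backend.conn.search(body=body, index=backend.index_name)
    # Hand the raw response back to Haystack so the caller gets its usual
    # parsed structure rather than raw hits.
    return backend._process_results(raw_results)


# Possible usage inside a Django project (assumes a "default" connection):
#   from haystack import connections
#   backend = connections["default"].get_backend()
#   parsed = run_raw_query(backend, {"query": {"match_all": {}}})
#   parsed.get("results", []), parsed.get("hits", 0)
```

Because the helper only touches attributes of the backend it is given, simple queries can stay on SearchQuerySet while anything more complex goes through this one function.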