etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.
BSD 3-Clause "New" or "Revised" License
1.2k stars 129 forks source link

Order By Additional Values #119

Closed efgonzalez1 closed 8 years ago

efgonzalez1 commented 9 years ago

I'm not sure if this is a common use case, but I want to add the ability to order search results by rank as well as some of the additional values that I pass to store when I register the model. I was able to get this working, but I think the setup is a little awkward because I had to use extra and I had to parse the meta_encoded column:

queryset = queryset.extra(
    select={'timestamp': "meta_encoded::json->>'timestamp'"},
    order_by=['-timestamp']
) 

Luckily Postgres makes it easy to parse meta_encoded as json and it's easy enough to get the value, but ideally I wouldn't need to write any SQL. Perhaps their's a better way to do this so that we don't have to use extra and instead we could do something like order_by('meta_encoded__timestamp').

etianen commented 9 years ago

I don't think that is this a very good idea.

The whole point of using database-level full-text search is to use efficient indexes that allow the search to perform well over large datasets.

By ordering by the parsed JSON member, you cannot order by an index any more, so it will scale horribly. On Fri, 7 Aug 2015 at 07:23 Esteban Fernando Gonzalez < notifications@github.com> wrote:

I'm not sure if this is a common use case, but I want to add the ability to order search results by rank as well as some of the additional values that I pass to store when I register the model. I was able to get this working, but I think the setup is a little awkward because I had to use extra and I had to parse the meta_encoded column:

queryset = queryset.extra( select={'timestamp': "meta_encoded::json->>'timestamp'"}, order_by=['-timestamp'] )

Luckily Postgres makes it easy to parse meta_encoded as json and it's easy enough to get the value, but ideally I wouldn't need to write any SQL. Perhaps their's a better way to do this so that we don't have to use extra and instead we could do something like order_by('meta_encoded__timestamp') .

— Reply to this email directly or view it on GitHub https://github.com/etianen/django-watson/issues/119.

efgonzalez1 commented 9 years ago

Yea that's a good point. Luckily my dataset is small enough (~200k records) and this method seems to perform perfectly. I didn't do any rigorous testing, but looking at the debug toolbar, all of my sorting methods take virtually the same amount of time. I know it's negligible, but for some queries, sorting by rank is ~100ms slower!

etianen commented 9 years ago

I'm surprised, as that's a fair few records, but glad it works for you!

I guess if the query matches only a few records, not using an index for ordering won't be a problem. On Sat, 8 Aug 2015 at 02:26 Esteban Fernando Gonzalez < notifications@github.com> wrote:

Yea that's a good point. Luckily my dataset is small enough (~200k records) and this method seems to perform perfectly. I didn't do any rigorous testing, but looking at the debug toolbar, all of my sorting methods take virtually the same amount of time. I know it's negligible, but for some queries, sorting by rank is ~100ms slower!

— Reply to this email directly or view it on GitHub https://github.com/etianen/django-watson/issues/119#issuecomment-128879309 .