Closed fgregg closed 6 years ago
One interesting complication to sorting by relevance that we've found in the process of working on https://github.com/datamade/nyc-council-councilmatic/pull/120:
When you're looking for a specific bill, sorting by relevance is usually the only way to find it. As I see it, this is the primary problem raised in this issue. For example, searching for Resolution 815-2014 returns the wrong result when the ordering is Date. Ordering by Relevance, however, returns the right bill immediately.
However, when searching for broad keywords like Taxi, the opposite is true! In these cases, sorting by Date shows the freshest bills on that topic, while sorting by Relevance shows bills that may in fact have many mentions of Taxi in their index, but are from so long ago that they probably aren't relevant to a casual user.
I think this issue is really about defining what relevance means to different users. Casual users of the site may find recent bills to be more relevant; power users may want to find specific bills, and so find "exactness and number of word matches" to be more relevant.
A couple of ways forward:
Boy, this is a tough one, since the "power users" are actually our clients.
So, one way that we have tried to address this in the past is through boosting.
http://django-haystack.readthedocs.io/en/master/boost.html
Can we boost "freshness" so that a "relevance" search returns pretty good results for both the Taxi case and the Resolution 815-2015 case?
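To make the freshness idea concrete, here is a rough sketch in plain Python rather than actual Haystack configuration. It multiplies a text-match score by a boost that decays with the bill's age. The scores, dates, ids, `half_life_days` and `max_boost` are all made-up assumptions for illustration; real values would come from the search backend and would need tuning.

```python
from datetime import date


def freshness_boost(last_action, today, half_life_days=365.0, max_boost=1.0):
    """Return a multiplier between 1.0 and 1.0 + max_boost.

    The boost halves every `half_life_days`, so very old bills keep
    (almost exactly) their original text-match score.
    """
    age_days = max((today - last_action).days, 0)
    return 1.0 + max_boost * 0.5 ** (age_days / half_life_days)


def rank(hits, today):
    """Order hits by text score multiplied by the freshness boost."""
    return sorted(
        hits,
        key=lambda hit: hit["score"] * freshness_boost(hit["date"], today),
        reverse=True,
    )


today = date(2016, 9, 1)

# Hypothetical hits for a broad keyword search: similar text scores,
# very different ages.
taxi_hits = [
    {"id": "old-taxi-bill", "score": 3.0, "date": date(2006, 1, 1)},
    {"id": "new-taxi-bill", "score": 2.5, "date": date(2016, 6, 1)},
]

# Hypothetical hits for an identifier search: one strong exact match
# that is a few years old, and one weak but recent partial match.
resolution_hits = [
    {"id": "recent-partial-match", "score": 4.0, "date": date(2016, 8, 1)},
    {"id": "exact-match", "score": 12.0, "date": date(2014, 1, 1)},
]

taxi_ranked = [hit["id"] for hit in rank(taxi_hits, today)]
resolution_ranked = [hit["id"] for hit in rank(resolution_hits, today)]
print(taxi_ranked)
print(resolution_ranked)
```

With these made-up numbers the recent taxi bill moves ahead of the older one, while the exact identifier match stays on top because its text score is much higher than the boost can make up for. Whether that holds on real data depends on how the backend scores exact matches.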
We encourage the user to search for Resolution 815-2015 but do not return this result. This is likely because we do not automatically sort by relevance when searching. We should sort by relevance when people enter search terms.
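A minimal sketch of that default, assuming the view receives the query string and an optional explicit sort choice. The function name, parameter names and the "relevance" / "date" labels are hypothetical, not taken from the existing code.

```python
def default_ordering(query, requested_order=None):
    """Pick a sort order for a search request.

    An explicit choice by the user always wins. Otherwise, sort by
    relevance when search terms were entered, and by date when the
    user is just browsing with an empty query.
    """
    if requested_order:
        return requested_order
    if query and query.strip():
        return "relevance"
    return "date"


print(default_ordering("Resolution 815-2015"))
print(default_ordering(""))
print(default_ordering("Taxi", requested_order="date"))
```

This keeps the Date ordering available for anyone who picks it, and only changes what happens when no ordering has been chosen.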