censusreporter / census-api

The home for the API that powers the Census Reporter project.
MIT License

Search improvements from master #81

scott2b closed 4 years ago

scott2b commented 4 years ago

Changes to address search-related issues/improvements documented in the Census Reporter search improvements (private link) document.

In particular:

"poverty" and "education" use cases

The poverty use case documents poorly ranked profile searches (such as "poverty" and "education"), a problem related to the partial word searches covered later in the document. This is addressed by using to_tsquery instead of plainto_tsquery for profile searches.

Note that the current approach boosts some of the documented tables (for education: C15002, C15003) but not others (B15001, B14001). It is not yet clear whether this is a shortcoming of the approach or a lack of data in my development environment. Further investigation with a full dataset may be required.

Full word suggestions

As with the profile use cases above, the table query has been modified to use to_tsquery so that it matches partial words. Note that some of the use cases discussed (e.g. "9th district" here: https://github.com/censusreporter/censusreporter/issues/212) may not yet be fully addressed by this approach, presumably because lemmatization differs from real-world usage.
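As a rough illustration of why to_tsquery helps here: plainto_tsquery ignores query operators, while to_tsquery accepts PostgreSQL's `term:*` prefix-match syntax. The sketch below is not the code from this PR. The helper name `build_prefix_tsquery` is hypothetical, and so are the table and column names in the usage comment. It assumes every term should prefix-match and that all terms are required.

```python
import re


def build_prefix_tsquery(user_input: str) -> str:
    """Turn free-text input into a to_tsquery string that prefix-matches every term.

    Hypothetical helper for illustration; not taken from census-api.
    """
    # Keep only runs of word characters, so tsquery operators typed by the
    # user (& | ! : ( ) ') cannot cause a syntax error in to_tsquery.
    terms = re.findall(r"\w+", user_input.lower())
    # "term:*" is PostgreSQL's prefix-match syntax; "&" requires all terms.
    return " & ".join(f"{term}:*" for term in terms)


# Hypothetical usage with a DB-API cursor. "search_index" and "document" are
# placeholder names, not the ones used in census-api:
#   cursor.execute(
#       "SELECT name FROM search_index"
#       " WHERE document @@ to_tsquery('english', %s)",
#       (build_prefix_tsquery("pover"),),
#   )
```

An empty or punctuation-only input yields an empty string, so a caller would probably want to skip the query in that case rather than pass it to to_tsquery.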

To further address priority concerns in location searches, the existing concept of priority has been tweaked and is now used to order table searches, and a script is provided to update priority based on usage logs.
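The priority script itself is not shown in this description. As a minimal sketch of the general idea, usage can be estimated by counting how often each table ID appears in request logs. Everything here is an assumption for illustration: the function names, the log format (request lines carrying a `table_ids=` query parameter), and the choice to rank purely by request count. How counts map onto the stored priority values is left open.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Assumed log format: each request line contains "table_ids=<comma-separated IDs>".
TABLE_IDS_PARAM = re.compile(r'table_ids=([^&\s"]+)')


def count_table_requests(log_lines):
    """Count how many times each table ID was requested (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        for raw_value in TABLE_IDS_PARAM.findall(line):
            # Commas may arrive URL-encoded as %2C, so decode before splitting.
            for table_id in unquote(raw_value).split(","):
                table_id = table_id.strip().upper()
                if table_id:
                    counts[table_id] += 1
    return counts


def rank_by_usage(counts):
    """Return table IDs ordered from most to least requested, ties broken by ID."""
    return [
        table_id
        for table_id, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ]
```

A real script would then write the resulting order back to wherever priority is stored. That step depends on the schema and is not sketched here.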

PR notes

This PR replicates work that was done in a branch from the docker-stack PR. To streamline the review process, the search-related work was replicated into the current branch from master.

For whatever reason, some invasive changes in the former working branch were not needed here. I suspect this is due to differences in dependencies between my virtualenv and my container stack. Dependencies are currently not pinned in the requirements.txt file, and pinning them is probably worth considering.

JoeGermuska commented 4 years ago

This is clearly already an improvement. Thank you.