etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.
BSD 3-Clause "New" or "Revised" License
1.21k stars 129 forks source link

No results for partial keywords #277

Closed miguellteixeira closed 4 years ago

miguellteixeira commented 4 years ago

Hi, I have the following row on a PostgreSQL database (watson_searchentry):

[
  {
    "content" : "education",
    "meta_encoded" : "{}",
    "id" : 43,
    "content_type_id" : 28,
    "engine_slug" : "default",
    "title" : "education",
    "search_tsv" : "'educ':1A,2",
    "object_id" : "4",
    "description" : "",
    "object_id_int" : 4,
    "url" : ""
  }
]

And when searching for some partial keywords there are no results:

>>> from watson import search as watson

>>> print(watson.search("educ"))
<QuerySet [<SearchEntry: education>, <SearchEntry: continuing education>, <SearchEntry: primary education>, <SearchEntry: music education>, <SearchEntry: vocational education>, <SearchEntry: stem education>, <SearchEntry: secondary education>, <SearchEntry: higher education>, <SearchEntry: education>]>

>>> print(watson.search("educa"))
<QuerySet []>

>>> print(watson.search("educat"))
<QuerySet []>

>>> print(watson.search("educati"))
<QuerySet []>

>>> print(watson.search("educatio"))
<QuerySet []>

>>> print(watson.search("education"))
<QuerySet [<SearchEntry: education>, <SearchEntry: continuing education>, <SearchEntry: primary education>, <SearchEntry: music education>, <SearchEntry: vocational education>, <SearchEntry: stem education>, <SearchEntry: secondary education>, <SearchEntry: higher education>, <SearchEntry: education>]>

I've followed the setup guide and ran all commands needed to populate the Watson search entry but so far couldn't figure it out why this is happening, am I missing something?

etianen commented 4 years ago

This all comes down to postgresql word stemming. I'm not certain it's a very good behaviour, tbh, but it's what we have.

When postgresql sees the word "education", it stems it to "educ". This applies to both your search text and search query. So it's "educ" that's stored in the search index.

When you search for "educ", it gets stemmed to "educ", and matches. When you search for "education", it gets stemmed to "educ", and matches.

When you search for "educa", it does not get stemmed, and "educa" does not match "educ".

You can see this behaviour yourself with the following SQL:

mm=# select to_tsquery('educ');
 to_tsquery 
------------
 'educ'
(1 row)

mm=# select to_tsquery('educa');
 to_tsquery 
------------
 'educa'
(1 row)

mm=# select to_tsquery('educat');
 to_tsquery 
------------
 'educat'
(1 row)

mm=# select to_tsquery('educati');
 to_tsquery 
------------
 'educati'
(1 row)

mm=# select to_tsquery('educatio');
 to_tsquery 
------------
 'educatio'
(1 row)

mm=# select to_tsquery('education');
 to_tsquery 
------------
 'educ'
(1 row)