etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.
BSD 3-Clause "New" or "Revised" License

Possible to search with partial match? #282

Open Pulkit-Sharma opened 3 years ago

Pulkit-Sharma commented 3 years ago

I love this library and have been using it on all my websites. However, there is one issue that needs a solution.

If I search for 'example', I get a result. If I search for 'some extra text example', I get no results.

Is it possible to get results when even a single word matches, instead of requiring every word to match?

etianen commented 3 years ago

It's not possible with the library as-is, and TBH I'm not sure you want it. Here's why:

The problem with OR-ing together the words is that every time you add a new word, you get a larger number of less-specific results, which is the opposite of how search usually works.

Imagine you're googling.

You type "pie". Your results are all sorts of pies. You type "chicken pie". Your results are all sorts of pies, and information on how to rear chickens, all mixed together. You type "chicken pie recipes". Your results are now all sorts of pies, information on how to rear chickens, and recipes for cooking all types of food, all mixed together.

It's really strange to type in more words and get less and less relevant results!
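To make the difference concrete, here is a minimal sketch in plain Python. It does not use django-watson at all: the `DOCS` corpus and the `search` helper are hypothetical stand-ins, and matching is naive whole-word comparison rather than real full-text search.

```python
# Toy corpus standing in for indexed documents (hypothetical data).
DOCS = {
    1: "apple pie with cinnamon",
    2: "chicken pie recipe",
    3: "how to rear a chicken",
    4: "recipe for tomato soup",
}


def tokenize(text):
    """Lower-case the text and split it into a set of whole words."""
    return set(text.lower().split())


def search(query, mode="and"):
    """Return the sorted ids of documents matching the query.

    mode="and": every query word must appear in the document.
    mode="or":  at least one query word must appear in the document.
    """
    terms = tokenize(query)
    combine = all if mode == "and" else any
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if combine(term in tokenize(text) for term in terms)
    )


QUERIES = ("pie", "chicken pie", "chicken pie recipe")

# Number of hits per query as words are added.
and_counts = [len(search(q, "and")) for q in QUERIES]
or_counts = [len(search(q, "or")) for q in QUERIES]

print("AND:", and_counts)  # hits shrink or stay the same as words are added
print("OR: ", or_counts)   # hits grow as words are added
```

With AND, adding words narrows the result set toward the one document about chicken pie recipes. With OR, each extra word pulls in more unrelated documents.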

Pulkit-Sharma commented 3 years ago

The thing is, Google knows that a site posts recipes even if a post does not contain the word "recipe". For example, on my blog I might post "how to make pie", with the details giving only the process and never using the word "recipe".

If a user searches for "recipe of pie" in watson, no results will be shown. How do we solve that?

etianen commented 3 years ago

To solve that, you need a human-language-aware search engine. One that knows "recipe" means "how to make", and is aware of several common misspellings of "recipe", such as "ressipy". You need Google, in other words, or a more sophisticated search engine like Elasticsearch.

django-watson is a database-agnostic frontend to a common subset of the full-text search available in PostgreSQL and MySQL. It can't exceed the capabilities of its underlying search implementation.

Unfortunately, a simple approach such as OR-ing the search terms rather than AND-ing them won't improve the search quality.
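For illustration only, one narrow piece of what a language-aware engine does (knowing that "recipe" means "how to make") can be approximated by hand with a synonym table applied before the query reaches the search backend. This is a rough sketch under stated assumptions: `SYNONYMS`, `STOPWORDS`, and `expand_query` are hypothetical names, not part of django-watson, and a hand-maintained table only covers the phrasings you anticipate.

```python
# Hypothetical, hand-maintained synonym table: query word -> alternative phrasings.
SYNONYMS = {
    "recipe": ["how to make"],
    "ressipy": ["recipe", "how to make"],
}

# Hypothetical list of filler words dropped before expansion.
STOPWORDS = {"of", "the", "a", "for"}


def expand_query(query):
    """Return the cleaned query followed by variants with synonyms swapped in."""
    words = [w for w in query.lower().split() if w not in STOPWORDS]
    variants = [" ".join(words)]
    for i, word in enumerate(words):
        for alt in SYNONYMS.get(word, []):
            # Replace only the word at position i, keeping the rest of the query.
            variant = " ".join(words[:i] + [alt] + words[i + 1:])
            if variant not in variants:
                variants.append(variant)
    return variants


print(expand_query("recipe of pie"))
```

Each variant could then be run as its own search and the results merged. That keeps every individual search strict, while the table supplies the "recipe" to "how to make" knowledge that the database's full-text search does not have.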

gawry commented 1 year ago

Perhaps some sort of semantic vector db structure could help in this direction (probably out of scope of this library).

If you are using PostgreSQL, https://github.com/pgvector/pgvector might be interesting to look at as a path in that direction.