Closed kwiat1990 closed 2 years ago
Hello @kwiat1990 , thank you for opening this issue! Sadly I cannot reproduce the issue you are describing with Strapi v4.1.11 and the plugin at v1.9.0.
Tested on localhost with the config from the docs: http://localhost:1337/api/fuzzy-search/search?query=beau
Response:
{
  "authors": [],
  "books": [
    {
      "id": 1,
      "title": "The Rising Star",
      "description": "The Rising Star is a beautiful book written by John Doe.",
      "createdAt": "2022-05-05T13:08:43.816Z",
      "updatedAt": "2022-05-05T13:24:07.107Z",
      "publishedAt": "2022-05-05T13:22:23.764Z",
      "locale": "en"
    }
  ]
}
Would you mind sharing your config so I can have a look? Maybe something went wrong there.
Also, now that I see it, you wrote that you are querying for search=dolor? You would have to query the endpoint like this: search?query=dolor
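To make the difference concrete, here is a minimal sketch of building the request URL with the `query` parameter. The helper name `buildSearchUrl` is hypothetical and not part of the plugin; the path is the one used earlier in this thread.

```javascript
// Hypothetical helper (not part of the plugin): builds the search URL
// with the search term passed as the `query` parameter.
function buildSearchUrl(baseUrl, term) {
  // Path taken from the endpoint used in this thread.
  const url = new URL("/api/fuzzy-search/search", baseUrl);
  // searchParams takes care of encoding the term.
  url.searchParams.set("query", term);
  return url.toString();
}

console.log(buildSearchUrl("http://localhost:1337", "dolor"));
```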
@DomDew hey, sorry, I've made a typo. I called the correct endpoint like so: http://localhost:1337/api/fuzzy-search/search?query=rem and this gave me no matches.
Here's my complete plugin.js file (perhaps order does matter?):
module.exports = ({ env }) => ({
  "fuzzy-search": {
    enabled: true,
    config: {
      contentTypes: [
        {
          uid: "api::article.article",
          modelName: "article",
          queryConstraints: {
            where: {
              $and: [
                {
                  publishedAt: { $notNull: true },
                },
              ],
            },
          },
          fuzzysortOptions: {
            characterLimit: 1900,
            threshold: -600,
            limit: 10,
            allowTypo: true,
            keys: [
              {
                name: "title",
                weight: 100,
              },
              {
                name: "content",
                weight: -100,
              },
            ],
          },
        },
      ],
    },
  },
  placeholder: {
    enabled: true,
    config: {
      size: 10,
    },
  },
  slugify: {
    enabled: true,
    config: {
      contentTypes: {
        article: {
          field: "slug",
          references: "title",
        },
        category: {
          field: "slug",
          references: "name",
        },
      },
    },
  },
  upload: {
    config: {
      provider: "cloudinary",
      providerOptions: {
        cloud_name: env("CLOUDINARY_NAME"),
        api_key: env("CLOUDINARY_API_KEY"),
        api_secret: env("CLOUDINARY_SECRET"),
      },
      actionOptions: {
        uploadStream: {
          folder: env("CLOUDINARY_FOLDER", ""),
        },
        delete: {},
      },
    },
  },
  transformer: {
    enabled: true,
    config: {
      responseTransforms: {
        removeAttributesKey: true,
        removeDataKey: true,
      },
    },
  },
});
@kwiat1990 try lowering the threshold (to something like -1000); I think this should solve your issue. I tried with your config and found that in your example the sorting algorithm returns a score of -504, which the weighting then lowers further to -604, so it just barely falls out of scope 😅. The algorithm naturally weighs matches at the beginning of strings and individual "chunks" higher.
You could lower the value even further if you want "fuzzier" matches and then adjust it to a higher value once you feel that performance takes a hit.
Feel free to let me know if this solved your issue!
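The arithmetic above can be sketched roughly like this. It is only an illustration, assuming the key weight is simply added to the raw score (which matches the numbers quoted: -504 plus -100 gives -604) and that results scoring below the threshold are dropped. The function names are hypothetical and not the plugin's actual code.

```javascript
// Hypothetical illustration, not the plugin's implementation.
// Assumption: the configured key weight is added to the raw fuzzysort score.
function weightedScore(rawScore, weight) {
  return rawScore + weight;
}

// Assumption: a result is kept only if its weighted score is not below the threshold.
function passesThreshold(rawScore, weight, threshold) {
  return weightedScore(rawScore, weight) >= threshold;
}

// Values from this thread: raw score -504, "content" weight -100.
console.log(passesThreshold(-504, -100, -600));  // threshold from the original config
console.log(passesThreshold(-504, -100, -1000)); // suggested lower threshold
```

With these assumptions the match is dropped at a threshold of -600 and kept at -1000.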
Indeed, setting threshold to -5000 has partly helped to get some matches for those queries. Oddly enough, I could get a match for substrings like rem or ty, but not for the whole word consectetur or one of its substrings.
Btw., is it possible to match queries that don't contain diacritics? Ideally something like zolty should match the string żółty. Right now it doesn't work.
Setting threshold to -10000 makes it possible to get some results for consectetur, but they're not fully accurate anymore. With this setting I also get multiple results for zolty, even though only one article contains this phrase.
@kwiat1990 that's a bit odd... I'm using the following config based off of your example and I get hits for consectetur, as well as for some really botched input such as constetru, and for substrings like tet:
{
  uid: "api::article.article",
  modelName: "article",
  queryConstraints: {
    where: {
      $and: [
        {
          publishedAt: { $notNull: true },
        },
      ],
    },
  },
  fuzzysortOptions: {
    characterLimit: 1900,
    threshold: -2000,
    limit: 10,
    allowTypo: true,
    keys: [
      {
        name: "title",
        weight: 100,
      },
      {
        name: "content",
        weight: -100,
      },
    ],
  },
},
Something to generally keep in mind, though, is that the longer an input string, the more unpredictable a fuzzy search may become. I decided on fuzzysort to handle the search algorithm, and of all the different things I tested for sorting/searching it yielded the best results for me. That also means the limitations of the fuzzysort package carry over to this plugin for Strapi.
As for characters with diacritics or letters like the ł, sadly there is right now no way to match them to queries that don't contain these letters. I think this would be a pretty cool feature to support, though. Would you be up for the challenge of making a pull request to implement this feature?
I did some playing around and I think we could introduce something like a "unicode conversion" mode to convert characters with unicode counterparts into unicode. I think this would be as simple as calling str.normalize("NFD").replace(/\p{Diacritic}/gu, "") on the input query as well as on the strings to search through.
It gets infinitely trickier, though, with letters like ł or Ø, which we would have to transliterate into "handcrafted" counterparts like "l" or "o". In that case I think there would be no way around coming up with a hash map of transliterations and then, for each of these transliterations, iterating over the given strings to replace these letters. Would you see another way?
As a quick example: the replace(/\u0142/g, "l") part would need to be automated for each given transliteration pair...
"Złótî".replace(/\u0142/g, "l").normalize("NFD").replace(/\p{Diacritic}/gu, "") // --> Zloti
(Sorry for butchering the word Złoty like this 😄 )
Edit: wording
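A minimal sketch of the hash-map idea described above, assuming a small handcrafted map; the map contents and the name `transliterate` are illustrative only and not the plugin's actual implementation:

```javascript
// Hypothetical handcrafted map for letters that NFD normalization
// does not decompose (illustrative, far from complete).
const TRANSLITERATIONS = new Map([
  ["ł", "l"],
  ["Ł", "L"],
  ["ø", "o"],
  ["Ø", "O"],
]);

function transliterate(str) {
  let result = str;
  // Replace every handcrafted pair first.
  for (const [from, to] of TRANSLITERATIONS) {
    result = result.split(from).join(to);
  }
  // Then decompose the remaining accented letters and strip the combining marks.
  return result.normalize("NFD").replace(/\p{Diacritic}/gu, "");
}

console.log(transliterate("Złótî")); // Zloti
console.log(transliterate("żółty")); // zolty
```

The same function would have to be applied both to the query and to the strings being searched, so that both sides are compared in the same form.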
Hey @DomDew, for some reason with your config neither constetru nor the whole word gave me matches. I'm really surprised by the differences in search results between your side and mine. Did you use the whole config with the other plugins as well? I can only think of some other setting that breaks the search, or is a rich-text field a problem for the fuzzy search (content is a markdown string)?
Edit: I have added another field with string-only content (no markdown) and constetru gave me results. The same goes for żółty and any prefix of it, so ż, żó and żół work, but ółt, łty or ty don't. Am I right that to get matches for this kind of substring I would need to make the search fuzzier?
My plan was to use MeiliSearch, but I found this plugin on the Strapi marketplace and wanted to give it a shot, as I don't need a very advanced search engine and Fuzzy Search seems to be less overhead than the alternative. Aside from my struggles with matches, I would say that for my needs I could somehow extend the plugin to handle search queries without Polish diacritics. But a generic solution to support many languages? I personally have no experience with such string manipulation and I didn't find a straightforward way to do that with new ECMAScript APIs. Perhaps one could look at what, for example, MeiliSearch did?
Hey @kwiat1990, that's also what confuses me. I'm using no other plugins for the test and the content field I'm searching through is a rich-text field with the string you provided...
Yes, you would need to make the search fuzzier to get matches for this.
I was in just the same position as you when I developed this plugin. Meilisearch seemed like too much overhead for what I wanted to achieve - but I think that for the issues you are facing right now something like Meilisearch may be the more suitable tool. I will try to have a look at how they achieve unicode conversions/transliterations and see if I can whip up some solution.
@DomDew, maybe it's because of some other plugin in the config file, or even just the order influences the behavior in this unpredictable way. For now, I've set up MeiliSearch, but I find your plugin a nice solution and I'll try to find out what the problem is.
Hey @kwiat1990,
I got around to actually implementing a feature that allows for a "transliterated" search (or whatever would be the correct terminology).
I prereleased a version 1.10.0-beta.1 that you could check out. Any feedback and a note on your experience with the feature would be awesome!
You can allow transliteration for a contentType by setting transliterate: true in the settings for a contentType.
In the following week I will definitely try it out and give you my feedback.
@kwiat1990 I'll close this issue as the transliteration feature is released in 1.10.2, feel free to open another issue if you want to provide feedback or if things aren't working for you as expected! 😊
@DomDew hey, I finally had a little bit of time to play with the new version of the fuzzy search. Sadly, I encountered some problems while searching for rather simple queries such as dolor.
I copied the config from docs and it looks like this:
"fuzzy-search": {
  enabled: true,
  config: {
    contentTypes: [
      {
        uid: "api::article.article",
        modelName: "article",
        transliterate: true,
        queryConstraints: {
          where: {
            $and: [
              {
                publishedAt: { $notNull: true },
              },
            ],
          },
        },
        fuzzysortOptions: {
          characterLimit: 3000,
          threshold: -600,
          limit: 10,
          keys: [
            {
              name: "title",
              weight: 100,
            },
            {
              name: "lead",
              weight: -100,
            },
            {
              name: "content",
              weight: 100,
            },
          ],
        },
      },
    ],
  },
}
As soon as I search for terms included in titles, e.g. lo or lorem, I get hits:
{
"id": 1,
"title": "Lorem ipsum #1",
"lead": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean pulvinar vulputate nisl, nec condimentum urna imperdiet eu. Interdum et malesuada fames ac ante ipsum primis in faucibus. Integer at leo nec metus mattis imperdiet vitae in tellus. Nulla aliquet placerat interdum. Aliquam ut iaculis nisl, vel libero.",
"slug": "lorem-ipsum-1",
"content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus quis lorem molestie, vehicula magna non, viverra erat. Quisque mollis metus vel enim bibendum sollicitudin vel eget dui. Aliquam tellus diam, tristique eget malesuada sit amet, vulputate pharetra magna. Aenean et rutrum magna. Cras euismod est ligula, nec egestas tortor finibus non. Quisque elementum velit nec posuere sagittis. Suspendisse nec blandit eros, vel lacinia justo. Nulla tempor dictum augue eget condimentum. Maecenas ac dapibus mauris. Curabitur semper sapien nec sapien posuere, nec malesuada urna elementum. Vestibulum sed interdum nunc. Proin sit amet nisi semper, rhoncus lectus ut, ultricies felis. Fusce velit mi, suscipit ac consequat at, molestie eget"
}
But for other fields the search doesn't really work and I get no hits for queries like dolor, sit or any other term not included in the title field.
Hey @kwiat1990 thanks for checking out the new version! I will have a look later to see where things may go wrong...
Hey, I have installed the plugin on my Strapi instance (v4.1.11) and applied the very same config as in the docs. After everything was ready, I ran the first query and bingo, I got matches. My content consists of a bunch of lorem ipsum paragraphs, and while I was playing with the search queries I noticed some odd behaviour.
Let's say for the following content:
I'm getting matches with queries like lorem, lor, lo or even l. But if I query some other word like consectetur, dolor or rem, I don't get any matches. I created new content and even rebuilt the admin panel, but I still get the same results, or to be precise, I get none for any other search queries. To me it seems the plugin somehow searches only the very first word in the keys defined in the config. If I change the title of the above content type to Dolor lorem ipsum #1, then with the search query search=dolor I get a match.