blugelabs / bluge

indexing library for Go
Apache License 2.0

WildcardQuery does not work? #106

Open willie68 opened 2 years ago

willie68 commented 2 years ago

Just a short question: is WildcardQuery already working? I have an index of documents with a text field called X-Tenant whose value is MCS. Searching with X-Tenant: MCS finds this doc, but X-Tenant: M* finds nothing, and neither does MC? or M??.

willie68 commented 2 years ago

One more observation: mc* does find the docs. So it seems that when the term contains a wildcard, it has to be in lower case.

mschoch commented 2 years ago

"if there is a wildcard in place, the term should be in lower case"

That is not universally true; it depends on the analyzer used for the field.
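To make the analyzer dependence concrete, here is a toy model in plain Go. It uses the standard library only and is not bluge's code. It has two made-up analyzers: one lower-cases the way a standard-style analyzer does, and the other keeps the whole value as a single unchanged term. A wildcard matcher is applied to the indexed terms without analyzing the pattern. The names lowercaseAnalyzer, keywordAnalyzer, and wildcardMatches are hypothetical, and translating * and ? into a regular expression is an assumption about the general technique.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// analyzer turns a field value into the terms stored in the index.
type analyzer func(string) []string

// lowercaseAnalyzer is a toy stand-in for a standard-style analyzer:
// split on anything that is not a letter or digit, then lower-case.
func lowercaseAnalyzer(s string) []string {
	fields := strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for i, f := range fields {
		fields[i] = strings.ToLower(f)
	}
	return fields
}

// keywordAnalyzer is a toy stand-in for an analyzer that indexes the
// whole value unchanged as a single term.
func keywordAnalyzer(s string) []string {
	return []string{s}
}

// wildcardToRegexp escapes the pattern, then maps '*' to "any run of
// characters" and '?' to "exactly one character".
func wildcardToRegexp(pattern string) string {
	quoted := regexp.QuoteMeta(pattern)
	quoted = strings.ReplaceAll(quoted, `\*`, ".*")
	return strings.ReplaceAll(quoted, `\?`, ".")
}

// wildcardMatches reports whether any indexed term matches the pattern.
// The pattern itself is NOT analyzed, so its case is used as-is.
func wildcardMatches(pattern string, terms []string) bool {
	re := regexp.MustCompile("^" + wildcardToRegexp(pattern) + "$")
	for _, t := range terms {
		if re.MatchString(t) {
			return true
		}
	}
	return false
}

func main() {
	value := "MCS"
	for _, a := range []struct {
		name string
		fn   analyzer
	}{{"lowercase", lowercaseAnalyzer}, {"keyword", keywordAnalyzer}} {
		terms := a.fn(value)
		fmt.Printf("%-9s terms=%v M*=%v m*=%v\n",
			a.name, terms, wildcardMatches("M*", terms), wildcardMatches("m*", terms))
	}
}
```

Using the value MCS from the original question, the lower-casing analyzer lets m* match while M* finds nothing. The keyword analyzer reverses this.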

To debug why something works or doesn't, it is most helpful to have a runnable example. Too many behaviors depend on the actual data and on how the analyzers are configured.

willie68 commented 2 years ago

Hi, thanks for the answer. I use the defaults for everything; nothing special is configured for any field.

Here is some sample code: main.go.txt

On line 61, change the code from "BL" to "bl" and you will find the doc.

mschoch commented 2 years ago

Great, so what is the issue now?

willie68 commented 2 years ago

As you can see, the body contains the text "bluge...". You will find the doc with bl but not with BL. If the body contains "Bluge...", bl will find the doc too, but BL will not. Even if the body is "BLuge", BL* finds nothing. So the search term is case-sensitive: you only get a result if the search phrase is in lower case. On the other hand, a MatchQuery is case-insensitive, and both of these find the right doc:

query := bluge.NewMatchQuery("BLUGE").SetField("Body")
query := bluge.NewMatchQuery("bluge").SetField("Body")

willie68 commented 2 years ago

And even if I use querystr.ParseQueryString(query, querystr.DefaultOptions()) to parse the query string into a bluge.Query, it does not work. Wildcard search terms must be in lower case to get a result, which is hard for end users to understand.

mschoch commented 2 years ago

There is a fundamental difference between how match queries and wildcard queries work, and it comes down to whether an analyzer is applied to the search term. This indirectly leads to the behavior you are seeing, but I will say again that it is wrong to think all wildcard searches must be lower-case.

When you do the following match queries:

query := bluge.NewMatchQuery("BLUGE").SetField("Body")
query := bluge.NewMatchQuery("bluge").SetField("Body")

The search terms are analyzed, which means that in BOTH cases we are actually searching the index for bluge. Now consider the following wildcard queries:
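Here is a toy model of the match-query side, again in plain Go rather than bluge internals. The same hypothetical analyze function runs at index time and on the query text, so BLUGE and bluge both reduce to the term bluge before the lookup. The sample body text, the function names, and treating several query terms as "any of them" are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// analyze is a toy standard-style analyzer (an assumption, not bluge's
// code): split on non-letters/digits, then lower-case each token.
func analyze(s string) []string {
	tokens := strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for i, t := range tokens {
		tokens[i] = strings.ToLower(t)
	}
	return tokens
}

// buildIndex records every analyzed term, like a tiny term dictionary.
func buildIndex(body string) map[string]bool {
	idx := make(map[string]bool)
	for _, t := range analyze(body) {
		idx[t] = true
	}
	return idx
}

// matchQuery analyzes the query text with the SAME analyzer used at
// index time and reports whether any resulting term is in the index.
func matchQuery(idx map[string]bool, text string) bool {
	for _, t := range analyze(text) {
		if idx[t] {
			return true
		}
	}
	return false
}

func main() {
	idx := buildIndex("bluge is an indexing library for Go")
	fmt.Println(analyze("BLUGE"), matchQuery(idx, "BLUGE"))
	fmt.Println(analyze("bluge"), matchQuery(idx, "bluge"))
}
```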

query := bluge.NewWildcardQuery("BL*").SetField("Body")
query := bluge.NewWildcardQuery("bl*").SetField("Body")

The wildcard query does NOT analyze the search term. Because the analyzer you are using makes all index terms lower-case, the first query will never match anything. That is not a bug; it is working as expected.
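Here is the wildcard side of the same toy model. The pattern is compiled directly, with no analysis, and compared against the lower-cased index terms. Translating * and ? into a regular expression is an assumption about the general technique, not bluge's exact implementation. The body "BLuge ..." echoes the example from earlier in the thread, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// analyze is a toy standard-style analyzer applied at INDEX time only:
// split on non-letters/digits, then lower-case each token.
func analyze(s string) []string {
	tokens := strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for i, t := range tokens {
		tokens[i] = strings.ToLower(t)
	}
	return tokens
}

// wildcardQuery compiles the pattern as-is (no analysis): '*' matches any
// run of characters, '?' exactly one, everything else is literal.
func wildcardQuery(terms []string, pattern string) bool {
	expr := regexp.QuoteMeta(pattern)
	expr = strings.ReplaceAll(expr, `\*`, ".*")
	expr = strings.ReplaceAll(expr, `\?`, ".")
	re := regexp.MustCompile("^" + expr + "$")
	for _, t := range terms {
		if re.MatchString(t) {
			return true
		}
	}
	return false
}

func main() {
	terms := analyze("BLuge is an indexing library for Go")
	for _, p := range []string{"BL*", "bl*", "Bl*"} {
		fmt.Printf("%s -> %v\n", p, wildcardQuery(terms, p))
	}
}
```

Only bl* matches. BL* and Bl* contain upper-case letters, which can never appear in a lower-cased term.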

Earlier you said, "if there is a wildcard in place, the term should be in lower case". That is not true in general; it is true here because you are using the standard analyzer on your text. If you used a custom analyzer that did not lower-case its input, and the index contained terms with upper-case letters, you could use those letters in your wildcard pattern and they would match as expected.

willie68 commented 2 years ago

OK, I understand. So if I use the Query API directly, I have to take the default analyzer into account. Then the only remaining issue is that when I use querystr.ParseQueryString(query, querystr.DefaultOptions()) (with the default options) to convert a user query directly, I run into the same problem: if the user enters Bl*, the search finds nothing. What should I do then?

mschoch commented 2 years ago

Basically, the wildcard search does not directly do what you want. I see 2 choices:

Neither of these changes can easily be used from the query string package, so you would likely have to maintain your own version of that too.
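One possible workaround is to lower-case the user's wildcard pattern before building the query. It is not necessarily one of the choices mentioned above, and it only makes sense if the field uses a lower-casing analyzer. normalizeWildcard and matchWildcard are hypothetical helpers, and matching is again modeled with a regular expression. The sketch also assumes the analyzer's lower-casing agrees with strings.ToLower, which holds for ASCII.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// normalizeWildcard lower-cases a user-supplied wildcard pattern so that it
// lines up with terms produced by a lower-casing analyzer. Do NOT use this
// for fields whose analyzer preserves case.
func normalizeWildcard(pattern string) string {
	return strings.ToLower(pattern)
}

// matchWildcard compiles a pattern and tests it against one indexed term:
// '*' matches any run of characters, '?' exactly one character.
func matchWildcard(pattern, term string) bool {
	expr := regexp.QuoteMeta(pattern)
	expr = strings.ReplaceAll(expr, `\*`, ".*")
	expr = strings.ReplaceAll(expr, `\?`, ".")
	return regexp.MustCompile("^" + expr + "$").MatchString(term)
}

func main() {
	indexed := "bluge" // what a lower-casing analyzer would store for "BLuge"
	for _, userInput := range []string{"Bl*", "BL*", "bl*", "BLUG?"} {
		p := normalizeWildcard(userInput)
		fmt.Printf("%-6s -> %-6s matches=%v\n", userInput, p, matchWildcard(p, indexed))
	}
}
```

In bluge itself, this would mean passing the normalized pattern to bluge.NewWildcardQuery(...).SetField(...) in your own query-building code. That fits the remark above that you would likely need your own version of the query string handling.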

willie68 commented 2 years ago

Sorry I could only reply now; Easter holidays! The bottom line from the user's point of view is that even if I use all the defaults, querystr and the standard analyzer do not work together in a user-friendly way, so I have to replace one of them. Since I don't really like the querystr syntax for my application anyway, I'll probably write my own parser. Thank you for your explanations and efforts.
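A custom parser could handle the case problem while it parses by classifying each term. Terms containing * or ? get lower-cased immediately, because they skip analysis. Plain terms are left for the field's analyzer. The sketch below is hypothetical: the clause type and parse function are not part of bluge. It expects field:text with no space after the colon and deliberately ignores quoting, escaping, and boolean operators. It uses strings.Cut, which requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// clause is one parsed piece of a user query (hypothetical type).
type clause struct {
	Field    string // empty means "use the default field"
	Text     string
	Wildcard bool // true if Text contains '*' or '?'
}

// parse splits a query on whitespace into field:text clauses. Wildcard
// terms are lower-cased up front because they bypass analysis; plain
// terms are left untouched so the field's analyzer can handle them later.
func parse(query string) []clause {
	var out []clause
	for _, tok := range strings.Fields(query) {
		c := clause{Text: tok}
		if field, text, ok := strings.Cut(tok, ":"); ok && field != "" {
			c.Field, c.Text = field, text
		}
		if strings.ContainsAny(c.Text, "*?") {
			c.Wildcard = true
			c.Text = strings.ToLower(c.Text)
		}
		out = append(out, c)
	}
	return out
}

func main() {
	for _, c := range parse("X-Tenant:M* Body:Bluge MC?") {
		fmt.Printf("%+v\n", c)
	}
}
```

Each clause could then become bluge.NewWildcardQuery or bluge.NewMatchQuery, followed by SetField when a field is given. This sketch does not decide how the clauses are combined.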