blugelabs / bluge

indexing library for Go
Apache License 2.0

MatchPhraseQuery always requires position information #86

Open mschoch opened 2 years ago

mschoch commented 2 years ago

"MatchPhraseQuery always requires position information" sounds reasonable, but there is at least one case in which this can lead to confusing behavior.

Sometimes the phrase you pass in analyzes to just a single term. When that happens, position information is logically unnecessary: any match of the term should count, because there are no other terms that must appear relative to it. However, the current implementation doesn't handle this case, and the query fails.

For users, this is confusing: the same document satisfies a MatchQuery but not a MatchPhraseQuery on the same term.

MatchQuery("term") - ✔️
MatchPhraseQuery("term") - ❌
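
To make the mismatch concrete, here is a minimal, self-contained sketch of the behavior argued for above. It does not use bluge at all: `toyIndex`, `analyze`, `matchTerm`, `matchPhrase`, and `errNoPositions` are hypothetical names for a toy inverted index, and the whitespace analyzer is an assumption made for illustration. The idea is that a phrase which analyzes to a single term could be answered like a plain term match, and only multi-term phrases would need positions.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// errNoPositions is a hypothetical error for this sketch, not a bluge error.
var errNoPositions = errors.New("phrase query requires term positions")

// toyIndex is a toy inverted index: term -> docID -> positions.
// It is an illustration only and does not mirror bluge internals.
type toyIndex struct {
	hasPositions bool
	postings     map[string]map[string][]int
}

func newToyIndex(hasPositions bool) *toyIndex {
	return &toyIndex{hasPositions: hasPositions, postings: map[string]map[string][]int{}}
}

// analyze is a deliberately simple analyzer: lowercase, split on whitespace.
func analyze(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// add indexes text under docID, recording positions only when enabled.
func (idx *toyIndex) add(docID, text string) {
	for pos, term := range analyze(text) {
		if idx.postings[term] == nil {
			idx.postings[term] = map[string][]int{}
		}
		if idx.hasPositions {
			idx.postings[term][docID] = append(idx.postings[term][docID], pos)
		} else {
			idx.postings[term][docID] = nil // presence only, no positions
		}
	}
}

// matchTerm returns the sorted IDs of documents containing term.
func (idx *toyIndex) matchTerm(term string) []string {
	var ids []string
	for docID := range idx.postings[term] {
		ids = append(ids, docID)
	}
	sort.Strings(ids)
	return ids
}

func contains(positions []int, want int) bool {
	for _, p := range positions {
		if p == want {
			return true
		}
	}
	return false
}

// phraseAt reports whether terms occur consecutively in docID from start.
func (idx *toyIndex) phraseAt(docID string, terms []string, start int) bool {
	for offset, term := range terms {
		if !contains(idx.postings[term][docID], start+offset) {
			return false
		}
	}
	return true
}

// matchPhrase falls back to a term match when the phrase analyzes to a
// single term, so positions are required only for multi-term phrases.
func (idx *toyIndex) matchPhrase(phrase string) ([]string, error) {
	terms := analyze(phrase)
	switch {
	case len(terms) == 0:
		return nil, nil
	case len(terms) == 1:
		return idx.matchTerm(terms[0]), nil
	case !idx.hasPositions:
		return nil, errNoPositions
	}
	var ids []string
	for docID, starts := range idx.postings[terms[0]] {
		for _, start := range starts {
			if idx.phraseAt(docID, terms, start) {
				ids = append(ids, docID)
				break
			}
		}
	}
	sort.Strings(ids)
	return ids, nil
}

func main() {
	idx := newToyIndex(false) // positions were not indexed
	idx.add("doc1", "bluge indexing library")

	ids, err := idx.matchPhrase("bluge")
	fmt.Println(ids, err) // single-term phrase still matches

	_, err = idx.matchPhrase("indexing library")
	fmt.Println(err) // multi-term phrase genuinely needs positions
}
```

With this fallback, the single-term phrase and the plain term query agree on the same document, while a real multi-term phrase still reports that positions are missing.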

NOTE: This can also occur indirectly when a user adds double quotes to a query string query. Quoting can be used to force a text match, but it can also break the query when position information has not been indexed.

Short-term workaround:

Be sure to use the SearchTermPositions() option on every field you want to search with MatchPhraseQuery.

Possible improvements: