apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

WFST/AnalyzingSuggester don't handle keys containing 0 bytes correctly [LUCENE-4534] #5600

Closed asfimport closed 11 years ago

asfimport commented 11 years ago

Binary terms w/ 0 bytes are rare, but they are "allowed", and today they cause exceptions with at least WFSTCompletionLookup and AnalyzingSuggester.

I think to fix this we should pass a custom Comparator to the offline sorter that decodes each BytesRef key and does the actual comparison we want, instead of using a separator and relying on BytesRef.compareTo.
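
To illustrate why a separator byte is fragile, here is a minimal sketch. It assumes a hypothetical packing (key bytes, a single 0x00 separator, then payload bytes); the actual encoding used by the suggesters is not shown in this thread, and the class and method names below are made up for the example.

```java
import java.util.Arrays;

public class SeparatorPitfall {
    // Hypothetical packing (not Lucene's actual format):
    // key bytes, one 0x00 separator byte, then payload bytes.
    static byte[] pack(byte[] key, byte[] payload) {
        byte[] out = new byte[key.length + 1 + payload.length];
        System.arraycopy(key, 0, out, 0, key.length);
        out[key.length] = 0;
        System.arraycopy(payload, 0, out, key.length + 1, payload.length);
        return out;
    }

    // Recovers the key by scanning for the first 0x00 byte.
    static byte[] unpackKey(byte[] packed) {
        int i = 0;
        while (i < packed.length && packed[i] != 0) {
            i++;
        }
        return Arrays.copyOfRange(packed, 0, i);
    }

    public static void main(String[] args) {
        byte[] plain = {'a', 'b'};
        byte[] withZero = {'a', 0, 'b'};
        byte[] payload = {7};

        // A key without 0 bytes round-trips.
        System.out.println(Arrays.equals(plain, unpackKey(pack(plain, payload))));       // true
        // A key containing a 0 byte is cut short at the embedded 0.
        System.out.println(Arrays.equals(withZero, unpackKey(pack(withZero, payload)))); // false
    }
}
```

Any scheme that finds field boundaries by scanning for a reserved byte has this problem as soon as that byte may legally appear inside a field.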


Migrated from LUCENE-4534 by Michael McCandless (@mikemccand), resolved Nov 05 2012
Attachments: LUCENE-4534.patch (versions: 2)

asfimport commented 11 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch w/ failing test case for WFSTCompletionLookup and AnalyzingSuggester.

asfimport commented 11 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch w/ fix.

Basically, instead of relying on sorting a single "packed" byte[], I decode each byte[] into its parts (key/weight/analyzed form) and do the comparison "directly". This is cleaner because we no longer need to rely on separators, which are what cause keys containing 0 bytes to not work...
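
The decode-then-compare idea can be sketched roughly as follows. This is not the code from the patch: the entry layout (2-byte key length, key bytes, 8-byte weight), the field order in the comparison, and all names are assumptions chosen for illustration, and a plain `Comparator<byte[]>` stands in for whatever the offline sorter accepts.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DecodingComparator {
    // Hypothetical layout: 2-byte key length, key bytes, 8-byte weight.
    // A length prefix means no byte value is reserved as a separator.
    static byte[] encode(byte[] key, long weight) {
        ByteBuffer buf = ByteBuffer.allocate(2 + key.length + 8);
        buf.putShort((short) key.length);
        buf.put(key);
        buf.putLong(weight);
        return buf.array();
    }

    static byte[] decodeKey(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);
        int len = buf.getShort() & 0xFFFF;
        byte[] key = new byte[len];
        buf.get(key);
        return key;
    }

    static long decodeWeight(byte[] entry) {
        return ByteBuffer.wrap(entry).getLong(entry.length - 8);
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Decodes both entries and compares field by field: key first, then weight.
    static final Comparator<byte[]> BY_KEY_THEN_WEIGHT = (a, b) -> {
        int cmp = compareUnsigned(decodeKey(a), decodeKey(b));
        if (cmp != 0) {
            return cmp;
        }
        return Long.compare(decodeWeight(a), decodeWeight(b));
    };

    public static void main(String[] args) {
        List<byte[]> entries = new ArrayList<>();
        entries.add(encode(new byte[] {'a', 'b'}, 1L));
        entries.add(encode(new byte[] {'a', 0, 'b'}, 2L));
        entries.add(encode(new byte[] {'a'}, 3L));
        entries.sort(BY_KEY_THEN_WEIGHT);
        for (byte[] e : entries) {
            System.out.println(Arrays.toString(decodeKey(e)) + " weight=" + decodeWeight(e));
        }
        // Prints:
        // [97] weight=3
        // [97, 0, 98] weight=2
        // [97, 98] weight=1
    }
}
```

Because the comparator works on decoded fields rather than on the raw packed bytes, a 0 byte inside a key is treated like any other byte value.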

asfimport commented 11 years ago

Robert Muir (@rmuir) (migrated from JIRA)

+1!

asfimport commented 11 years ago

Commit Tag Bot (migrated from JIRA)

[branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1405978

LUCENE-4534: dedup same surface form in Analyzing/FuzzySuggester

asfimport commented 11 years ago

Commit Tag Bot (migrated from JIRA)

[branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1405963

LUCENE-4534: handle 0 byte values in lookup keys