org.apache.lucene.analysis.ngram.NGramTokenizer trims leading and trailing whitespace from its input, making searches for literal strings like " test" and "test " equivalent to a search for "test". Searching with significant whitespace is sometimes desired, particularly where ngrams are used.
This could be fixed either by removing .trim() from the line shown below or by providing a flag to set the trimming behaviour explicitly (keeping trim=true as the default so that existing code using this analyzer is not broken).
111: inStr = new String(chars).trim(); // remove any trailing empty strings
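A minimal, standalone sketch of the effect and of the proposed flag. It does not use Lucene: the class TrimFlagSketch, the method toInput and its trim parameter are hypothetical names chosen for illustration, and only the new String(chars).trim() expression comes from the line quoted above.

```java
public class TrimFlagSketch {

    // Hypothetical helper, not part of Lucene. With trim == true it does what
    // the quoted line does; with trim == false it keeps the input untouched.
    public static String toInput(char[] chars, boolean trim) {
        String s = new String(chars);
        return trim ? s.trim() : s;
    }

    public static void main(String[] args) {
        char[] leading = " test".toCharArray();
        char[] trailing = "test ".toCharArray();

        // Current behaviour: both inputs collapse to the same string.
        System.out.println("[" + toInput(leading, true) + "]");   // prints [test]
        System.out.println("[" + toInput(trailing, true) + "]");  // prints [test]

        // Proposed opt-out: the whitespace survives and the inputs stay distinct.
        System.out.println("[" + toInput(leading, false) + "]");  // prints [ test]
        System.out.println("[" + toInput(trailing, false) + "]"); // prints [test ]
    }
}
```

Defaulting the flag to true would keep existing callers on the current behaviour, as suggested above.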
Migrated from LUCENE-3979 by David Mason
I'm happy to submit a patch for this, but I haven't done so for this or similar projects, so it will take me a while to go through the wiki and get set up to make one.