Open asfimport opened 8 years ago
Michael McCandless (@mikemccand) (migrated from JIRA)
How does this issue differ from #3679?
Michael McCandless (@mikemccand) (migrated from JIRA)
OK, I see: this issue is about applying the fixes from #3679, which was for the classic query parser, to the flexible query parser.
Steven Rowe (@sarowe) (migrated from JIRA)
Yes.
Steven Rowe (@sarowe) (migrated from JIRA)
WIP patch against master, generated files not included (`ant javacc-flexible` in `lucene/queryparser/` will generate them), still has nocommits and failing tests.
In addition to enabling not splitting on whitespace prior to text analysis, the patch includes the following changes:
- Changed `TermQueryNode`'s `positionIncrement` name to `position`, since that's what it really holds.
- `SynonymQueryNode`/`Builder` now produces a `SynonymQuery` instead of a boolean query.
- Refactored `AnalyzerQueryNodeProcessor.postProcessNode()` into shorter methods and made it simpler and easier to follow.
- [...] `QueryParserTestBase`.

Some challenges remain:

- [...] `+(word)` -> `word`. Some of the split-on-whitespace shared tests will need to be specialized for each parser.
- [...] `FlattenQueryNodeProcessor` meant to address this issue, but it's not working and I haven't figured out why yet.
Copied from #3679:
The queryparser splits its input on whitespace and sends each whitespace-separated term to its own independent token stream. This breaks the following at query time, because they can't see across whitespace boundaries:
- n-gram analysis
- shingles
- synonyms (especially multi-word for whitespace-separated languages)
- languages where a 'word' can contain whitespace (e.g. Vietnamese)
It's also rather unexpected, as users think their charfilters/tokenizers/tokenfilters will do the same thing at index time and query time, but in many cases they can't. Instead, preferably the queryparser would parse around only real 'operators'.
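To make the failure mode concrete, here is a minimal sketch in plain Java. It deliberately does not use Lucene's analyzer or query parser classes: `bigramShingles`, `analyzeWhole`, and `analyzeSplit` are hypothetical names for a toy whitespace tokenizer plus a two-word shingle step, and they only illustrate why pre-splitting on whitespace starves a shingle filter of the adjacent tokens it needs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an analysis chain; this is NOT Lucene's Analyzer API. */
final class WhitespaceSplitDemo {

    /** Hypothetical "analyzer": whitespace-tokenize, then emit word bigrams (2-shingles). */
    static List<String> bigramShingles(String text) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] tokens = trimmed.split("\\s+");
        // A shingle needs two adjacent tokens from the same token stream.
        for (int i = 0; i + 1 < tokens.length; i++) {
            out.add(tokens[i] + " " + tokens[i + 1]);
        }
        return out;
    }

    /** Analyze the whole query text in one pass, the way index-time analysis sees text. */
    static List<String> analyzeWhole(String query) {
        return bigramShingles(query);
    }

    /** Mimic the described parser behavior: split on whitespace, then analyze each piece alone. */
    static List<String> analyzeSplit(String query) {
        List<String> out = new ArrayList<>();
        for (String piece : query.trim().split("\\s+")) {
            out.addAll(bigramShingles(piece));
        }
        return out;
    }

    public static void main(String[] args) {
        String query = "wi fi network";
        System.out.println(analyzeWhole(query)); // prints [wi fi, fi network]
        System.out.println(analyzeSplit(query)); // prints []
    }
}
```

In this toy setup the single-pass analysis yields both shingles, while the split-first path yields none, because each piece reaches the shingle step as a one-token stream. The same reasoning applies to multi-word synonyms, which also need to see neighboring tokens.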
Migrated from LUCENE-7315 by Steven Rowe (@sarowe), 2 votes, updated Jul 20 2016
Attachments: LUCENE-7315.patch
Linked issues: #3679