apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

JapanesePartOfSpeechStopFilterFactory should load built-in stop tags by default [LUCENE-9567] #10607

Open asfimport opened 3 years ago

asfimport commented 3 years ago

If JapanesePartOfSpeechStopFilterFactory is given empty args, it does nothing. It doesn't load any stop tags, and just passes along the TokenStream passed to create().

As a default behavior, this is trappy, since a user may add the filter without explicitly adding any arguments and assume that it would load a "default" stop set. Or they may assume that if an explicit argument is required then an exception will be thrown. Regardless, "doing nothing" is almost certainly not what the user intended.

I'm going to attach a patch that loads the default stop tags (using JapaneseAnalyzer.getDefaultStopTags()) when no args are specified. That probably makes sense for 9.0, since it is consistent with e.g. KoreanPartOfSpeechStopFilterFactory. If we want to apply a fix to 8.x, maybe we should throw an exception to let the user know that the FilterFactory probably isn't doing what they think it's doing?
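The two options above (fall back to built-in defaults for 9.0, or fail fast for 8.x) can be sketched as a small, Lucene-free model. Only the `tags` argument name comes from this issue. The class `StopTagsPolicy`, its methods, and the placeholder `loadTagsFile` are hypothetical and do not mirror the real factory, which resolves its resources through Lucene's own loading mechanism.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the two policies discussed in the issue.
// Only the "tags" argument name is taken from the issue; the rest is assumed.
public class StopTagsPolicy {

  // Proposed 9.0 behavior: use built-in defaults when "tags" is absent.
  static Set<String> resolveWithDefault(Map<String, String> args, Set<String> builtInDefaults) {
    String tagsFile = args.get("tags");
    if (tagsFile == null) {
      // The old behavior here was an empty stop set, i.e. a no-op filter.
      return builtInDefaults;
    }
    return loadTagsFile(tagsFile);
  }

  // Possible 8.x alternative: fail fast instead of silently doing nothing.
  static Set<String> resolveOrThrow(Map<String, String> args) {
    String tagsFile = args.get("tags");
    if (tagsFile == null) {
      throw new IllegalArgumentException(
          "No 'tags' argument given; this filter would not remove any tokens");
    }
    return loadTagsFile(tagsFile);
  }

  // Placeholder for reading a user-supplied tags file (hypothetical).
  // It returns a marker entry so callers can see which file was requested.
  static Set<String> loadTagsFile(String name) {
    Set<String> tags = new LinkedHashSet<>();
    tags.add("loaded-from:" + name);
    return tags;
  }
}
```

Either way, an empty argument map no longer yields a filter that silently passes every token through.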


Migrated from LUCENE-9567 by Michael Froh (@msfroh), updated Oct 09 2020
Pull requests: https://github.com/apache/lucene-solr/pull/1961

asfimport commented 3 years ago

ASF subversion and git services (migrated from JIRA)

Commit 4e0aa0d23bbd577c4c96bb56b52d3bb558050c11 in lucene-solr's branch refs/heads/master from msfroh https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4e0aa0d

LUCENE-9567: JPOSSFF loads built-in stop tags by default (#1961)

load stoptags.txt from analysis-kuromoji when no tags argument is specified
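Loading a bundled stoptags.txt when no `tags` argument is given amounts to reading a classpath resource into a set of tags. The sketch below assumes one tag per line, with blank lines and lines starting with `#` ignored. Those parsing rules, and the names `StopTagsFileSketch`, `readStopTags`, and `loadBundled`, are assumptions for illustration. Lucene ships its own word-set loading utilities, and this is not the committed code.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of reading a stop-tags file (one tag per line).
// The assumed format skips blank lines and '#' comment lines.
public class StopTagsFileSketch {

  static Set<String> readStopTags(Reader reader) {
    Set<String> tags = new LinkedHashSet<>(); // keeps file order
    try (BufferedReader br = new BufferedReader(reader)) {
      String line;
      while ((line = br.readLine()) != null) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
          continue; // blank line or comment
        }
        tags.add(trimmed);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return tags;
  }

  // Loads a resource that sits next to the given class on the classpath,
  // the way a bundled default file might be located.
  static Set<String> loadBundled(Class<?> anchor, String resourceName) {
    InputStream in = anchor.getResourceAsStream(resourceName);
    if (in == null) {
      throw new UncheckedIOException(new FileNotFoundException(resourceName));
    }
    return readStopTags(new InputStreamReader(in, StandardCharsets.UTF_8));
  }
}
```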
