apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

integrate snowball stopword lists [LUCENE-2206] #3282

Closed by asfimport 14 years ago

asfimport commented 14 years ago

The Snowball project publishes stopword lists as well as stemmers. For example: http://svn.tartarus.org/snowball/trunk/website/algorithms/english/stop.txt?view=markup

This patch includes the following:

I have not changed SnowballAnalyzer to use these lists automatically yet. I would like us to discuss that in a future issue that proposes integrating snowball with contrib/analyzers.


Migrated from LUCENE-2206 by Robert Muir (@rmuir), resolved Jan 16 2010
Attachments: LUCENE-2206.patch, LUCENE-2206-checkout-fixes.patch

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Patch with a modification to WordListLoader, a test, and Snowball stoplists for Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Russian, Spanish, and Swedish.

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I will commit this in a few days if no one objects. Again, I add getSnowballWordSet to WordListLoader, but if this is inappropriate we could instead have a SnowballWordListLoader in our snowball package or something similar. It doesn't matter to me.

asfimport commented 14 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

Robert, the patch looks good except for one thing.

  public static HashSet<String> getSnowballWordSet(Reader reader)

It returns a HashSet<String> but should really return a Set<String>. We plan to change all return types to the interface instead of the implementation.
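
For illustration, here is a minimal sketch of a loader whose declared return type is the interface. It is not the code from the attached patch: the class name SnowballStopwordSketch is hypothetical, and the parsing rules (a `|` starts a comment that runs to the end of the line, and the remaining text is whitespace-separated words) are an assumption based on the layout of the Snowball stop.txt files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not the code from LUCENE-2206.patch.
final class SnowballStopwordSketch {

  /**
   * Reads a Snowball-style stopword list (format assumed, see lead-in):
   * '|' starts a comment, the rest of the line is split on whitespace.
   * The declared return type is the Set interface, not HashSet.
   */
  static Set<String> getSnowballWordSet(Reader reader) throws IOException {
    Set<String> words = new HashSet<String>();
    BufferedReader br = new BufferedReader(reader);
    try {
      String line;
      while ((line = br.readLine()) != null) {
        int comment = line.indexOf('|');
        if (comment >= 0) {
          line = line.substring(0, comment); // drop the trailing comment
        }
        for (String word : line.trim().split("\\s+")) {
          if (!word.isEmpty()) { // skip blank and comment-only lines
            words.add(word);
          }
        }
      }
    } finally {
      br.close(); // also closes the wrapped reader
    }
    return words;
  }
}
```

Because callers only see Set<String>, the backing collection could later be swapped for another Set implementation without changing the method signature.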

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Thanks Simon, I agree.

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Committed revision 899955.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Robert,

When I changed the backwards tests, I added a new parameter to the svn exec task. With this patch it now behaves the same as the backwards checkouts:

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Sorry, there were some whitespace issues. Fixed here.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Committed Revision: 900160