apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

[PATCH] Bigram-based CJK tokenizer (modified from StopTokenizer) [LUCENE-139] #1217

Closed asfimport closed 18 years ago

asfimport commented 21 years ago

/* ====================================================================

package org.apache.lucene.analysis.cjk;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;

import java.io.Reader;

/**

/* ====================================================================

package org.apache.lucene.analysis.cjk;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;

import java.io.Reader;

import java.util.Hashtable;

/**


Migrated from LUCENE-139 by Che Dong, resolved May 27 2006

Environment:

Operating System: All
Platform: All

Attachments: ASF.LICENSE.NOT.GRANTED--CJKAnalyzer.java, ASF.LICENSE.NOT.GRANTED--CJKTokenizer.java

asfimport commented 21 years ago

Che Dong (migrated from JIRA)

Created an attachment (id=8418) CJKTokenizer

asfimport commented 21 years ago

Che Dong (migrated from JIRA)

Created an attachment (id=8419) CJKAnalyzer: needs to remove the empty tokens created by CJKTokenizer

asfimport commented 20 years ago

Otis Gospodnetic (migrated from JIRA)

Thank you for the contribution, Che. I have finally added your 2 CJK classes to Lucene's Sandbox. I used the attached versions of your classes, not the inlined ones.
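
For readers unfamiliar with the technique named in the issue title, here is a minimal sketch of overlapping-bigram tokenization for CJK text. It is not the code from the attachments and does not use the Lucene API. The class name `BigramSketch`, the method names, and the choice of Unicode scripts treated as CJK are all assumptions made for illustration. Runs of CJK characters are split into overlapping two-character tokens, runs of other letters and digits are kept as lowercased words, and separators are skipped, so no empty tokens are produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not the attached CJKTokenizer.
public class BigramSketch {

    // Assumed definition of "CJK" for this sketch: Han, Hiragana, Katakana, Hangul.
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA
            || s == Character.UnicodeScript.HANGUL;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isCjk(cps[i])) {
                // Collect a run of CJK characters.
                int start = i;
                while (i < cps.length && isCjk(cps[i])) {
                    i++;
                }
                if (i - start == 1) {
                    // A lone CJK character is emitted as a single-character token.
                    tokens.add(new String(cps, start, 1));
                } else {
                    // Emit every overlapping pair: AB, BC, CD, ...
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                // Non-CJK letters and digits form one lowercased word token.
                int start = i;
                while (i < cps.length
                        && Character.isLetterOrDigit(cps[i])
                        && !isCjk(cps[i])) {
                    i++;
                }
                tokens.add(new String(cps, start, i - start).toLowerCase(Locale.ROOT));
            } else {
                // Whitespace and punctuation separate tokens and are dropped.
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587\u5206\u8bcd" is a four-character Han string.
        System.out.println(tokenize("Lucene \u4e2d\u6587\u5206\u8bcd test"));
    }
}
```

With the four-character Han input in `main`, the sketch yields three overlapping bigrams between the two Latin word tokens. Overlapping bigrams are a common choice for CJK text because words are not delimited by spaces, so indexing every adjacent pair lets a query match without a dictionary-based segmenter.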