Nov11 / ik-analyzer

Automatically exported from code.google.com/p/ik-analyzer

NullPointerException during tokenization caused by a special string #44

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
String to tokenize: "重庆康田2.7亿西永拿地 楼面价2474元/平方按公示,该地块属于沙坪坝区西永组团U分区U8-8-1/03地块,土地用途为二类居住用地、商业金融业用地,�地面积约72756方,建筑规模要求不超过11万方,起拍价约2.1亿。"

Code that calls the Lucene tokenizer:
public static List<String> analysing(String input, boolean useSmart) {
        try {
            // Create the analyzer instance
            Analyzer analyzer = new IKAnalyzer(useSmart);
            // Obtain the token stream
            Reader reader = new StringReader(input);
            TokenStream stream = analyzer.tokenStream("", reader);
            // Reset the stream to its start position
            stream.reset();
            // Register the term attribute used to read each token's text
            CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
            // Loop over the stream, logging and collecting every token
            List<String> result = new ArrayList<String>();
            while (stream.incrementToken()) {
                LOG.info(termAtt.toString());
                result.add(termAtt.toString());
            }

            return result;
        } catch (Exception e) {
            LOG.error("Tokenization failed", e);
        }

        return null;
    }

Exception details:
java.lang.NullPointerException
    at org.wltea.analyzer.core.AnalyzeContext.compound(AnalyzeContext.java:382)
    at org.wltea.analyzer.core.AnalyzeContext.getNextLexeme(AnalyzeContext.java:325)
    at org.wltea.analyzer.core.IKSegmenter.next(IKSegmenter.java:116)
    at org.wltea.analyzer.lucene.IKTokenizer.incrementToken(IKTokenizer.java:73)
    at com.test.util.AnalyzerUtil.analysing(AnalyzerUtil.java:77)
    at com.test.util.AnalyzerUtil.main(AnalyzerUtil.java:54)
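
A side note on the calling code: the trace shows the exception surfacing from stream.incrementToken() inside the try block, so analysing() logs it and returns null, which every caller then has to check for. The following is a minimal sketch of one alternative, assuming a hypothetical wrapper named tokenizeOrEmpty that is not part of IK Analyzer or Lucene; the two stand-in tokenizers are illustrative only and use plain JDK classes so the example runs without either library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class SafeTokenize {

    // Hypothetical helper (an assumption, not an IK Analyzer or Lucene API):
    // runs any tokenizer function and returns an empty list instead of null
    // when the tokenizer throws or itself returns null.
    public static List<String> tokenizeOrEmpty(
            Function<String, List<String>> tokenizer, String input) {
        try {
            List<String> tokens = tokenizer.apply(input);
            return tokens == null ? Collections.<String>emptyList() : tokens;
        } catch (RuntimeException e) {
            // A real caller would log the exception here, as analysing() does.
            return Collections.<String>emptyList();
        }
    }

    public static void main(String[] args) {
        // Stand-in tokenizer that fails the way the trace above does.
        Function<String, List<String>> failing = s -> {
            throw new NullPointerException("simulated tokenizer failure");
        };
        // Stand-in tokenizer that simply splits on whitespace.
        Function<String, List<String>> splitting =
                s -> new ArrayList<>(Arrays.asList(s.split("\\s+")));

        System.out.println(tokenizeOrEmpty(failing, "abc").size()); // prints 0
        System.out.println(tokenizeOrEmpty(splitting, "a b c"));    // prints [a, b, c]
    }
}
```

This only changes what the caller sees on failure. It does not address the underlying defect in AnalyzeContext.compound.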

Original issue reported on code.google.com by fengyunz...@gmail.com on 19 Mar 2012 at 3:47

GoogleCodeExporter commented 8 years ago
For reference, this is with Lucene 3.5 and IK-Analyzer 2012.

Original comment by fengyunz...@gmail.com on 19 Mar 2012 at 3:50

GoogleCodeExporter commented 8 years ago
This bug has been fixed in 2012 upgrade3. Please download the latest version.

Original comment by linliang...@gmail.com on 19 Mar 2012 at 5:57