fxsjy / jieba

Jieba Chinese word segmentation
MIT License
33.39k stars 6.73k forks

Keyword extraction for a single English word #372

Open xiaoyaosheng opened 8 years ago

xiaoyaosheng commented 8 years ago

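Can't a single English word be picked up by keyword extraction? Take the word c: it is not extracted even after I add c to the custom dictionary.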

zhbzz2007 commented 8 years ago

In TF-IDF extraction, w is the candidate word; if its length is less than 2, it is skipped:

if len(w.strip()) < 2 or w.lower() in self.stop_words:
    continue

In TextRank extraction, a word is processed only if its length is at least 2:

def pairfilter(self, wp):
    return (wp.flag in self.pos_filt and len(wp.word.strip()) >= 2
            and wp.word.lower() not in self.stop_words)

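So both algorithms skip short words (fewer than 2 characters), which is why c is never extracted regardless of the custom dictionary.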
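
To make the effect concrete, here is a minimal standalone sketch of the check the two snippets share. It does not import jieba; keep_candidate and min_len are hypothetical names introduced only for illustration, and in the quoted code the threshold 2 is hard-coded rather than configurable.

```python
# Standalone sketch of the length/stop-word check quoted above.
# It does not call jieba; `keep_candidate` and `min_len` are hypothetical.

def keep_candidate(word, stop_words=frozenset(), min_len=2):
    """Return True if `word` passes the length and stop-word check."""
    if len(word.strip()) < min_len or word.lower() in stop_words:
        return False
    return True


words = ["c", "C++", "go", "the", " r "]
stop_words = {"the"}

# With min_len=2 (the threshold in the quoted code), "c" and " r " are dropped
# because their stripped length is 1; "the" is dropped as a stop word.
default_kept = [w for w in words if keep_candidate(w, stop_words)]

# Lowering the hypothetical threshold to 1 lets single-character words through.
relaxed_kept = [w for w in words if keep_candidate(w, stop_words, min_len=1)]

print(default_kept)
print(relaxed_kept)
```

The check runs on the candidate word itself, after segmentation, so adding c to the custom dictionary only affects how the text is split and has no influence on this step.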

xiaoyaosheng commented 8 years ago

@zhbzz2007 Thanks a lot.