SeanLee97 / xmnlp

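xmnlp: provides Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, text error correction, text-to-pinyin conversion, text summarization, radical (偏旁部首) lookup, sentence representation, and text similarity computation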
Apache License 2.0

Special terms #17

Closed: dwbaron closed this issue 3 years ago

dwbaron commented 5 years ago

For bond short names such as “02进出04” and special terms such as “5G”, I've found that word segmentation splits them apart.

SeanLee97 commented 5 years ago

Thanks for your suggestion. We will fix the problem where proper nouns made up of digits and English letters are not detected.

dwbaron commented 5 years ago

As a temporary fix, I tried combining a trie (which performs exact matching) with HMM segmentation.
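A minimal sketch of that idea, assuming a user dictionary of terms that must never be split: a trie finds the longest exact dictionary match at each position (forward maximum matching), and any text not covered by a dictionary term is handed to a fallback segmenter. The HMM segmenter itself is not reproduced here; `char_fallback` is a hypothetical stand-in that emits one token per character, and `Trie`, `segment` and the example terms are illustrative names, not part of xmnlp's API.

```python
from typing import Callable, Dict, List


class Trie:
    """Character trie holding dictionary terms that must never be split."""

    def __init__(self) -> None:
        self.root: Dict[str, dict] = {}

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = True  # multi-char key, so it cannot clash with a real character

    def longest_match(self, text: str, start: int) -> int:
        """Length of the longest dictionary term starting at `start`, or 0 if none."""
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if "$end" in node:
                best = i - start + 1
        return best


def segment(text: str, trie: Trie, fallback: Callable[[str], List[str]]) -> List[str]:
    """Keep exact dictionary matches whole; send everything in between to `fallback`."""
    tokens: List[str] = []
    buffer = ""  # characters not covered by any dictionary term
    i = 0
    while i < len(text):
        length = trie.longest_match(text, i)
        if length:
            if buffer:
                tokens.extend(fallback(buffer))
                buffer = ""
            tokens.append(text[i:i + length])
            i += length
        else:
            buffer += text[i]
            i += 1
    if buffer:
        tokens.extend(fallback(buffer))
    return tokens


def char_fallback(chunk: str) -> List[str]:
    # Hypothetical placeholder for an HMM segmenter: one token per character.
    return list(chunk)


if __name__ == "__main__":
    trie = Trie()
    for term in ["5G", "02进出04"]:
        trie.add(term)
    print(segment("5G手机", trie, char_fallback))      # ['5G', '手', '机']
    print(segment("买入02进出04", trie, char_fallback))  # ['买', '入', '02进出04']
```

Because dictionary matches are taken before the fallback ever sees the text, terms like “5G” survive intact regardless of how the statistical segmenter would have split them; the trade-off is that the dictionary must be maintained by hand.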

dwbaron commented 5 years ago

It seems that you first split the sentence on Chinese characters. Wouldn't splitting on English characters work better?
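One way to pre-split a sentence by character class, sketched under stated assumptions: "Chinese" is taken to mean the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), "English" means ASCII letters and digits, and any other non-space character becomes its own block. This is not how xmnlp actually splits text; `split_blocks` is a hypothetical helper that only illustrates keeping a letter/digit run like “5G” together before segmentation.

```python
import re
from typing import List, Tuple

# Assumption: Chinese = U+4E00..U+9FFF only; letters/digits = ASCII only.
BLOCK_RE = re.compile(r"([\u4e00-\u9fff]+)|([A-Za-z0-9]+)|(\S)")


def split_blocks(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, run) pairs; whitespace is dropped."""
    blocks: List[Tuple[str, str]] = []
    for m in BLOCK_RE.finditer(text):
        zh, en, other = m.groups()
        if zh:
            blocks.append(("zh", zh))
        elif en:
            blocks.append(("en", en))
        else:
            blocks.append(("other", other))
    return blocks


if __name__ == "__main__":
    print(split_blocks("发布5G手机，售价3999元"))
```

Only the "zh" blocks would then go to the Chinese segmenter, so “5G” and “3999” stay whole. Mixed terms such as “02进出04” still get split into “02”, “进出” and “04”, which is why the trie-based exact matching mentioned earlier in the thread is still useful alongside this kind of split.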

dwbaron commented 5 years ago

I tried to work through this English-character problem following my solution above.

SeanLee97 commented 5 years ago

Thanks for your suggestions! I will fix the problem when I'm free. If you'd like to contribute to this repo, feel free to open a PR. Looking forward to your contributions!