fxsjy / jieba

Jieba Chinese word segmentation
MIT License
33.39k stars, 6.73k forks

Any way to override the native Jieba dictionary? #980

Open · haveamission opened this issue 2 years ago

haveamission commented 2 years ago

I want to segment ONLY words that are in a limited, custom dictionary; everything else should be split into individual characters. Is there a way to do this with jieba, or must I do this in a post-processing step? load_userdict seems to only add words, not replace the current dictionary.
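For the post-processing route, here is a minimal sketch. It assumes the segmenter's output is already available as a list of strings (for example from jieba.lcut(text)). split_unknown is a hypothetical helper, not part of jieba.

```python
def split_unknown(tokens, custom_words):
    """Keep tokens that are in custom_words; split every other token into characters.

    `tokens` can be any iterable of strings, e.g. the output of jieba.lcut(text).
    """
    result = []
    for tok in tokens:
        if tok in custom_words:
            result.append(tok)   # whitelisted word: keep it as one token
        else:
            result.extend(tok)   # extending a list with a str adds its characters one by one
    return result


if __name__ == "__main__":
    # Hard-coded tokens stand in for segmenter output so the example runs on its own.
    tokens = ["我", "喜欢", "中文", "分词工具"]
    print(split_unknown(tokens, {"中文", "分词"}))
    # ['我', '喜', '欢', '中文', '分', '词', '工', '具']
```

The example also shows the weakness of this approach. If the segmenter emits 分词工具 as a single token, the custom word 分词 inside it is lost. Post-processing can only keep custom words that were already emitted as whole tokens.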

brynne8 commented 1 year ago

I think it's possible to replace the default dictionary: the command-line interface has an option -D DICT, --dict DICT that uses DICT as jieba's dictionary. It also seems that

jieba.set_dictionary('data/dict.txt.big')

can set the dictionary path when jieba is used as a library.