Open haveamission opened 2 years ago
I think it's possible to replace the default dictionary: the CLI has an option -D DICT, --dict DICT, which uses DICT as the dictionary for jieba. When loading jieba as a library, it seems jieba.set_dictionary('data/dict.txt.big') sets the dictionary path.
I want to segment ONLY words that are in a limited, custom dictionary. Everything else, I want to separate into individual characters. Is there a way to do this with jieba, or must I do this in a post-processing step? load_userdict seems to only add words, not replace the current dictionary.
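
For reference, the behaviour asked for here can be sketched without jieba at all. The snippet below is a minimal, standalone illustration using greedy longest-match against a word set; it is not jieba's algorithm (jieba picks a route by word frequency), and the names segment and vocab and the sample words are made up for the example.

```python
def segment(text, vocab):
    """Split text into words found in vocab; everything else becomes single characters.

    Greedy longest-match sketch (hypothetical helper, not part of jieba).
    """
    max_len = max((len(w) for w in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to two characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in vocab:
                tokens.append(candidate)
                i += size
                break
        else:
            # No multi-character dictionary word starts here: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical custom dictionary, for illustration only.
vocab = {"北京", "大学", "北京大学"}
result = segment("我在北京大学读书", vocab)
print(result)  # ['我', '在', '北京大学', '读', '书']
```

Because the match is greedy, an overlapping shorter word can be swallowed by a longer one that starts earlier. If the custom dictionary is instead passed to jieba through set_dictionary, the result may differ from this sketch, so it is worth comparing both on real input.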