Open SoluMilken opened 7 years ago
import re

store_dict = {replacement: []}
store_dict[replacement] = re.findall(pattern, sentence)
filtered_sentence = re.sub(.....)
tokenized_sentence = tokenizer.lcut(filtered_sentence)
for idx, token in enumerate(tokenized_sentence):
    if token == replacement:
        tokenized_sentence[idx] = store_dict[replacement][0]
        store_dict[replacement] = store_dict[replacement][1:]
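Argh, still thinking about this: if we want character-level tokens, how should a sentence like this be segmented? ex: 我的電話號碼是phone ("my phone number is phone"). I really want to just wipe jieba's dictionary.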
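One possible way to get character-level tokens without touching jieba's dictionary at all is to split on the placeholder first and only then break the rest into single characters. This is a minimal sketch, assuming the placeholder is the literal string `phone`; `char_level_tokens` is a hypothetical helper name, not something jieba provides.

```python
import re


def char_level_tokens(sentence, placeholder="phone"):
    """Split into single characters, keeping the placeholder as one token.

    Hypothetical helper for illustration; `placeholder` is an assumed value.
    """
    # A capturing group makes re.split keep the placeholder in the result.
    parts = re.split(f"({re.escape(placeholder)})", sentence)
    tokens = []
    for part in parts:
        if part == placeholder:
            tokens.append(part)
        else:
            # Extending with a string adds one token per character;
            # an empty string adds nothing.
            tokens.extend(part)
    return tokens


print(char_level_tokens("我的電話號碼是phone"))
# ['我', '的', '電', '話', '號', '碼', '是', 'phone']
```

The placeholder tokens in the result can then be swapped back for the original matches the same way as in the snippet above. Note that whitespace characters also come out as tokens here, so they may need filtering depending on the use case.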