Problem with tokenize_style=char

I downloaded the Chinese corpus from the network drive and set tokenize_style=char. Lines 71 and 232 of pretrain_task.py both still segment with jieba, regardless of that setting:

string_list = [x for x in jieba.lcut(sentence.strip()) if x and x not in ["\"", "：", "、", "，", "）", "（"]]

These lines probably also need to handle the input differently depending on the switch. For character-level tokenization, something like:

string_list = [x for x in sentence.strip() if x and x not in ["\"", "：", "、", "，", "）", "（"]]
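To make the suggestion concrete, here is a minimal sketch of how the two lines could share one switch-aware helper. The names `tokenize_sentence` and `SKIP_TOKENS` are hypothetical and not part of pretrain_task.py, and treating every value other than "char" as word-level jieba segmentation is my assumption about how the switch is meant to behave.

```python
# Punctuation filtered out by the list comprehensions quoted above.
SKIP_TOKENS = {"\"", "：", "、", "，", "）", "（"}


def tokenize_sentence(sentence, tokenize_style="char"):
    """Hypothetical helper: split a sentence according to tokenize_style."""
    text = sentence.strip()
    if tokenize_style == "char":
        # Character-level: every character of the sentence is a candidate token.
        candidates = list(text)
    else:
        # Assumption: any other value means word-level segmentation with jieba.
        import jieba  # imported lazily so char mode does not require jieba
        candidates = jieba.lcut(text)
    return [x for x in candidates if x and x not in SKIP_TOKENS]


print(tokenize_sentence(" 你好，世界 ", tokenize_style="char"))
# ['你', '好', '世', '界']
```

With a helper like this, lines 71 and 232 could each become string_list = tokenize_sentence(sentence, tokenize_style). Note that in char mode any whitespace inside the sentence is kept as a token, the same as in the one-line version above.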

Thank you very much for your work.

brightmart / bert_language_understanding

Problem with tokenize_style=char #10