I downloaded the Chinese corpus from the netdisk link and set tokenize_style=char. At lines 71 and 232 of pretrain_task.py:
string_list = [x for x in jieba.lcut(sentence.strip()) if x and x not in ["\"", ":", "、", ",", ")", "("]]
these lines probably also need to handle the input differently depending on that switch, e.g. for char-level tokenization:
string_list = [x for x in sentence.strip() if x and x not in ["\"", ":", "、", ",", ")", "("]]
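A minimal sketch of what that branch could look like, assuming a `tokenize_style` variable holding either "word" or "char" (the helper name `split_sentence` and the `PUNCT` constant are hypothetical; only `tokenize_style`, `sentence`, and the `jieba.lcut` call come from the original code):

```python
# Punctuation filtered out in the original list comprehensions.
PUNCT = ["\"", ":", "、", ",", ")", "("]

def split_sentence(sentence, tokenize_style="word"):
    """Return a token list, branching on the tokenization style."""
    sentence = sentence.strip()
    if tokenize_style == "char":
        # Char-level: iterate over the string one character at a time.
        return [x for x in sentence if x and x not in PUNCT]
    # Word-level: jieba segmentation, as in the original code.
    # (Imported lazily so the char branch works without jieba installed.)
    import jieba
    return [x for x in jieba.lcut(sentence) if x and x not in PUNCT]
```

For example, `split_sentence("你好,世界", "char")` drops the full-width comma and yields one token per remaining character.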
Thank you very much for your work.