Problems encountered when using Chinese samples with BuildData

jarork commented 3 years ago

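I converted my Chinese samples into the format the model requires, following the NYT_star data format, but the model cannot label any entities or relations: ent2id.json and rel2id.json both contain only an empty {}. My data format is as follows: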

[
  {
    "text": "急诊胸部CT：临床提示：胸闷头痛3天扫描层厚：5mm影像所示：两下肺少许渗出，两侧胸腔微量积液。无明显气管、支气管异物；无明显食管异物；无气胸、液气胸征象；无明显纵隔气肿、占位；无明显心脏、大血管形态改变，无明显心包积液。（所示肋骨）无明显肋骨错位性骨折。",
    "triple_list": [
      ["微量", "修饰", "积液"],
      ["两侧", "修饰", "胸腔"],
      ["胸腔", "修饰", "积液"],
      ["少许", "修饰", "两下肺"],
      ["两下肺", "修饰", "渗出"]
    ]
  },
  ......
]
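Before running BuildData, it can help to confirm that the samples are internally consistent. The following is a minimal sketch, not part of TPLinker: `check_samples` is a hypothetical helper that assumes the `text` / `triple_list` layout shown above, checks that every triple has three elements and that each subject and object occurs verbatim in the text, and collects the relation names. The sample text is shortened from the one above.

```python
import json


def check_samples(samples):
    """Return (relations, problems) for samples with "text" and "triple_list" keys.

    Hypothetical helper for illustration; it does not reproduce TPLinker's
    own preprocessing.
    """
    relations = set()
    problems = []
    for idx, sample in enumerate(samples):
        text = sample.get("text", "")
        for triple in sample.get("triple_list", []):
            if len(triple) != 3:
                problems.append((idx, triple, "expected [subject, relation, object]"))
                continue
            subj, rel, obj = triple
            relations.add(rel)
            # An entity that does not occur verbatim in the text cannot be
            # mapped to a character span later on.
            for role, ent in (("subject", subj), ("object", obj)):
                if ent not in text:
                    problems.append((idx, triple, f"{role} not found in text"))
    return relations, problems


# Shortened version of the sample above.
samples = json.loads("""
[
  {
    "text": "两下肺少许渗出，两侧胸腔微量积液。",
    "triple_list": [
      ["微量", "修饰", "积液"],
      ["两侧", "修饰", "胸腔"],
      ["胸腔", "修饰", "积液"],
      ["少许", "修饰", "两下肺"],
      ["两下肺", "修饰", "渗出"]
    ]
  }
]
""")

relations, problems = check_samples(samples)
print(relations)
print(problems)
```

If `problems` comes back empty, as it does for this sample, the data itself is consistent and the cause is more likely to be in the preprocessing settings.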

My settings in the BuildData config file build_data_config.yaml are:

exp_name: deepwise # nyt_star, nyt, webnlg_star, webnlg, ace05_lu
data_in_dir: ../datasets/ori_data
ori_data_format: casrel # casrel (webnlg_star, nyt_star), etl_span (webnlg), raw_nyt (nyt), tplinker (see readme)

encoder: BERT
bert_path: ../../pretrained_models/chinese-bert-wwm-ext-hit-pytorch-huggingface
data_out_dir: ../datasets/train_data/debugging

add_char_span: true
ignore_subword: true
separate_char_by_white: false
check_tok_span: true
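One setting worth examining for unsegmented Chinese text is `ignore_subword`. The sketch below is an illustration only. It assumes, without confirmation from this thread, that whole-word matching is done by requiring whitespace on both sides of an entity; `find_char_spans` is a hypothetical helper, not TPLinker's API. Under that assumption, entities in space-delimited English text are found, while entities in Chinese text, which has no spaces between words, are never matched.

```python
import re


def find_char_spans(text, entity, whole_word_only):
    """Return (start, end) character spans of `entity` in `text`.

    Hypothetical helper. With whole_word_only=True the entity must be
    surrounded by whitespace (or the text boundary), which is the ASSUMED
    behaviour being illustrated, not a confirmed implementation.
    """
    if whole_word_only:
        padded_text = f" {text} "
        pattern = re.escape(f" {entity} ")
        # The match starts on the leading space of the padded text, which
        # is the same index as the entity start in the unpadded text.
        return [(m.start(), m.start() + len(entity))
                for m in re.finditer(pattern, padded_text)]
    return [(m.start(), m.end()) for m in re.finditer(re.escape(entity), text)]


english = "Obama was born in Honolulu"
chinese = "两侧胸腔微量积液"

print(find_char_spans(english, "Honolulu", whole_word_only=True))  # [(18, 26)]
print(find_char_spans(chinese, "积液", whole_word_only=True))       # []
print(find_char_spans(chinese, "积液", whole_word_only=False))      # [(6, 8)]
```

Whether this is what produced the empty ent2id.json and rel2id.json here is not established by the thread. It only shows why a whitespace-based setting deserves a second look when the input is Chinese.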

The Chinese BERT model I am using is the wwm model from HIT (Harbin Institute of Technology).

Download link: https://huggingface.co/hfl/chinese-bert-wwm-ext

131250208 commented 3 years ago

@jarork

jarork commented 3 years ago

@jarork

Thank you very much. After making the change, the problem is solved~

nlper01 commented 7 months ago

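Is this a public dataset? It looks like CMeIE. Could you share a copy of the processed data or your data-processing code? My format is the same as yours, but the conversion keeps running into problems.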

131250208 / TPlinker-joint-extraction

Problems encountered when using Chinese samples with BuildData #38