131250208 / TPlinker-joint-extraction

Bug in decode_rel in tplinker_plus.py #72

Open ZJL0111 opened 2 years ago

ZJL0111 commented 2 years ago

Thanks to the author for sharing the code. While using a trained model for pre-annotation, I found a bug in decode_rel in tplinker_plus.py:

    # head link
    for sp in matrix_spots:
        ...........
        # recover the positions in the original text
        for ent in ent_list:
            ent["char_span"] = [ent["char_span"][0] + char_offset, ent["char_span"][1] + char_offset]
            ent["tok_span"] = [ent["tok_span"][0] + tok_offset, ent["tok_span"][1] + tok_offset]

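The entity span recovery should be placed outside the loop above; otherwise decoding produces wrong spans, as in the example below.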

The text is 2001 characters long in total, yet the output entities have char_span values such as [2853, 2866]:

'relation_list': [{'subject': 'SAR444245', 'object': 'every 3 weeks', 'subj_tok_span': [405, 410], 'obj_tok_span': [418, 421], 'subj_char_span': [1165, 1174], 'obj_char_span': [1193, 1206], 'predicate': '/Drug/FREQUENCY/Drug-FREQUENCY'}], 

'entity_list': [
{'type': 'Drug', 'text': 'SAR444245', 'tok_span': [663, 668], 'char_span': [2326, 2335]}, 
{'type': 'Drug', 'text': 'pembrolizumab', 'tok_span': [669, 676], 'char_span': [2340, 2353]}, 
{'type': 'Drug', 'text': 'SAR444245', 'tok_span': [705, 710], 'char_span': [2455, 2464]}, 
{'type': 'Drug', 'text': 'pembrolizumab', 'tok_span': [711, 718], 'char_span': [2469, 2482]}, 
{'type': 'FREQUENCY', 'text': 'every 3 weeks', 'tok_span': [718, 721], 'char_span': [2483, 2496]}, 
{'type': 'Drug', 'text': 'SAR444245', 'tok_span': [746, 751], 'char_span': [2583, 2592]}, 
{'type': 'Drug', 'text': 'pembrolizumab', 'tok_span': [752, 759], 'char_span': [2597, 2610]}, 
{'type': 'FREQUENCY', 'text': 'every 3 weeks', 'tok_span': [759, 762], 'char_span': [2611, 2624]}, 
{'type': 'Drug', 'text': 'pembrolizumab', 'tok_span': [793, 800], 'char_span': [2725, 2738]}, 
{'type': 'Drug', 'text': 'pembrolizumab', 'tok_span': [834, 841], 'char_span': [2853, 2866]}]}
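
To illustrate why the placement matters, here is a minimal sketch of the pattern described above. It is not the actual decode_rel code: the function names `recover_inside_loop` and `recover_after_loop`, the sample entity, the spots, and the offsets are all hypothetical, and the per-spot decoding is reduced to a placeholder. It only shows that adding the offsets inside the loop over `matrix_spots` shifts every span once per spot, while adding them after the loop shifts each span exactly once.

```python
import copy


def recover_inside_loop(ent_list, matrix_spots, char_offset, tok_offset):
    """Reported pattern (hypothetical reduction): offsets are added on every pass."""
    ents = copy.deepcopy(ent_list)
    for _sp in matrix_spots:
        # ... per-spot decoding would happen here ...
        for ent in ents:
            ent["char_span"] = [ent["char_span"][0] + char_offset,
                                ent["char_span"][1] + char_offset]
            ent["tok_span"] = [ent["tok_span"][0] + tok_offset,
                               ent["tok_span"][1] + tok_offset]
    return ents


def recover_after_loop(ent_list, matrix_spots, char_offset, tok_offset):
    """Suggested pattern: decode all spots first, then add the offsets once."""
    ents = copy.deepcopy(ent_list)
    for _sp in matrix_spots:
        pass  # ... per-spot decoding would happen here ...
    for ent in ents:
        ent["char_span"] = [ent["char_span"][0] + char_offset,
                            ent["char_span"][1] + char_offset]
        ent["tok_span"] = [ent["tok_span"][0] + tok_offset,
                           ent["tok_span"][1] + tok_offset]
    return ents


# Made-up data: one entity with spans relative to its window, three spots.
ent_list = [{"type": "Drug", "text": "pembrolizumab",
             "tok_span": [10, 17], "char_span": [40, 53]}]
matrix_spots = [(0, 0, 0), (1, 1, 0), (2, 2, 0)]

buggy = recover_inside_loop(ent_list, matrix_spots, char_offset=100, tok_offset=30)
fixed = recover_after_loop(ent_list, matrix_spots, char_offset=100, tok_offset=30)

print(buggy[0]["char_span"], buggy[0]["tok_span"])
print(fixed[0]["char_span"], fixed[0]["tok_span"])
```

With three spots, the first variant adds each offset three times, so the spans can end up beyond the end of the text. That would be consistent with seeing a char_span past the text length, although the exact numbers in the output above depend on the real window offsets and spot counts.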