131250208 / TPlinker-joint-extraction


Bug report: no offset is added to the char span when long text is split during model evaluation. #65

Closed. jarork closed this 2 years ago

jarork commented 2 years ago

In tplinker_plus.py:

    class HandshakingTaggingScheme(object):
        def decode_rel(...):
            ...
            char_span_list = tok2char_span[sp[0]:sp[1] + 1]
            char_sp = [char_span_list[0][0], char_span_list[-1][1]]
            ent_text = text[char_sp[0]:char_sp[1]]
            entity = {
                "type": ent_type,
                "text": ent_text,
                "tok_span": [sp[0] + tok_offset, sp[1] + 1 + tok_offset],  # bug fixed.
                "char_span": [char_sp[0] + char_offset, char_sp[1] + char_offset],  # bug fixed.
            }
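To show why the offsets matter, here is a minimal standalone sketch, not the repository's code. It assumes whitespace tokenization in place of the real tokenizer, and the names `whitespace_spans` and `decode_entity` are hypothetical helpers made up for this example. A long text is cut into a window, an entity is decoded with window-local spans, and the offsets map it back to positions in the full text.

```python
import re


def whitespace_spans(text):
    """Return (start, end) char spans of whitespace-separated tokens.

    Assumption: stands in for the real tokenizer's tok2char_span.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def decode_entity(text, tok2char_span, sp, ent_type, tok_offset=0, char_offset=0):
    """Build an entity dict from a window-local, inclusive token span `sp`.

    `text` and `tok2char_span` are local to the window; the offsets shift
    the resulting spans back into the coordinates of the full text.
    """
    char_span_list = tok2char_span[sp[0]:sp[1] + 1]
    char_sp = [char_span_list[0][0], char_span_list[-1][1]]
    return {
        "type": ent_type,
        "text": text[char_sp[0]:char_sp[1]],
        "tok_span": [sp[0] + tok_offset, sp[1] + 1 + tok_offset],
        "char_span": [char_sp[0] + char_offset, char_sp[1] + char_offset],
    }


full_text = "Alice met Bob in Paris yesterday"
full_spans = whitespace_spans(full_text)

# Second window: tokens 3..5 ("in Paris yesterday").
tok_offset = 3
char_offset = full_spans[tok_offset][0]
window_text = full_text[char_offset:]
window_spans = [(s - char_offset, e - char_offset) for s, e in full_spans[tok_offset:]]

# "Paris" is token 1 inside the window (inclusive span).
entity = decode_entity(
    window_text, window_spans, (1, 1), "LOC",
    tok_offset=tok_offset, char_offset=char_offset,
)

# Without the offsets, the spans would point into the window, not the full text.
no_offset = decode_entity(window_text, window_spans, (1, 1), "LOC")
```

With the offsets applied, `entity["char_span"]` slices "Paris" out of `full_text`, whereas the spans in `no_offset` are only valid inside the window. That mismatch is what the issue reports for evaluation on split long text.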