Open scarydemon2 opened 4 years ago
I ran into this bug too and agree with you. I think the condition should be either
if len(query_tokens)+2+offset_idx_dict[int(s_idx)] <= max_seq_length and \
or
if offset_idx_dict[int(s_idx)] <= max_tokens_for_doc and \
.
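To make the two proposed conditions concrete, here is a minimal sketch comparing them. The helper names, the toy lengths, and above all the relation `max_tokens_for_doc = max_seq_length - len(query_tokens) - 2` are assumptions chosen for illustration, not taken from the repo; if the repo reserves a different number of special tokens, the two checks differ by that constant.

```python
# Hypothetical helpers; the names are illustrative and not from mrc_utils.py.

def fits_by_seq_length(num_query_tokens: int, shifted_idx: int, max_seq_length: int) -> bool:
    """First proposed form: query tokens + 2 special tokens + shifted index."""
    return num_query_tokens + 2 + shifted_idx <= max_seq_length


def fits_by_doc_budget(shifted_idx: int, max_tokens_for_doc: int) -> bool:
    """Second proposed form: compare the shifted index with the document budget."""
    return shifted_idx <= max_tokens_for_doc


# Toy values (assumed, not from the issue).
max_seq_length = 16
num_query_tokens = 4
# ASSUMPTION: the document budget is what remains after the query and 2 special tokens.
max_tokens_for_doc = max_seq_length - num_query_tokens - 2

# Evaluate both checks for a range of shifted indices.
results = [
    (
        idx,
        fits_by_seq_length(num_query_tokens, idx, max_seq_length),
        fits_by_doc_budget(idx, max_tokens_for_doc),
    )
    for idx in range(13)
]

for idx, by_seq, by_doc in results:
    print(idx, by_seq, by_doc)
```

Under that assumption the two checks agree for every index, so either form keeps the span positions inside the `max_seq_length` limit.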
Apologies for the late reply.
Thanks for pointing out my mistake. Yes, this is a bug I introduced while cleaning up my codebase. I fixed it in commit f80ed26. Please pull the latest version of the repo.
Many thanks!
In https://github.com/ShannonAI/mrc-for-flat-nested-ner/blob/0505c263a6a3868713e3abcd29856a931ba1a365/data_loader/mrc_utils.py#L145, maybe "max_tokens_for_doc" should be replaced by "max_seq_length", because the "doc span pos" matrix is limited by max_seq_length.