mjeensung closed this issue 1 year ago.
This error occurs during NER detokenization when a sentence is longer than the maximum sequence length (128 in this case).
NER preprocessing truncated such sentences (lines 302-306 in multi_ner/main.py), which triggered this error.
We resolved the issue by replacing truncation with a sliding window, both in preprocessing (lines 308-414 in multi_ner/main.py) and in postprocessing (lines 236-238 in multi_ner/ops.py).
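To illustrate the sliding-window idea, here is a minimal, hypothetical sketch in Python. It is not the code in multi_ner/main.py or multi_ner/ops.py. `make_windows` and `merge_window_predictions` are illustrative names. `max_len` is assumed to be the token budget left after special tokens such as [CLS] and [SEP]. The rule for resolving overlapping predictions (keep the label from the window in which the token has the most context) is an assumption borrowed from common BERT practice. Preprocessing splits a long sentence into overlapping windows so no token is dropped, and postprocessing maps the per-window labels back to exactly one label per token.

```python
from typing import List, Sequence, Tuple, TypeVar

Label = TypeVar("Label")


def make_windows(n_tokens: int, max_len: int, stride: int) -> List[Tuple[int, int]]:
    """Split token positions [0, n_tokens) into overlapping (start, end) windows.

    Hypothetical helper: consecutive windows start `stride` tokens apart, so
    neighbouring windows overlap by `max_len - stride` tokens. Unlike
    truncation, every token ends up in at least one window.
    """
    if not 0 < stride <= max_len:
        raise ValueError("expected 0 < stride <= max_len")
    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end == n_tokens:  # this window reaches the end of the sentence
            return spans
        start += stride


def merge_window_predictions(
    n_tokens: int,
    spans: Sequence[Tuple[int, int]],
    window_preds: Sequence[Sequence[Label]],
) -> List[Label]:
    """Map per-window labels back to one label per token of the full sentence.

    Hypothetical helper (assumed merge rule): a token covered by several
    windows takes the label from the window in which it sits farthest from a
    window edge. On ties, the earlier window's label is kept.
    """
    if len(spans) != len(window_preds):
        raise ValueError("need exactly one prediction list per window")
    merged: List = [None] * n_tokens
    best_context = [-1] * n_tokens
    for (start, end), preds in zip(spans, window_preds):
        if len(preds) != end - start:
            raise ValueError("prediction count must match window length")
        for offset, label in enumerate(preds):
            pos = start + offset
            # Distance to the nearer edge of this window: larger means more context.
            context = min(offset, end - start - 1 - offset)
            if context > best_context[pos]:
                best_context[pos] = context
                merged[pos] = label
    return merged
```

For example, `make_windows(300, 128, 64)` returns `[(0, 128), (64, 192), (128, 256), (192, 300)]`, so a 300-token sentence is fully covered instead of being cut at 128 tokens. A smaller stride gives each token more surrounding context at the cost of more forward passes.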
If you run into any other problems, please let me (@minstar) know and reopen this issue!
Input to reproduce the error:
The error message in logs/nohup_multi_ner.out: