Open NLPCode opened 3 years ago
Thanks for pointing this out! We will check it.
Thanks for pointing this out. Because the POS step operates on words rather than subwords, we took a shortcut here and used split instead of the tokenizer, which avoids having to match indices between words and subwords. We will try to correct this in a later version.
@dreasysnail @guoyinwang Suppose we split the text into WORDs when preparing the training data, but want to encode the text with subwords during training. How should we align the text pairs for training?
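One possible way to do this alignment, as a minimal sketch rather than what the repository actually does: tokenize each whitespace-split word separately and record which word every subword came from, so word-level annotations (for example POS tags) can be copied onto subwords. The names `toy_subword_tokenize`, `align_words_to_subwords`, and `project_word_labels` are hypothetical, and the toy tokenizer is only a stand-in for a real subword tokenizer.

```python
from typing import Callable, List, Sequence, Tuple


def toy_subword_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a real subword tokenizer.

    Cuts a word into pieces of at most 3 characters and marks every
    continuation piece with a "##" prefix.
    """
    pieces = [word[i:i + 3] for i in range(0, len(word), 3)]
    return [p if i == 0 else "##" + p for i, p in enumerate(pieces)]


def align_words_to_subwords(
    text: str, tokenize: Callable[[str], List[str]]
) -> Tuple[List[str], List[int]]:
    """Return the subwords of `text` and, for each subword, the index
    of the whitespace-split word it was produced from."""
    subwords: List[str] = []
    word_ids: List[int] = []
    for word_idx, word in enumerate(text.split()):
        pieces = tokenize(word)
        subwords.extend(pieces)
        word_ids.extend([word_idx] * len(pieces))
    return subwords, word_ids


def project_word_labels(
    word_labels: Sequence[str], word_ids: Sequence[int]
) -> List[str]:
    """Copy each word-level label onto all subwords of that word."""
    return [word_labels[i] for i in word_ids]


subwords, word_ids = align_words_to_subwords(
    "the tokenizer works", toy_subword_tokenize
)
subword_tags = project_word_labels(["DT", "NN", "VBZ"], word_ids)
```

Note that tokenizing word by word is an assumption that may not hold for every tokenizer: some subword tokenizers treat a leading space as part of the token, so the pieces can differ from those obtained by tokenizing the whole line at once.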
I think there is an error at line 444 of generate_training_data.py. It should be: tokens = tokenizer.tokenize(line)