Closed by ZizhenWang 5 years ago
But why do you think it's related to text matching?
It is the basic tokenizer of BERT.
If it's a tokenizer, then it should be designed as a processor unit.
I think it's more about solving the OOV (out-of-vocabulary) problem. Currently the vocabulary is built only from the training data, so it may be limited by how you choose your training data.
BPE is used in many NLP tasks, such as machine translation, generation, and pre-training. We will implement a BPE processor to support these requirements.
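
For anyone following the thread, here is a minimal sketch of how BPE sidesteps the OOV problem. It is a toy illustration only, not the processor unit proposed above: the function names (`learn_bpe`, `merge_pair`, `encode`) and the `</w>` end-of-word marker are assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Merge the most frequent pair everywhere in the vocabulary.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(symbols, best))] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split a word into subword units by applying the merges in learned order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    rules = learn_bpe(corpus, num_merges=5)
    # "lowest" never appears in the corpus, yet it still gets an encoding.
    print(encode("lowest", rules))
```

Because every word can fall back to single characters, a word that never appeared in the training data is still split into known subword units instead of being mapped to an unknown token.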