I found the reason. The prepare_for_model function of PreTrainedTokenizerBase differs between paddlenlp and transformers: prepare_for_model in paddlenlp returns offsets, but the transformers version doesn't. This is the code where paddlenlp handles offsets: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/tokenizer_utils_base.py#L2764
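A minimal sketch of that difference (the `bert-base-uncased` checkpoint is only an illustrative example, not taken from this issue): PaddleNLP's pure-Python tokenizer can return offsets directly, while the transformers slow tokenizer refuses the same request.

```python
# Minimal sketch: both tokenizers are slow (pure-Python) BERT tokenizers.
from paddlenlp.transformers import BertTokenizer as PDBertTokenizer
from transformers import BertTokenizer as HFBertTokenizer

text = "hello world"

# PaddleNLP: prepare_for_model handles offsets, so the call can return them.
pd_tok = PDBertTokenizer.from_pretrained("bert-base-uncased")
pd_out = pd_tok(text, return_offsets_mapping=True)
print(pd_out["offset_mapping"])  # e.g. [(0, 0), (0, 5), (6, 11), (0, 0)]

# transformers: the Python tokenizer has no offset support and raises
# NotImplementedError, pointing the user to PreTrainedTokenizerFast instead.
hf_tok = HFBertTokenizer.from_pretrained("bert-base-uncased")
try:
    hf_tok(text, return_offsets_mapping=True)
except NotImplementedError as e:
    print(e)
```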
Hi, I read your tokenizer code, which is a subclass of PretrainedTokenizer. But PretrainedTokenizer in paddlenlp is more similar to PreTrainedTokenizerFast in transformers, which means the tokenizer can return offsets. The code is as follows:
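(The original snippet referenced in this comment is not reproduced here. As a rough stand-in, below is a sketch of the parallel being drawn: in transformers, offsets are only available through the Rust-backed PreTrainedTokenizerFast subclasses, which is the behavior paddlenlp's PretrainedTokenizer reproduces in pure Python. The checkpoint name is again just illustrative.)

```python
# Sketch of the transformers side: only a fast tokenizer, i.e. a
# PreTrainedTokenizerFast subclass, can return offset mappings.
from transformers import BertTokenizerFast

fast_tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
enc = fast_tok("hello world", return_offsets_mapping=True)

# Character spans of each token in the original string; special tokens get (0, 0).
for token, span in zip(enc.tokens(), enc["offset_mapping"]):
    print(token, span)
```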