Closed by LittleLittleCloud 4 months ago
Is this something that could be more widely applicable beyond SentencePiece?
For example, see the template processing section of the Hugging Face tokenizers docs: https://huggingface.co/docs/tokenizers/pipeline#all-together-a-bert-tokenizer-from-scratch
Is your feature request related to a problem? Please describe.
Phi-3 uses the Llama 2 tokenizer with a few additional special tokens such as <|user|> and <|system|>. Currently, there is no way to add special tokens to the SentencePiece BPE tokenizer (the Llama 2 tokenizer) in ML.NET.
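To illustrate the requested behavior, here is a minimal sketch in Python (not ML.NET code) of how special tokens are commonly layered on top of an existing tokenizer: the text is split on the special-token strings, each special token maps to a reserved ID, and everything else goes to the base encoder. The names `make_encoder` and `toy_base_encode` and the IDs 1000 and 1001 are hypothetical and chosen only for this example; they are not the Phi-3 vocabulary IDs or any existing API.

```python
import re
from typing import Callable, Dict, List


def make_encoder(
    base_encode: Callable[[str], List[int]],
    special_tokens: Dict[str, int],
) -> Callable[[str], List[int]]:
    """Wrap a base encoder so that special tokens map to fixed IDs."""
    if not special_tokens:
        return base_encode

    # Try longer tokens first so overlapping tokens match greedily.
    alternatives = sorted(special_tokens, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(t) for t in alternatives) + ")")

    def encode(text: str) -> List[int]:
        ids: List[int] = []
        # re.split with a capturing group keeps the matched tokens in the result.
        for piece in pattern.split(text):
            if not piece:
                continue  # empty strings appear between adjacent matches
            if piece in special_tokens:
                ids.append(special_tokens[piece])
            else:
                ids.extend(base_encode(piece))
        return ids

    return encode


def toy_base_encode(text: str) -> List[int]:
    """Toy stand-in for the real model: one ID per UTF-8 byte."""
    return list(text.encode("utf-8"))


# Hypothetical IDs, placed above the byte range used by the toy encoder.
SPECIAL = {"<|system|>": 1000, "<|user|>": 1001}
encode = make_encoder(toy_base_encode, SPECIAL)

print(encode("<|user|>hi"))
```

A real SentencePiece model may also normalize each segment separately (for example by adding a leading-space marker), which this sketch ignores.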