Is your feature request related to a problem? Please describe.
I want to use a multilingual model from Hugging Face (https://huggingface.co/intfloat/multilingual-e5-large). Its tokenizer is a SentencePiece unigram tokenizer, so I am unable to port the model to C#/ONNX.
Describe the solution you'd like
Support for the SentencePiece unigram tokenizer in the Microsoft.ML.Tokenizers package.
Describe alternatives you've considered
BlingFire, but it no longer seems to be maintained, and it is unclear whether it would return exactly the same token IDs.
Thank you for your time and effort (the library in general is great!)