Closed Hassaan68 closed 9 months ago
It seems that tokenizer.model is the pretrained SentencePiece model. Could we get access to the training code for this model, so that we can train it to tokenize more modalities?
tokenizer.model is just for text processing. For other modalities, we use a conv layer to transform the input data into tokens.
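To illustrate the idea of a conv layer producing tokens, here is a minimal sketch in plain Python, not the repository's actual code. It assumes a 1D input signal (for example, audio samples) and a strided 1D convolution whose weights would normally be learned during training; the names `conv1d_tokenize` and `init_conv`, the kernel size, the stride, and the embedding size are all hypothetical choices made for this example. Each window of the input becomes one token embedding, so no SentencePiece vocabulary is involved.

```python
import random


def init_conv(embed_dim, kernel_size, seed=0):
    """Create random conv weights (hypothetical stand-in for learned parameters)."""
    rng = random.Random(seed)
    weights = [
        [rng.uniform(-0.1, 0.1) for _ in range(kernel_size)]
        for _ in range(embed_dim)
    ]
    bias = [0.0] * embed_dim
    return weights, bias


def conv1d_tokenize(signal, weights, bias, stride):
    """Apply a strided 1D convolution to a raw signal.

    signal:  list of numbers (the raw, non-text input)
    weights: embed_dim kernels, each of length kernel_size
    bias:    one bias value per kernel
    Returns a list of tokens, each a list of embed_dim values.
    """
    kernel_size = len(weights[0])
    tokens = []
    # Slide the kernels over the signal; each window yields one token.
    for start in range(0, len(signal) - kernel_size + 1, stride):
        window = signal[start:start + kernel_size]
        token = [
            sum(w * x for w, x in zip(kernel, window)) + b
            for kernel, b in zip(weights, bias)
        ]
        tokens.append(token)
    return tokens


# Toy input: 64 samples, split into non-overlapping windows of 16.
signal = [float(i % 7) for i in range(64)]
weights, bias = init_conv(embed_dim=8, kernel_size=16)
tokens = conv1d_tokenize(signal, weights, bias, stride=16)
print(len(tokens), len(tokens[0]))
```

With a stride equal to the kernel size, the windows do not overlap, so the number of tokens is the input length divided by the kernel size. In a real model the same role would typically be played by a learned convolution layer from a deep learning framework, trained jointly with the rest of the network.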
@csuhan thank you for the clarification.