OpenPecha / training_sentencepiece


MT0021: Training a bilingual tokenizer #1

Open TenzinGayche opened 3 months ago

TenzinGayche commented 3 months ago

Description :

Train a bilingual tokenizer with a 32k vocabulary and a low fertility score, using the SentencePiece library.
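A rough sketch of how this could be set up and measured. Only the 32k vocabulary size and the use of SentencePiece come from the description above. The corpus path, the `bo_en` model prefix, the unigram model type, the character coverage, and the definition of fertility as average tokens per whitespace-delimited word are all assumptions. For Tibetan text, a different `split_words` (for example, splitting on the tsheg) may be more appropriate.

```python
from typing import Callable, Iterable, List


def fertility(
    sentences: Iterable[str],
    tokenize: Callable[[str], List[str]],
    split_words: Callable[[str], List[str]] = str.split,
) -> float:
    """Average number of tokens produced per word over a corpus.

    Assumption: a "word" is whatever split_words returns; the default
    splits on whitespace, which may not suit Tibetan.
    """
    n_tokens = 0
    n_words = 0
    for sentence in sentences:
        n_tokens += len(tokenize(sentence))
        n_words += len(split_words(sentence))
    if n_words == 0:
        raise ValueError("corpus contains no words")
    return n_tokens / n_words


def train_bilingual_tokenizer(corpus_path: str, model_prefix: str = "bo_en") -> None:
    """Hypothetical training call on a mixed corpus, one sentence per line.

    Everything except vocab_size=32000 is an illustrative assumption.
    """
    # Imported lazily so fertility() can be used without sentencepiece installed.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(
        input=corpus_path,
        model_prefix=model_prefix,
        vocab_size=32000,
        model_type="unigram",
        character_coverage=1.0,
    )
```

Once a model is trained, it could be loaded with `spm.SentencePieceProcessor(model_file="bo_en.model")` and scored with `fertility(sentences, lambda s: sp.encode(s, out_type=str))`. A lower value means fewer tokens per word.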

To dos:

TenzinGayche commented 3 months ago

Bo_en Tokenizer V1