Helsinki-NLP / Tatoeba-Challenge


About MarianTokenizer #17

Closed: hieutt99 closed this issue 3 years ago

hieutt99 commented 3 years ago

I'm sorry to open an issue for this, but I couldn't find the tokenizer's model details anywhere in this repo, or a contact email. Which model type does the SentencePiece tokenizer use: unigram (the default, according to the SentencePiece repo) or BPE? Thanks.