adapter-hub/hgiyt
Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"
https://arxiv.org/abs/2012.15613
26 stars, 6 forks
Issues
#5 Calculating the same fertility twice in plot_fertility() (author: HanNayeoniee; opened 1 year ago; 0 comments)
#4 Character-tokenized vs subword-tokenized in Japanese (author: tomohideshibata; closed 2 years ago; 1 comment)
#3 "--with-charset=utf8" option is needed for the Mecab install (author: tomohideshibata; closed 3 years ago; 2 comments)
#2 Version of UD-Treebanks used for Tokenizer Experiments (author: kabirahuja2431; closed 3 years ago; 4 comments)
#1 finetuning: add missing use_fast arg to argument parser (author: stefan-it; closed 3 years ago; 2 comments)