bheinzerling/bpemb
Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License · 1.18k stars · 101 forks
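The repository ships a Python package of the same name. Below is a minimal usage sketch, assuming the constructor and methods documented in the project README (`BPEmb(lang=..., vs=..., dim=...)`, `.encode()`, `.embed()`); install with `pip install bpemb` first.

```python
# Minimal sketch of the bpemb package: load pre-trained English BPE subword
# embeddings, segment a word into subwords, and look up their vectors.
# API names (BPEmb, encode, embed) assumed from the project README.
from bpemb import BPEmb

# Downloads (on first use) and loads English embeddings:
# 10k BPE vocabulary, 100-dimensional vectors.
bpemb_en = BPEmb(lang="en", vs=10000, dim=100)

# Segment a word into the subword units learned by BPE.
print(bpemb_en.encode("stratford"))  # e.g. ['▁strat', 'ford']

# Embed the subwords: one 100-d vector per subword unit.
vectors = bpemb_en.embed("stratford")
print(vectors.shape)                 # e.g. (2, 100)
```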
Issues
#21 size/source of training corpora · joemzhao · closed 5 years ago · 2 comments
#20 numbers/digits conversion · csarron · closed 5 years ago · 3 comments
#19 multilingual text · rohitsaluja22 · closed 5 years ago · 2 comments
#18 Error when loading model · Hoiy · closed 5 years ago · 2 comments
#17 Fix http_get & remove f-strings · sanghoon · closed 5 years ago · 1 comment
#16 Encoder not splitting words into subwords · SamLynnEvans · closed 5 years ago · 2 comments
#15 fix typo (install -> import) · jfilter · closed 5 years ago · 1 comment
#14 Missing tokens in German model · maurice-g · closed 5 years ago · 3 comments
#13 SentencePiece fails? · gwohlgen · closed 6 years ago · 2 comments
#12 Train --model_type=unigram · taku910 · closed 1 year ago · 2 comments
#11 How do you learn the Chinese BPE? · Shuailong · closed 6 years ago · 2 comments
#10 On-the-fly conversion to subwords in Python · jbingel · closed 6 years ago · 2 comments
#9 Comparison to other word vectors · DonaldTsang · closed 5 years ago · 1 comment
#8 No question marks in Russian models · avostryakov · closed 5 years ago · 3 comments
#7 Vocab length != word vector count · tocab · closed 5 years ago · 5 comments
#6 Some embeddings are invalid (majority of vectors is inf or nan) · leezu · closed 5 years ago · 5 comments
#5 Training script · lparam · opened 6 years ago · 10 comments
#4 Converting to lowercase in preprocess_text.sh works only with ascii characters · pshashk · closed 5 years ago · 2 comments
#3 <url> symbol is not kept · noe · closed 5 years ago · 2 comments
#2 Symbols don't match between model and embedding · noe · closed 5 years ago · 6 comments
#1 Error when executing preprocess_text.sh · noe · closed 7 years ago · 1 comment