bheinzerling/bpemb
Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License · 1.18k stars · 101 forks
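The repository ships a Python package of the same name. Below is a minimal usage sketch, assuming the constructor and methods documented in the project README (`BPEmb(lang=..., vs=..., dim=...)`, `.encode()`, `.embed()`); install with `pip install bpemb` first.

```python
# Minimal sketch of the bpemb package: load pre-trained English BPE subword
# embeddings, segment a word into subwords, and look up their vectors.
# API names (BPEmb, encode, embed) assumed from the project README.
from bpemb import BPEmb

# Downloads (on first use) and loads English embeddings:
# 10k BPE vocabulary, 100-dimensional vectors.
bpemb_en = BPEmb(lang="en", vs=10000, dim=100)

# Segment a word into the subword units learned by BPE.
print(bpemb_en.encode("stratford"))  # e.g. ['▁strat', 'ford']

# Embed the subwords: one 100-d vector per subword unit.
vectors = bpemb_en.embed("stratford")
print(vectors.shape)                 # e.g. (2, 100)
```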
Issues
#21 size/source of training corpora · joemzhao · closed 5 years ago · 2 comments
#20 numbers/digits conversion · csarron · closed 5 years ago · 3 comments
#19 multilingual text · rohitsaluja22 · closed 5 years ago · 2 comments
#18 Error when loading model · Hoiy · closed 5 years ago · 2 comments
#17 Fix http_get & remove f-strings · sanghoon · closed 5 years ago · 1 comment
#16 Encoder not splitting words into subwords · SamLynnEvans · closed 5 years ago · 2 comments
#15 fix typo (install -> import) · jfilter · closed 5 years ago · 1 comment
#14 Missing tokens in German model · maurice-g · closed 5 years ago · 3 comments
#13 SentencePiece fails? · gwohlgen · closed 6 years ago · 2 comments
#12 Train --model_type=unigram · taku910 · closed 1 year ago · 2 comments
#11 How do you learn the Chinese BPE? · Shuailong · closed 6 years ago · 2 comments
#10 On-the-fly conversion to subwords in Python · jbingel · closed 6 years ago · 2 comments
#9 Comparison to other word vectors · DonaldTsang · closed 5 years ago · 1 comment
#8 No question marks in Russian models · avostryakov · closed 5 years ago · 3 comments
#7 Vocab length != word vector count · tocab · closed 5 years ago · 5 comments
#6 Some embeddings are invalid (majority of vectors is inf or nan) · leezu · closed 5 years ago · 5 comments
#5 Training script · lparam · opened 6 years ago · 10 comments
#4 Converting to lowercase in preprocess_text.sh works only with ascii characters · pshashk · closed 5 years ago · 2 comments
#3 <url> symbol is not kept · noe · closed 5 years ago · 2 comments
#2 Symbols don't match between model and embedding · noe · closed 5 years ago · 6 comments
#1 Error when executing preprocess_text.sh · noe · closed 7 years ago · 1 comment