bheinzerling/bpemb
Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License
1.18k stars
101 forks
issues
Support for python 3.12 and scipy-1.13.0
#71
sjschmid
opened
4 months ago
0
Model Downloading 404 Error
#70
gokdumano
opened
5 months ago
0
release tags
#69
ViZiD
opened
6 months ago
0
Error in URL
#68
davebulaval
opened
6 months ago
0
util: make content header check more robust
#67
stefan-it
closed
6 months ago
0
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
#66
srolskyi
opened
6 months ago
7
Add project urls and move the metadata into `pyproject.toml` according to `PEP 621`.
#65
KOLANICH
opened
1 year ago
0
Extends embed to allow sequence of texts
#64
maxi-marufo
opened
1 year ago
0
SSLError
#63
davebulaval
closed
1 year ago
2
Is the training procedure open?
#62
utrobinmv
closed
2 years ago
0
Incompatibility with subword-nmt 3.8.0
#61
AghilesAzzoug
closed
2 years ago
0
EOFError: Compressed file ended before the end-of-stream marker was reached
#60
JulesBelveze
closed
2 years ago
2
en.wiki.bpe.op or en.wiki.bpe.vs
#59
zhenpingli
closed
1 year ago
0
Truecase supported.
#58
BrightXiaoHan
closed
1 year ago
1
How to decode encoded byte-pair sentences?
#57
chayan-dhaddha
closed
2 years ago
1
Issues after updating to gensim 4.0.0
#56
arun5309
closed
3 years ago
1
Are the word embeddings GloVe or word2vec?
#55
YoadTew
closed
3 years ago
1
[Question] How can we use BPEmb for large documents?
#54
neel04
closed
3 years ago
2
adding special tokens to a BPEmb model
#53
tannonk
closed
3 years ago
8
Can I add <pad>?
#52
Randool
closed
3 years ago
1
special tokens not handled
#51
dunovank
closed
3 years ago
2
Fix type hints for 'ids' type (fix #49)
#50
cosine0
closed
3 years ago
4
Incorrect type hints for encode_ids*
#49
cosine0
closed
3 years ago
0
Load custom Word2Vec
#48
Delphine22
closed
3 years ago
1
Number issue in bpemb
#47
aimanmutasem
closed
4 years ago
1
How to use BPEmb as a pre-training model
#46
aimanmutasem
closed
4 years ago
0
UNK words in the prediction output
#45
aimanmutasem
closed
4 years ago
2
EOFError: Compressed file ended before the end-of-stream marker was reached
#44
aimanmutasem
closed
4 years ago
4
Vocabulary size issue
#43
aimanmutasem
closed
4 years ago
2
Update the pypi package
#42
mauryaland
closed
4 years ago
1
Encode with EOS: change function call
#41
hubertkarbowy
closed
4 years ago
1
setup: ensure utf-8 encoding when reading README.md
#40
hartb
closed
4 years ago
1
Subword vectors to word vector
#39
susmoy-macgill36
closed
4 years ago
1
Adding support for own models
#38
stephantul
closed
4 years ago
3
AttributeError: module 'smart_open' has no attribute 's3'
#37
ssp573
closed
4 years ago
1
Continue training
#36
ericlingit
closed
4 years ago
1
question on https://nlp.h-its.org
#35
jwijffels
closed
4 years ago
4
version of sentencepiece used
#34
jwijffels
closed
4 years ago
4
Is there a way to specify the maximum number of subwords so that I can get an embedding of fixed size?
#33
subrahmanyap
closed
4 years ago
1
Difference between "en.wiki.bpe.vs50000" and "en.wiki.bpe.op50000"
#32
caozhen-alex
closed
5 years ago
2
model/embedding versioning?
#31
aparrish
closed
5 years ago
2
Why are digits always mapped to zero?
#30
sumyatthitsar
closed
5 years ago
2
Compare embeddings
#29
loretoparisi
closed
5 years ago
2
tokenization only feature
#28
trideeprath
closed
3 years ago
1
most_similar method
#27
trideeprath
closed
5 years ago
1
The index for <unk> is 0, so what about <pad>?
#26
ghost
closed
5 years ago
0
How do you get the embedding/id for the pad token ?
#25
derlin
closed
5 years ago
3
Syntax error while importing
#24
amansrivastava17
closed
5 years ago
1
load vectors from path
#23
alejandrojcastaneira
closed
5 years ago
1
Training customized bpemb
#22
gccome
closed
5 years ago
1