bheinzerling / bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)

https://nlp.h-its.org/bpemb

MIT License

1.18k stars, 101 forks

issues

Support for python 3.12 and scipy-1.13.0

#71 sjschmid opened 4 months ago
0
Model Downloading 404 Error

#70 gokdumano opened 5 months ago
0
release tags

#69 ViZiD opened 6 months ago
0
Error in URL

#68 davebulaval opened 6 months ago
0
util: make content header check more robust

#67 stefan-it closed 6 months ago
0
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

#66 srolskyi opened 6 months ago
7
Add project urls and move the metadata into `pyproject.toml` according to `PEP 621`.

#65 KOLANICH opened 1 year ago
0
Extends embed to allow sequence of texts

#64 maxi-marufo opened 1 year ago
0
SSLError

#63 davebulaval closed 1 year ago
2
Is the training procedure open?

#62 utrobinmv closed 2 years ago
0
Incompatibility with subword-nmt 3.8.0

#61 AghilesAzzoug closed 2 years ago
0
EOFError: Compressed file ended before the end-of-stream marker was reached

#60 JulesBelveze closed 2 years ago
2
en.wiki.bpe.op or en.wiki.bpe.vs

#59 zhenpingli closed 1 year ago
0
Truecase supported.

#58 BrightXiaoHan closed 1 year ago
1
How to decode encoded byte-pair sentences?

#57 chayan-dhaddha closed 2 years ago
1
Issues after updating to gensim 4.0.0

#56 arun5309 closed 3 years ago
1
Are the word embeddings GloVe or word2vec?

#55 YoadTew closed 3 years ago
1
[Question] How can we use BPEmb for large documents?

#54 neel04 closed 3 years ago
2
adding special tokens to a BPEmb model

#53 tannonk closed 3 years ago
8
Can I add <pad>?

#52 Randool closed 3 years ago
1
special tokens not handled

#51 dunovank closed 3 years ago
2
Fix type hints for 'ids' type (fix #49)

#50 cosine0 closed 3 years ago
4
Incorrect type hints for encode_ids*

#49 cosine0 closed 3 years ago
0
Load custom Word2Vec

#48 Delphine22 closed 3 years ago
1
Number issue in bpemb

#47 aimanmutasem closed 4 years ago
1
How to use BPEmb as pre-training model

#46 aimanmutasem closed 4 years ago
0
UNK words in the prediction output

#45 aimanmutasem closed 4 years ago
2
EOFError: Compressed file ended before the end-of-stream marker was reached

#44 aimanmutasem closed 4 years ago
4
Vocabulary size issue

#43 aimanmutasem closed 4 years ago
2
Update the pypi package

#42 mauryaland closed 4 years ago
1
Encode with EOS: change function call

#41 hubertkarbowy closed 4 years ago
1
setup: ensure utf-8 encoding when reading README.md

#40 hartb closed 4 years ago
1
Subword vectors to word vector

#39 susmoy-macgill36 closed 4 years ago
1
Adding support for own models

#38 stephantul closed 4 years ago
3
AttributeError: module 'smart_open' has no attribute 's3'

#37 ssp573 closed 4 years ago
1
Continue training

#36 ericlingit closed 4 years ago
1
question on https://nlp.h-its.org

#35 jwijffels closed 4 years ago
4
version of sentencepiece used

#34 jwijffels closed 4 years ago
4
Is there a way to specify the maximum number of subwords so that I can get an embedding of fixed size?

#33 subrahmanyap closed 4 years ago
1
Difference between "en.wiki.bpe.vs50000" and "en.wiki.bpe.op50000"

#32 caozhen-alex closed 5 years ago
2
model/embedding versioning?

#31 aparrish closed 5 years ago
2
Why are digits always mapped to zero?

#30 sumyatthitsar closed 5 years ago
2
Compare embeddings

#29 loretoparisi closed 5 years ago
2
tokenization only feature

#28 trideeprath closed 3 years ago
1
most_similar method

#27 trideeprath closed 5 years ago
1
The index for <unk> is 0, so what about <pad>?

#26 ghost closed 5 years ago
0
How do you get the embedding/id for the pad token ?

#25 derlin closed 5 years ago
3
Syntax error while importing

#24 amansrivastava17 closed 5 years ago
1
load vectors from path

#23 alejandrojcastaneira closed 5 years ago
1
Training customized bpemb

#22 gccome closed 5 years ago
1