bheinzerling / bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License

EOFError: Compressed file ended before the end-of-stream marker was reached #60

Closed by JulesBelveze 3 years ago

JulesBelveze commented 3 years ago

Hi there, first of all, thanks for the cool repo 😃

I am trying to download the multilingual model using the following command:

from bpemb import BPEmb
multibpemb = BPEmb(lang="multi", vs=1000000, dim=300)

However, the download of multi.wiki.bpe.vs1000000.d300.w2v.bin.tar.gz always stops at 96% with the following error:

EOFError: Compressed file ended before the end-of-stream marker was reached

I did a bit of research and tried deleting my cache, but that doesn't fix it. Any idea how I can load the multilingual model?

Cheers!

bheinzerling commented 3 years ago

I just downloaded the file from here: https://bpemb.h-its.org/multi/

At around 96% I got a network error, which is probably the same failure you see in Python. The download was quite slow, so we may be running into a request timeout or some other server-side limit. After I resumed the download in the browser, it completed and the extracted file looks fine.

So as a workaround you can manually download and extract the file into the cache folder (default is ~/.cache/bpemb/multi).
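If you would rather script the manual workaround, here is a minimal sketch using only the Python standard library. It is not part of bpemb. The helper names (download_with_resume, is_complete_gzip, extract_archive) are made up for illustration. The full URL is an assumption built from the directory above plus the archive name in the error message, and the sketch assumes the server honours HTTP Range requests, which the successful browser resume suggests but does not prove. It resumes a partial download from the bytes already on disk, checks that the gzip stream reaches its end-of-stream marker, and then extracts the archive into the cache folder.

```python
import gzip
import http.client
import shutil
import tarfile
import urllib.error
import urllib.request
from pathlib import Path

# Assumed URL: directory from this thread + archive name from the error message.
URL = "https://bpemb.h-its.org/multi/multi.wiki.bpe.vs1000000.d300.w2v.bin.tar.gz"
CACHE_DIR = Path.home() / ".cache" / "bpemb" / "multi"  # default cache folder


def is_complete_gzip(path):
    """Return True if the gzip stream can be read through to its end marker."""
    try:
        with gzip.open(path, "rb") as stream:
            while stream.read(1 << 20):
                pass
        return True
    except (EOFError, OSError):  # truncated or corrupt file
        return False


def download_with_resume(url, dest, max_attempts=20):
    """Download url to dest, continuing from the bytes already on disk."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    for _ in range(max_attempts):
        if dest.exists() and is_complete_gzip(dest):
            return dest
        offset = dest.stat().st_size if dest.exists() else 0
        request = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                # 206 means the Range header was honoured; otherwise start over.
                mode = "ab" if response.status == 206 else "wb"
                with open(dest, mode) as out:
                    shutil.copyfileobj(response, out)
        except urllib.error.HTTPError as err:
            if err.code != 416:
                raise
            # Nothing left to fetch, yet the file failed the check: start over.
            dest.unlink(missing_ok=True)
        except (OSError, http.client.HTTPException):
            pass  # connection dropped; the next attempt resumes from the new offset
    if dest.exists() and is_complete_gzip(dest):
        return dest
    raise RuntimeError(f"download still incomplete after {max_attempts} attempts")


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz into dest_dir, using the safe 'data' filter if available."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir, **kwargs)


# Usage (performs a large download, so it is left commented out):
# archive = download_with_resume(URL, CACHE_DIR / URL.rsplit("/", 1)[1])
# extract_archive(archive, CACHE_DIR)
```

After extraction, check that the embedding file ended up directly in the cache folder. The layout inside the archive is not confirmed here, so you may need to move the file. Once it is in place, the BPEmb(lang="multi", vs=1000000, dim=300) call should pick it up from the cache instead of downloading it again.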

Unfortunately I cannot fix the underlying issue since I do not have control over the server on which the files are hosted.

JulesBelveze commented 3 years ago

Hey @bheinzerling, thanks for the quick reply! Yes, that's actually the workaround I used 😺

Thanks!