bheinzerling / bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License

util: make content header check more robust #67

Closed stefan-it closed 6 months ago

stefan-it commented 6 months ago

Hi,

as investigated in #66, the Content-Type header changed from application/x-gzip to application/gzip for the provided models, such as https://nlp.h-its.org/bpemb/en/en.wiki.bpe.vs10000.d100.w2v.bin.tar.gz.

This MR makes the download logic a bit more robust: instead of comparing against one exact value, it checks whether the Content-Type header contains the string gzip.
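
For illustration, a minimal sketch of such a substring check could look like the following. The function name `is_gzip_response` and the plain-dict header input are assumptions made for this example and are not taken from the bpemb code; `application/gzip` is used only as an assumed example of a variant header value.

```python
def is_gzip_response(headers):
    """Return True if the Content-Type header mentions gzip.

    Hypothetical helper for illustration, not the actual bpemb function.
    `headers` is assumed to be a plain mapping of header names to values.
    """
    # HTTP header names are case-insensitive, so lower-case keys and values.
    normalised = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    # A substring test accepts any Content-Type value that contains "gzip",
    # and treats a missing header as "not gzip".
    return "gzip" in normalised.get("content-type", "")


print(is_gzip_response({"Content-Type": "application/x-gzip"}))
print(is_gzip_response({"Content-Type": "application/gzip"}))
print(is_gzip_response({"Content-Type": "text/html"}))
```

The substring test is deliberately permissive: it tolerates both gzip variants and extra parameters in the header value, at the cost of also accepting any other value that happens to contain "gzip".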