bheinzerling / bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License

SSLError #63

Status: Closed (closed by davebulaval 2 years ago)

davebulaval commented 2 years ago

I've been using BPEmb in one of my packages (Deepparse), and it just came to my attention that a clean install that downloads the BPEmb model weights now fails with an SSLError. Either your SSL certificate has expired or requests has introduced a breaking change.

If no certificates are usually in place for the HTTPS link, here is a quick fix. Otherwise, could you update the certificate as soon as possible?

To reproduce

from bpemb import BPEmb

BPEmb(lang="multi", vs=100000, dim=300)

Error stack

/home/david/anaconda3/envs/deepparse/bin/python /home/david/Github/deepparse/ttttttfee.py 
downloading https://nlp.h-its.org/bpemb/multi/multi.wiki.bpe.vs100000.model
Traceback (most recent call last):
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/urllib3/connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/ssl.py", line 512, in wrap_socket
    return self.sslsocket_class._create(
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/ssl.py", line 1070, in _create
    self.do_handshake()
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/ssl.py", line 1341, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='bpemb.h-its.org', port=443): Max retries exceeded with url: /multi/multi.wiki.bpe.vs100000.model (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/david/Github/deepparse/ttttttfee.py", line 3, in <module>
    BPEmb(lang="multi", vs=100000, dim=300)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/bpemb/bpemb.py", line 173, in __init__
    self.model_file = self._load_file(model_file)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/bpemb/bpemb.py", line 228, in _load_file
    return http_get(file_url, cached_file, ignore_tardir=True)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/bpemb/util.py", line 40, in http_get
    headers = http_get_temp(url, temp_file)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/bpemb/util.py", line 16, in http_get_temp
    req = requests.get(url, stream=True)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/requests/sessions.py", line 723, in send
    history = [resp for resp in gen]
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/requests/sessions.py", line 723, in <listcomp>
    history = [resp for resp in gen]
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/requests/sessions.py", line 266, in resolve_redirects
    resp = self.send(
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/home/david/anaconda3/envs/deepparse/lib/python3.10/site-packages/requests/adapters.py", line 563, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='bpemb.h-its.org', port=443): Max retries exceeded with url: /multi/multi.wiki.bpe.vs100000.model (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

bheinzerling commented 2 years ago

Thanks for reporting this issue!

As a temporary fix, I've disabled SSL certificate verification (and the resulting warning, as done here: https://github.com/GRAAL-Research/deepparse/issues/156), so upgrading bpemb to the latest version should make everything work as before.
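For readers who need the same stopgap in their own code, here is a minimal sketch of what disabling verification can look like with requests. It is not bpemb's actual implementation: the helper name make_unverified_session is made up for illustration, and skipping verification is insecure by design, so it only makes sense as a temporary measure.

```python
import requests
import urllib3


def make_unverified_session() -> requests.Session:
    """Return a Session whose default is to skip TLS certificate verification.

    Hypothetical helper for illustration only; this is insecure.
    """
    # Silence the InsecureRequestWarning that urllib3 emits for
    # unverified HTTPS requests.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    session = requests.Session()
    session.verify = False  # session-level default for certificate checks
    return session


session = make_unverified_session()
# Example call (needs network access), using the URL from the log above:
# resp = session.get(
#     "https://nlp.h-its.org/bpemb/multi/multi.wiki.bpe.vs100000.model",
#     stream=True,
# )
```

The same effect for a single call is requests.get(url, verify=False).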

Looking at the certificate, it seems that it was renewed a couple of days ago on 2022-09-20 and is valid until 2023-10-10, so it shouldn't be expired. I'll try to find out what is going on.

bheinzerling commented 2 years ago

Looked into this a bit more. When opening https://nlp.h-its.org or downloading files via a web browser, none of the four browsers I tried had any issues and all browsers were able to verify the three certificates on the SSL certificate chain.

On the Python side, the requests module takes its trusted root certificates from the certifi package, which stores them in the location given by

>>> import certifi
>>> certifi.where()
'.../site-packages/certifi/cacert.pem'

Assuming that certifi did not trust the *.h-its.org certificate, I thought that adding it manually would solve the issue, but it didn't. Trying the other certificates on the chain, I found that certifi doesn't trust the second one, GeoTrust RSA CA 2018. After adding this certificate to the cacert.pem file, requests.get works as before, i.e., without having to pass verify=False.
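A variation on this manual fix is to leave certifi's cacert.pem untouched (upgrading certifi replaces that file) and write a combined bundle somewhere else, then point requests at it. The sketch below assumes the GeoTrust RSA CA 2018 certificate has already been exported to a PEM file; the function name build_ca_bundle and all file names are placeholders, not part of bpemb or certifi.

```python
from pathlib import Path

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"


def build_ca_bundle(base_bundle: Path, extra_pem: Path, out_path: Path) -> Path:
    """Write a new CA bundle: base_bundle followed by the certificate(s) in extra_pem.

    Neither input file is modified.
    """
    base = base_bundle.read_bytes()
    extra = extra_pem.read_bytes().strip()
    if PEM_HEADER not in extra:
        raise ValueError(f"{extra_pem} does not look like a PEM certificate")
    # Make sure the appended certificate starts on its own line.
    separator = b"" if base.endswith(b"\n") else b"\n"
    out_path.write_bytes(base + separator + extra + b"\n")
    return out_path


# Usage sketch (not executed here; file names are placeholders):
#   import certifi, requests
#   bundle = build_ca_bundle(
#       Path(certifi.where()),
#       Path("geotrust_rsa_ca_2018.pem"),
#       Path("bpemb_ca_bundle.pem"),
#   )
#   requests.get("https://nlp.h-its.org/bpemb", verify=str(bundle))
```

Instead of passing verify= on every call, the REQUESTS_CA_BUNDLE environment variable can be set to the path of the combined bundle, which requests picks up by default.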

I have no knowledge of SSL certificates whatsoever, but it looks to me like this isn't an issue with the *.h-its.org certificate. I'm guessing that the secure way to resolve this issue would be either to ask certifi to add that certificate to its default .pem file, or for each user to manually add it to their cacert.pem locally. Neither of those options looks particularly compelling to me, so I'm inclined to leave SSL verification disabled until certifi trusts that certificate.