explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

SSL error when downloading models #4297

Closed yutingwum closed 5 years ago

yutingwum commented 5 years ago

Situation

I tried to download spaCy models in a virtual environment with Python 3.6.5. After successfully installing spaCy with pip install rasa[spacy], I tried to download the English model with python -m spacy download en_core_web_md, but an SSLError was raised. I looked into other similar issues, including #3066, #2248, and #2212.

I also tried python3 -m spacy download en_core_web_md-2.1.8 --direct, but it did not work either and threw the same error.

Please see the output below.

File "/Users/yuwu/venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 344, in _make_request
    self._validate_conn(conn)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 843, in _validate_conn
    conn.connect()
  File "/Users/yuwu/venv/lib/python3.6/site-packages/urllib3/connection.py", line 370, in connect
    ssl_context=context)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 355, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 814, in __init__
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1068, in do_handshake
    self._sslobj.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/yuwu/venv/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/Users/yuwu/venv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/Users/yuwu/venv/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /explosion/spacy-models/master/shortcuts-v2.json (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)'),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/spacy/__main__.py", line 35, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "/Users/yuwu/venv/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/spacy/cli/download.py", line 38, in download
    shortcuts = get_json(about.__shortcuts__, "available shortcuts")
  File "/Users/yuwu/venv/lib/python3.6/site-packages/spacy/cli/download.py", line 84, in get_json
    r = requests.get(url)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/Users/yuwu/venv/lib/python3.6/site-packages/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /explosion/spacy-models/master/shortcuts-v2.json (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)'),))


ines commented 5 years ago

Are you behind a proxy? The problem here is likely the one described in the Stack Overflow question linked below. It happens in requests when spaCy tries to fetch the shortcuts and compatibility table from GitHub over SSL. https://stackoverflow.com/questions/46604114/python-requests-ssl-error-certificate-verify-failed
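To help answer the proxy question, here is a minimal diagnostic sketch. The function name describe_tls_environment is hypothetical and not part of spaCy or requests. It assumes only that requests consults the REQUESTS_CA_BUNDLE and CURL_CA_BUNDLE environment variables for a custom CA bundle, and that the standard library reports proxy settings and default certificate paths as shown.

```python
import os
import ssl
import urllib.request


def describe_tls_environment(env=None):
    """Collect settings that commonly affect certificate verification.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    return {
        # requests reads these variables to locate a custom CA bundle
        "ca_bundle": env.get("REQUESTS_CA_BUNDLE") or env.get("CURL_CA_BUNDLE"),
        # proxies Python picks up from the environment or system settings
        "proxies": urllib.request.getproxies(),
        # default CA file used by Python's own ssl module (may be None)
        "python_cafile": ssl.get_default_verify_paths().cafile,
    }
```

Calling describe_tls_environment() with no arguments and printing the result shows whether a proxy is configured and whether a custom CA bundle is already set. On a corporate network, a non-empty proxies entry together with an empty ca_bundle would be consistent with the certificate error above, though it does not prove the cause.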

I think the easiest workaround is to just download the .tar.gz manually via your browser and then pip install that file. See the model releases and the models directory (each model has a "release details" button).

Now that the download command is back to using requests, we could consider adding a --cert option that's passed to both requests (to download shortcuts and compatibility JSON files) and pip in the subprocess (to actually download and install the model). I'm just not sure how to best test this.
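A rough sketch of how such a --cert value might be threaded through both steps. The helper names requests_kwargs and pip_install_args are hypothetical and are not spaCy's implementation. The only library behavior assumed is that requests.get accepts verify=<path to a CA bundle> and that pip install accepts --cert <path>.

```python
import sys


def requests_kwargs(cert=None):
    """Keyword arguments for requests.get(); `verify` may be a CA bundle path.

    Hypothetical helper: returns an empty dict when no cert is given, so
    requests falls back to its default verification.
    """
    return {"verify": cert} if cert else {}


def pip_install_args(target, cert=None):
    """Build a pip command line for use with subprocess.

    `target` is a download URL or a local path to a model archive.
    pip's own --cert option takes a path to a CA bundle.
    """
    args = [sys.executable, "-m", "pip", "install", target]
    if cert:
        args += ["--cert", cert]
    return args
```

With these in place, the JSON fetch would look like requests.get(url, **requests_kwargs(cert)) and the install step like subprocess.call(pip_install_args(download_url, cert)). Because both helpers are pure functions, they can be unit tested without network access, which may partly address the testing concern.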

yutingwum commented 5 years ago

@ines Hi Ines, thank you for your reply. I am using a corporate laptop, so I guess it is behind a proxy. For downloading the package to a directory, if I am using a virtual environment, what directory should I download to? Currently the spaCy folder itself is in venv/lib/spacy

ines commented 5 years ago

@yutingwum spaCy models are regular Python packages and the .tar.gz archive you download is the installer. So you can download the file to any directory. In your virtual environment, you can then pip install the model from the local path:

pip install /local/path/to/en_core_web_sm-2.1.0.tar.gz

This will automatically install the model in your Python environment, like any other package. You'll then be able to load it via spacy.load("en_core_web_sm").
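Since the model is a regular Python package, one way to confirm the install worked is to check that the package is importable in the active environment. This is a minimal sketch using only the standard library; the function name model_is_installed is hypothetical.

```python
import importlib.util


def model_is_installed(name):
    """Return True if a top-level package called `name` is importable here."""
    return importlib.util.find_spec(name) is not None
```

If model_is_installed("en_core_web_sm") returns True inside the virtual environment, spacy.load("en_core_web_sm") should be able to find the model. If it returns False, pip most likely installed the archive into a different environment.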

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.