DOI-USGS / dataretrieval-python

Python package for retrieving water data from USGS or the multi-agency Water Quality Portal
https://doi-usgs.github.io/dataretrieval-python/

SSLCertVerificationError #100

Closed mishranurag closed 1 year ago

mishranurag commented 1 year ago

I tried to download data for a couple of locations and saw the SSLCertVerificationError. I am not sure how to get around this issue.

Sincerely ~A

Edit: Just checked, and this happens only when I am using a VPN.

thodson-usgs commented 1 year ago

Can't help you with that one. Talk to your IT about getting the correct SSL cert.

thodson-usgs commented 1 year ago

Note there are pros and cons. Once you have the cert, you'll get the error whenever you go off VPN, I believe.
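For readers on a VPN that re-signs HTTPS traffic, here is a minimal sketch of one common way to hand `requests` a CA bundle supplied by IT. It relies on the `REQUESTS_CA_BUNDLE` environment variable, which `requests` honors when no explicit `verify` argument is passed. The helper name `use_ca_bundle` and the bundle path are hypothetical and do not come from this thread.

```python
import os
from pathlib import Path


def use_ca_bundle(bundle_path):
    """Point requests at a custom CA bundle for this process.

    Hypothetical helper: it only sets REQUESTS_CA_BUNDLE, which requests
    consults at request time when verification is left at its default.
    """
    path = Path(bundle_path).expanduser()
    if not path.is_file():
        # Fail early instead of letting a later request fail obscurely.
        raise FileNotFoundError(f"CA bundle not found: {path}")
    os.environ["REQUESTS_CA_BUNDLE"] = str(path)
    return str(path)
```

A call such as `use_ca_bundle("~/certs/corporate-root.pem")` (hypothetical path) would go before the first dataretrieval request. A bundle that contains only the VPN's root certificate replaces the default trust store, which is one plausible reason the error could reappear off VPN; removing the variable with `os.environ.pop("REQUESTS_CA_BUNDLE", None)` restores the default behavior.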

mishranurag commented 1 year ago

Thank you for the quick response.

SorooshMani-NOAA commented 1 year ago

@thodson-usgs recently I started getting SSL certificate errors from dataretrieval (during automated tests on GitHub for searvey). It used to work fine before. For example, if I execute:

from dataretrieval import nwis
nwis.get_pmcodes('elevation')

I get

/lib/python3.11/site-packages/dataretrieval/nadp.py:44: UserWarning: GDAL not installed. Some functions will not work.
  warnings.warn('GDAL not installed. Some functions will not work.')
Traceback (most recent call last):
  File "/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1092, in _validate_conn
    conn.connect()
  File "/lib/python3.11/site-packages/urllib3/connection.py", line 635, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/urllib3/connection.py", line 776, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
               ^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 466, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 510, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/ssl.py", line 1075, in _create
    self.do_handshake()
  File "/lib/python3.11/ssl.py", line 1346, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lib/python3.11/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/urllib3/connectionpool.py", line 844, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='help.waterdata.usgs.gov', port=443): Max retries exceeded with url: /code/parameter_cd_nm_query?fmt=rdb&parm_nm_cd=%25elevation%25 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/dtrt.py", line 2, in <module>
    nwis.get_pmcodes('elevation')
  File "/lib/python3.11/site-packages/dataretrieval/nwis.py", line 763, in get_pmcodes
    response = query(url, payload)
               ^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/dataretrieval/utils.py", line 196, in query
    response = requests.get(url, params=payload, headers=user_agent)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/requests/adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='help.waterdata.usgs.gov', port=443): Max retries exceeded with url: /code/parameter_cd_nm_query?fmt=rdb&parm_nm_cd=%25elevation%25 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)')))

This is in spite of not changing the environment I was using. I even tried creating a fresh environment for dataretrieval in conda:

conda create -n usgs_test -c conda-forge dataretrieval

Is it something related to the USGS server?
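The traceback above is a chain: `ssl.SSLCertVerificationError` is wrapped by urllib3's `SSLError` and `MaxRetryError`, and finally by `requests.exceptions.SSLError`. For automated test suites like the one mentioned, a rough sketch of how a failure could be classified as a certificate problem (rather than, say, a timeout) is shown below. The function name `is_cert_verification_error` is hypothetical, and the string check is a heuristic based on the message text in this traceback.

```python
import ssl


def is_cert_verification_error(exc):
    """Return True if `exc` or anything it wraps is a certificate failure.

    Walks __cause__, __context__, and exception instances stored in
    .args, since wrapping libraries use all three to carry the original.
    """
    seen = set()
    stack = [exc]
    while stack:
        current = stack.pop()
        if current is None or id(current) in seen:
            continue
        seen.add(id(current))
        if isinstance(current, ssl.SSLCertVerificationError):
            return True
        # Heuristic: wrappers usually embed the OpenSSL reason in their text.
        if "CERTIFICATE_VERIFY_FAILED" in str(current):
            return True
        stack.append(current.__cause__)
        stack.append(current.__context__)
        stack.extend(a for a in current.args if isinstance(a, BaseException))
    return False
```

A test could catch the exception raised by `nwis.get_pmcodes('elevation')` and skip or report separately when this returns True, so that a server-side certificate problem is not mistaken for a regression in the calling code.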

thodson-usgs commented 1 year ago

Are you on the NOAA network? Have you tried it off network?

My first thought is that NOAA has changed its SSL cert and you need to update it in order to work on the network. In my experience, it may be easier to work off network.

A simple test would be to !pip install dataretrieval in a Google CoLab environment.

thodson-usgs commented 1 year ago

Tested, and I also get an SSL error on Colab. @elbeejay, would you look at this?

SorooshMani-NOAA commented 1 year ago

I'm testing outside NOAA's network. The issue shows up on GitHub actions as well. See errors in https://github.com/oceanmodeling/searvey/actions/runs/5466143325/jobs/10038714613?pr=92

I'll try CoLab as well

Update: OK, so if it happens on CoLab for you, I won't test!

elbeejay commented 1 year ago

I'll take a look too

elbeejay commented 1 year ago

@thodson-usgs and @SorooshMani-NOAA there's something going on with the SSL certificate for sure.

The webservice itself is functional as links like this one do work: https://help.waterdata.usgs.gov/code/parameter_cd_nm_query?parm_nm_cd=00060

As you both saw, fetching data via Python's requests library is now returning that SSLError. I don't know the true reason for this, but there appears to be something unusual about the SSL certificate (SSL Checker suggests it isn't "Vendor signed"). I will reach out to our web team to see if anyone can provide insight on what might be going on.

For the Python dataretrieval package, we could add an optional parameter to allow users to perform "unverified" GET requests (https://docs.python-requests.org/en/latest/user/advanced/#ssl-cert-verification), which might be okay for this use case, as there is no confidential data being transmitted from USGS to the user or from the user to the USGS. It's not best practice, but it could be a nice option to have for scenarios like this, where something is going on but users need to access the data. Let me know what you think, @thodson-usgs.
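As a rough illustration of that proposal, here is a minimal sketch of a query helper with an opt-out flag. It is not the package's actual implementation: the parameter name `ssl_check` and this simplified `query` signature are assumptions made for the example. Only the `verify` argument of `requests.get` is taken from the requests documentation linked above.

```python
import requests


def query(url, payload=None, ssl_check=True):
    """Send a GET request, optionally skipping certificate verification.

    `ssl_check` is a hypothetical parameter name. It defaults to True so
    that verification is skipped only when the caller asks for it.
    """
    # verify=False tells requests not to validate the server certificate.
    return requests.get(url, params=payload, verify=ssl_check)
```

With this sketch, a call like `query(url, payload, ssl_check=False)` would go through even when the certificate chain cannot be validated, and urllib3 emits an `InsecureRequestWarning` for each such request. Keeping verification on by default means users have to opt in to the less safe path explicitly.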

thodson-usgs commented 1 year ago

SSL is a perennial issue, so that's probably a good idea whatever the outcome.

elbeejay commented 1 year ago

> SSL is a perennial issue, so that's probably a good idea whatever the outcome.

Ok, will plan to add that functionality.

@SorooshMani-NOAA and @thodson-usgs I've been informed that the SSL certificate issue has been fixed - locally I was able to confirm that I no longer get this error.

SorooshMani-NOAA commented 1 year ago

@elbeejay, the issue seems to be resolved on the GitHub tests side as well. Thanks for following up!