arthurdejong / python-stdnum

A Python library to provide functions to handle, parse and validate standard numbers.
https://arthurdejong.org/python-stdnum/
GNU Lesser General Public License v2.1

DO ncf search_dgii returning SSLError #452

Closed kvillar93 closed 1 month ago

kvillar93 commented 2 months ago

When I call the method the same way as always, the service now raises an SSLError. See the following example:

from stdnum.do import rnc

result = rnc.search_dgii('132262875', end_at=20, start_at=1, timeout=4)

---------------------------------------------------------------------------
SSLCertVerificationError                  Traceback (most recent call last)
File C:\ProgramData\anaconda3\Lib\site-packages\urllib3\connectionpool.py:466, in HTTPConnectionPool._make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
    465 try:
--> 466     self._validate_conn(conn)
    467 except (SocketTimeout, BaseSSLError) as e:

File C:\ProgramData\anaconda3\Lib\site-packages\urllib3\connectionpool.py:1095, in HTTPSConnectionPool._validate_conn(self, conn)
   1094 if conn.is_closed:
-> 1095     conn.connect()
   1097 # TODO revise this, see https://github.com/urllib3/urllib3/issues/2791

File C:\ProgramData\anaconda3\Lib\site-packages\urllib3\connection.py:652, in HTTPSConnection.connect(self)
    650 server_hostname_rm_dot = server_hostname.rstrip(".")
--> 652 sock_and_verified = _ssl_wrap_socket_and_match_hostname(
    653     sock=sock,
    654     cert_reqs=self.cert_reqs,
    655     ssl_version=self.ssl_version,
    656     ssl_minimum_version=self.ssl_minimum_version,
    657     ssl_maximum_version=self.ssl_maximum_version,
    658     ca_certs=self.ca_certs,
    659     ca_cert_dir=self.ca_cert_dir,
    660     ca_cert_data=self.ca_cert_data,
    661     cert_file=self.cert_file,
    662     key_file=self.key_file,
    663     key_password=self.key_password,
    664     server_hostname=server_hostname_rm_dot,
    665     ssl_context=self.ssl_context,
    666     tls_in_tls=tls_in_tls,
    667     assert_hostname=self.assert_hostname,
    668     assert_fingerprint=self.assert_fingerprint,
    669 )
    670 self.sock = sock_and_verified.socket

File C:\ProgramData\anaconda3\Lib\site-packages\urllib3\connection.py:805, in _ssl_wrap_socket_and_match_hostname(sock, cert_reqs, ssl_version, ssl_minimum_version, ssl_maximum_version, cert_file, key_file, key_password, ca_certs, ca_cert_dir, ca_cert_data, assert_hostname, assert_fingerprint, server_hostname, ssl_context, tls_in_tls)
    803         server_hostname = normalized
--> 805 ssl_sock = ssl_wrap_socket(
    806     sock=sock,
    807     keyfile=key_file,
    808     certfile=cert_file,
    809     key_password=key_password,
    810     ca_certs=ca_certs,
    811     ca_cert_dir=ca_cert_dir,
    812     ca_cert_data=ca_cert_data,
    813     server_hostname=server_hostname,
    814     ssl_context=context,
    815     tls_in_tls=tls_in_tls,
    816 )
    818 try:

File C:\ProgramData\anaconda3\Lib\site-packages\urllib3\util\ssl_.py:465, in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data, tls_in_tls)
    463     pass
--> 465 ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
    466 return ssl_sock

File C:\ProgramData\anaconda3\Lib\site-packages\urllib3\util\ssl_.py:509, in _ssl_wrap_socket_impl(sock, ssl_context, tls_in_tls, server_hostname)
    507     return SSLTransport(sock, ssl_context, server_hostname)
--> 509 return ssl_context.wrap_socket(sock, server_hostname=server_hostname)

File C:\ProgramData\anaconda3\Lib\ssl.py:455, in SSLContext.wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
    449 def wrap_socket(self, sock, server_side=False,
    450                 do_handshake_on_connect=True,
    451                 suppress_ragged_eofs=True,
    452                 server_hostname=None, session=None):
    453     # SSLSocket class handles server_hostname encoding before it calls
    454     # ctx._wrap_socket()
--> 455     return self.sslsocket_class._create(
    456         sock=sock,
    457         server_side=server_side,
    458         do_handshake_on_connect=do_handshake_on_connect,
    459         suppress_ragged_eofs=suppress_ragged_eofs,
    460         server_hostname=server_hostname,
    461         context=self,
    462         session=session
    463     )

File C:\ProgramData\anaconda3\Lib\ssl.py:1042, in SSLSocket._create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
   1041                 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
-> 1042             self.do_handshake()
   1043 except:

File C:\ProgramData\anaconda3\Lib\ssl.py:1320, in SSLSocket.do_handshake(self, block)
   1319         self.settimeout(None)
-> 1320     self._sslobj.do_handshake()
   1321 finally:

SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)

During handling of the above exception, another exception occurred:

SSLError                                  Traceback (most recent call last)
File C:\ProgramData\anaconda3\Lib\site-packages\urllib3\connectionpool.py:789, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, preload_content, decode_content, **response_kw)
    788 # Make the request on the HTTPConnection object
--> 789 response = self._make_request(
    790     conn,
    791     method,
    792     url,
    793     timeout=timeout_obj,
    794     body=body,
    795     headers=headers,
    796     chunked=chunked,
    797     retries=retries,
    798     response_conn=response_conn,
    799     preload_content=preload_content,
    800     decode_content=decode_content,
    801     **response_kw,
    802 )
    804 # Everything went great!

File C:\ProgramData\anaconda3\Lib\site-packages\urllib3\connectionpool.py:490, in HTTPConnectionPool._make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
    489         new_e = _wrap_proxy_error(new_e, conn.proxy.scheme)
--> 490     raise new_e
    492 # conn.request() calls http.client.*.request, not the method in
    493 # urllib3.request. It also calls makefile (recv) on the socket.

SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)

The above exception was the direct cause of the following exception:

MaxRetryError                             Traceback (most recent call last)
File C:\ProgramData\anaconda3\Lib\site-packages\requests\adapters.py:589, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    588 try:
--> 589     resp = conn.urlopen(
    590         method=request.method,
    591         url=url,
    592         body=request.body,
    593         headers=request.headers,
    594         redirect=False,
    595         assert_same_host=False,
    596         preload_content=False,
    597         decode_content=False,
    598         retries=self.max_retries,
    599         timeout=timeout,
    600         chunked=chunked,
    601     )
    603 except (ProtocolError, OSError) as err:

File C:\ProgramData\anaconda3\Lib\site-packages\urllib3\connectionpool.py:843, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, preload_content, decode_content, **response_kw)
    841     new_e = ProtocolError("Connection aborted.", new_e)
--> 843 retries = retries.increment(
    844     method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
    845 )
    846 retries.sleep()

File C:\ProgramData\anaconda3\Lib\site-packages\urllib3\util\retry.py:519, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
    518     reason = error or ResponseError(cause)
--> 519     raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    521 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPSConnectionPool(host='www.dgii.gov.do', port=443): Max retries exceeded with url: /wsMovilDGII/WSMovilDGII.asmx?WSDL (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))

During handling of the above exception, another exception occurred:

SSLError                                  Traceback (most recent call last)
Cell In[77], line 3
      1 from stdnum.do import rnc, cedula
----> 3 result = rnc.search_dgii('132262875', end_at=20, start_at=1, timeout=4)

File C:\ProgramData\anaconda3\Lib\site-packages\stdnum\do\rnc.py:183, in search_dgii(keyword, end_at, start_at, timeout)
    156 """Search the DGII online web service using the keyword.
    157 
    158 This uses the validation service run by the the Dirección General de
   (...)
    179 
    180 Will return an empty list if the number is invalid or unknown."""
    181 # this function isn't automatically tested because it would require
    182 # network access for the tests and unnecessarily load the online service
--> 183 client = get_soap_client(dgii_wsdl, timeout)
    184 results = client.GetContribuyentes(
    185     value=keyword,
    186     patronBusqueda=1,       # search type: 0=by number, 1=by name
    187     inicioFilas=start_at,   # start result (1-based)
    188     filaFilas=end_at,       # end result
    189     IMEI='')
    190 if results and 'GetContribuyentesResult' in results:

File C:\ProgramData\anaconda3\Lib\site-packages\stdnum\util.py:258, in get_soap_client(wsdlurl, timeout)
    256     transport = Transport(timeout=timeout)
    257     from zeep import CachingClient
--> 258     client = CachingClient(wsdlurl, transport=transport).service
    259 except ImportError:
    260     # fall back to non-caching zeep client
    261     try:

File C:\ProgramData\anaconda3\Lib\site-packages\zeep\client.py:269, in CachingClient.__init__(self, *args, **kwargs)
    265 from zeep.cache import SqliteCache
    267 kwargs["transport"] = kwargs.get("transport") or Transport(cache=SqliteCache())
--> 269 super().__init__(*args, **kwargs)

File C:\ProgramData\anaconda3\Lib\site-packages\zeep\client.py:76, in Client.__init__(self, wsdl, wsse, transport, service_name, port_name, plugins, settings)
     74     self.wsdl = wsdl
     75 else:
---> 76     self.wsdl = Document(wsdl, self.transport, settings=self.settings)
     77 self.wsse = wsse
     78 self.plugins = plugins if plugins is not None else []

File C:\ProgramData\anaconda3\Lib\site-packages\zeep\wsdl\wsdl.py:92, in Document.__init__(self, location, transport, base, settings)
     83 self._definitions = (
     84     {}
     85 )  # type: typing.Dict[typing.Tuple[str, str], "Definition"]
     86 self.types = Schema(
     87     node=None,
     88     transport=self.transport,
     89     location=self.location,
     90     settings=self.settings,
     91 )
---> 92 self.load(location)

File C:\ProgramData\anaconda3\Lib\site-packages\zeep\wsdl\wsdl.py:95, in Document.load(self, location)
     94 def load(self, location):
---> 95     document = self._get_xml_document(location)
     97     root_definitions = Definition(self, document, self.location)
     98     root_definitions.resolve_imports()

File C:\ProgramData\anaconda3\Lib\site-packages\zeep\wsdl\wsdl.py:155, in Document._get_xml_document(self, location)
    147 def _get_xml_document(self, location: typing.IO) -> etree._Element:
    148     """Load the XML content from the given location and return an
    149     lxml.Element object.
    150 
   (...)
    153 
    154     """
--> 155     return load_external(
    156         location, self.transport, self.location, settings=self.settings
    157     )

File C:\ProgramData\anaconda3\Lib\site-packages\zeep\loader.py:89, in load_external(url, transport, base_url, settings)
     87     if base_url:
     88         url = absolute_location(url, base_url)
---> 89     content = transport.load(url)
     90 return parse_xml(content, transport, base_url, settings=settings)

File C:\ProgramData\anaconda3\Lib\site-packages\zeep\transports.py:123, in Transport.load(self, url)
    120     if response:
    121         return bytes(response)
--> 123 content = self._load_remote_data(url)
    125 if self.cache:
    126     self.cache.add(url, content)

File C:\ProgramData\anaconda3\Lib\site-packages\zeep\transports.py:135, in Transport._load_remote_data(self, url)
    133 def _load_remote_data(self, url):
    134     self.logger.debug("Loading remote data from: %s", url)
--> 135     response = self.session.get(url, timeout=self.load_timeout)
    136     response.raise_for_status()
    137     return response.content

File C:\ProgramData\anaconda3\Lib\site-packages\requests\sessions.py:602, in Session.get(self, url, **kwargs)
    594 r"""Sends a GET request. Returns :class:`Response` object.
    595 
    596 :param url: URL for the new :class:`Request` object.
    597 :param \*\*kwargs: Optional arguments that ``request`` takes.
    598 :rtype: requests.Response
    599 """
    601 kwargs.setdefault("allow_redirects", True)
--> 602 return self.request("GET", url, **kwargs)

File C:\ProgramData\anaconda3\Lib\site-packages\requests\sessions.py:589, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    584 send_kwargs = {
    585     "timeout": timeout,
    586     "allow_redirects": allow_redirects,
    587 }
    588 send_kwargs.update(settings)
--> 589 resp = self.send(prep, **send_kwargs)
    591 return resp

File C:\ProgramData\anaconda3\Lib\site-packages\requests\sessions.py:703, in Session.send(self, request, **kwargs)
    700 start = preferred_clock()
    702 # Send the request
--> 703 r = adapter.send(request, **kwargs)
    705 # Total elapsed time of the request (approximately)
    706 elapsed = preferred_clock() - start

File C:\ProgramData\anaconda3\Lib\site-packages\requests\adapters.py:620, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    616         raise ProxyError(e, request=request)
    618     if isinstance(e.reason, _SSLError):
    619         # This branch is for urllib3 v1.22 and later.
--> 620         raise SSLError(e, request=request)
    622     raise ConnectionError(e, request=request)
    624 except ClosedPoolError as e:

SSLError: HTTPSConnectionPool(host='www.dgii.gov.do', port=443): Max retries exceeded with url: /wsMovilDGII/WSMovilDGII.asmx?WSDL (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))

Any idea why this happens?

arthurdejong commented 1 month ago

I find that SSL Labs has pretty useful tests and explanations of what goes wrong with a certificate: https://www.ssllabs.com/ssltest/analyze.html?d=www.dgii.gov.do

It appears that the server does not send the intermediate certificate (DigiCert EV RSA CA G2), which is a misconfiguration on the server side. The site still works in most browsers because browsers can download the missing certificate via AIA chasing or reuse intermediate certificates they have cached from earlier connections. This is not supported in Python, see https://github.com/python/cpython/issues/62817

The work-around is to download the intermediate certificate and add it to the trust store you're using. I'll see if I can add the intermediate certificate to python-stdnum but it is quite tricky because it goes via a number of abstraction layers and possibly fragile if the certificate changes.
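As a rough illustration of that workaround (a sketch, not part of python-stdnum): requests accepts a path to a CA bundle file, so one option is to concatenate the existing bundle and the downloaded intermediate certificate into a single PEM file. The helper name `build_ca_bundle` and the file names in the usage comments are hypothetical, and the sketch assumes the intermediate certificate has already been saved locally in PEM format.

```python
from pathlib import Path


def build_ca_bundle(base_bundle, extra_pems, out_path):
    """Concatenate a base CA bundle and extra PEM certificates into one file.

    Hypothetical helper: base_bundle is an existing bundle (for example the
    one returned by certifi.where()), extra_pems is a list of PEM files to
    trust in addition, out_path is where the combined bundle is written.
    """
    parts = [Path(base_bundle).read_text(encoding="utf-8")]
    for pem in extra_pems:
        parts.append(Path(pem).read_text(encoding="utf-8"))
    # Make sure every part ends with a newline so PEM blocks do not run together
    combined = "".join(p if p.endswith("\n") else p + "\n" for p in parts)
    Path(out_path).write_text(combined, encoding="utf-8")
    return str(out_path)


# Possible usage (assumed file names, requires certifi and requests):
#   import certifi, requests
#   bundle = build_ca_bundle(certifi.where(), ["intermediate.pem"], "combined.pem")
#   requests.get("https://www.dgii.gov.do/", verify=bundle)
```

Setting the `REQUESTS_CA_BUNDLE` environment variable to the combined file should have the same effect for code that does not expose a `verify` option, as long as the underlying requests session honours environment settings. The combined file goes stale if either the base bundle or the server's intermediate certificate changes, which is the fragility mentioned above.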

arthurdejong commented 1 month ago

Unless I'm mistaken there is no easy way to specify that requests can trust the system or built-in certificate store as well as a certain extra certificate.

kvillar93 commented 1 month ago

Thank you for your thoughtful response. I assumed there wasn't an easy way; everything related to the DGII is always problematic.

In pull request #453, jeffryjdelarosa suggested disabling SSL verification to bypass this problem. Does that work, and is it recommended?

kvillar93 commented 1 month ago

Also, at the end of the same cpython/issues/62817 thread there is a linked workaround for doing AIA chasing in Python: danilobellini/aia.

Do you think this library could help with this issue?

jeffryjdelarosa commented 1 month ago

In my experience, DGII often faces challenges with certificates, as they typically acquire them through third parties via the government's 'Purchasing and Contracting' department. As a result, they are sometimes unaware when a certificate has expired or has technical issues. This is why I chose to disable SSL verification. It's unfortunate that in our country, we constantly have to deal with these types of problems, and every four years, the person in charge of each government department changes, affecting continuity and efficiency.

kvillar93 commented 1 month ago

@jeffryjdelarosa you are so damn right.

arthurdejong commented 1 month ago

It seems the problem with the missing intermediate certificate has been fixed by the operators of www.dgii.gov.do.

In 3fcebb2 I've added a verify argument to all functions that deal with network services to allow working around this from caller applications without having to update python-stdnum itself.

For example, until the requests library supports AIA chasing, this can probably be accomplished as follows (untested, because the proper intermediate certificate is present now):

from aia import AIASession
from stdnum.do import rnc
from tempfile import NamedTemporaryFile

aia_session = AIASession()
cadata = aia_session.cadata_from_url(rnc.dgii_wsdl)
with NamedTemporaryFile("w") as pem_file:
    pem_file.write(cadata)
    pem_file.flush()
    result = rnc.search_dgii('132262875', end_at=20, start_at=1, timeout=4, verify=pem_file.name)
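One caveat for the Windows setup shown in the traceback: a `NamedTemporaryFile` that is still open generally cannot be opened a second time by name on Windows, so the snippet above may fail there when the certificate file is read. A possible variant, equally untested against the live service, writes the PEM data to a temporary file that is closed before use and removed afterwards. The helper name `pem_tempfile` is hypothetical, and `cadata` is assumed to be the PEM text returned by `cadata_from_url`.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def pem_tempfile(cadata):
    """Write PEM text to a closed temporary file and yield its path."""
    fd, path = tempfile.mkstemp(suffix=".pem")
    try:
        with os.fdopen(fd, "w") as pem_file:
            pem_file.write(cadata)
        # The file is closed at this point, so other code can open it by name
        yield path
    finally:
        # Clean up the temporary file once the caller is done with it
        os.remove(path)


# Possible usage with the names from the snippet above:
#   with pem_tempfile(cadata) as pem_path:
#       result = rnc.search_dgii('132262875', end_at=20, start_at=1,
#                                timeout=4, verify=pem_path)
```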