J535D165 / cbsodata

Unofficial Statistics Netherlands (CBS) open data API client for Python
http://cbsodata.readthedocs.io/
MIT License

SSL verification fails even after passing custom certificates #12

Closed joosbuijsNL closed 4 years ago

joosbuijsNL commented 4 years ago

This code

import pandas as pd
import cbsodata

toc = pd.DataFrame(cbsodata.get_table_list())

Returns SSLError: HTTPSConnectionPool(host='opendata.cbs.nl', port=443): Max retries exceeded with url: /ODataCatalog/Tables?$format=json (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))

This is a common error for us because our company (APG in Heerlen) intercepts SSL connections, so we need to provide custom certificates to Python's requests library.

The following works for 'bare' requests:

import os
os.environ['REQUESTS_CA_BUNDLE'] = 'C:/dev/ca-bundle.crt'

Alternatively, this also works in general:

import requests
s = requests.Session()
s.verify = 'C:/dev/ca-bundle.crt'

However, neither fix resolves the original issue with cbsodata.

I inspected the cbsodata source and found nothing that would explain why, for instance, os.environ['REQUESTS_CA_BUNDLE'] is ignored. I still believe the solution is an option in the options object that sets verify to a directory with custom certificates. Any other ideas, hints, or suggestions are welcome, of course!

J535D165 commented 4 years ago

Is the certificate self-signed? The requests documentation on prepared requests (https://2.python-requests.org//en/latest/user/advanced/#prepared-requests) says:

When you are using the prepared request flow, keep in mind that it does not take into account the environment. This can cause problems if you are using environment variables to change the behaviour of requests. For example: Self-signed SSL certificates specified in REQUESTS_CA_BUNDLE will not be taken into account. As a result an SSL: CERTIFICATE_VERIFY_FAILED is thrown. You can get around this behaviour by explicitly merging the environment settings into your session:
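The merge described in the quoted passage can be sketched roughly as follows. This is a minimal illustration based on the requests prepared-request flow, not the cbsodata implementation. The helper names `merged_settings` and `send_prepared` are hypothetical; `Session.prepare_request`, `Session.merge_environment_settings`, and `Session.send` are the requests calls involved.

```python
import requests


def merged_settings(session, url):
    """Return the send() settings with environment values merged in.

    Hypothetical helper name. The positional arguments after the URL are
    proxies, stream, verify, and cert.
    """
    return session.merge_environment_settings(url, {}, None, None, None)


def send_prepared(url, session=None):
    """Send a prepared GET request that honours REQUESTS_CA_BUNDLE."""
    session = session or requests.Session()
    prepped = session.prepare_request(requests.Request("GET", url))
    # Session.send() alone does not read environment variables,
    # so the merged settings are passed in explicitly.
    settings = merged_settings(session, prepped.url)
    return session.send(prepped, **settings)
```

With `REQUESTS_CA_BUNDLE` set, the `verify` entry of the merged settings should point at the custom bundle, which is what the plain prepared-request flow leaves out.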

Can we narrow it down to the requests library? Does the following fail (please adjust to your proxy)?

requests.get("https://opendata.cbs.nl/ODataApi/OData/37506wwm/UntypedDataSet?$format=json")

joosbuijsNL commented 4 years ago

Hi @J535D165 , thanks for the quick reply!

import requests
requests.get("https://opendata.cbs.nl/ODataApi/OData/37506wwm/UntypedDataSet?$format=json")

Results in the SSLError mentioned above.

import requests
import os
os.environ['REQUESTS_CA_BUNDLE'] = './ca-bundle.crt'
requests.get("https://opendata.cbs.nl/ODataApi/OData/37506wwm/UntypedDataSet?$format=json")

Returns <Response [200]> and thus works.

Alternatively, after unsetting the REQUESTS_CA_BUNDLE environment variable, the code below also works:

requests.get("https://opendata.cbs.nl/ODataApi/OData/37506wwm/UntypedDataSet?$format=json", verify='./ca-bundle.crt')

J535D165 commented 4 years ago

Hmm, then the following should work?

import os
os.environ['REQUESTS_CA_BUNDLE'] = './ca-bundle.crt'

import pandas as pd
import cbsodata

toc = pd.DataFrame(cbsodata.get_table_list())

Please let me know

joosbuijsNL commented 4 years ago

I've tried this before, and tried again with your code copied 1:1, and still have the original SSL error.

I'm not sure if this is a bug in the requests library, and leave it to you whether you want to allow a workaround in your CBSOData Python package, but bottom line is that for us it's not usable right now (but at the same time there's no concrete use case for us either).

ties commented 4 years ago

> I'm not sure if this is a bug in the requests library, and leave it to you whether you want to allow a workaround in your CBSOData Python package, but bottom line is that for us it's not usable right now (but at the same time there's no concrete use case for us either).

The workaround is quite specific. Perhaps another workaround would be the ability to pass in your own Session (and/or even make it compatible with aiohttp). That would allow a number of properties (e.g. the base URL, proxy settings, CA bundle) to be changed before you pass it in.
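A rough sketch of what the "pass in your own Session" idea could look like is shown below. It is purely illustrative and is not the API that cbsodata adopted. Both `make_session` and `get_json` are hypothetical names, and the proxy URL in the usage note is a placeholder.

```python
import requests


def make_session(ca_bundle=None, proxies=None):
    """Build a requests.Session preconfigured for a corporate network."""
    session = requests.Session()
    if ca_bundle is not None:
        # Path to a CA bundle file or a directory of certificates.
        session.verify = ca_bundle
    if proxies:
        session.proxies.update(proxies)
    return session


def get_json(url, session=None):
    """Hypothetical client helper: use the caller's session when one is given."""
    session = session or requests.Session()
    response = session.get(url)
    response.raise_for_status()
    return response.json()
```

A caller behind an intercepting proxy could then build the session once, for example `make_session('./ca-bundle.crt', {'https': 'http://proxy.example:8080'})`, and hand it to every call that fetches data.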

J535D165 commented 4 years ago

Thanks @ties and @joosbuijsNL. I propose a solution in #13. Can you review this?

@jolienoomens

joosbuijsNL commented 4 years ago

> Thanks @ties and @joosbuijsNL. I propose a solution in #13. Can you review this?
>
> @jolienoomens

Confirmed, #13 seems to work when passing in the verify folder. Nice, good to see that open source projects listen to their users :)

J535D165 commented 4 years ago

There is a new release on PyPI. (pip install --upgrade cbsodata)

joosbuijsNL commented 4 years ago

Can confirm that the following now works for us with the published version of cbsodata:

import pandas as pd
import cbsodata

# Raw string so the backslashes in the Windows path are not treated as escapes
cbsodata.options.requests['verify'] = r'C:\dev\ca-bundle.crt'

toc = pd.DataFrame(cbsodata.get_table_list())

Thanks a lot!

J535D165 commented 4 years ago

Thanks for reporting and contributing to the solution.