ckan / datapusher

A standalone web service that pushes data files from a CKAN site's resources into its DataStore
GNU Affero General Public License v3.0

SSLError while pushing to Datastore #229

Open maxclac opened 3 years ago

maxclac commented 3 years ago

Hi!

I am using CKAN 2.9.2 and I am having a problem with an Excel file in a dataset that I am trying to push manually to the DataStore.

Traceback (most recent call last):
  File "/usr/lib/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 363, in push_to_datastore
    response = requests.get(
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='ckan.example.com', port=443): Max retries exceeded with url: /dataset/12baf1af-1b92-4b37-834c-660ee75e19e5/resource/9e577061-19a0-47e4-ba5b-e0a4b3da4063/download/file.xlsx (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))

I set SSL_VERIFY to False in the config file, but it did not help.

What can I do?

maxclac commented 3 years ago

Any idea, anyone?

jqnatividad commented 3 years ago

Have you tried installing ndg-httpsclient?

pip install ndg-httpsclient

Even if it doesn't fix the problem, it could at least help with debugging your SSL issues.

https://github.com/cedadev/ndg_httpsclient/#running-ndg_httpclient

maxclac commented 3 years ago

Thank you for the suggestion! I will try this and let you know.

maxclac commented 3 years ago

I am sorry, but I need more help here. How does this tool work exactly?

jqnatividad commented 3 years ago

Also, just noticed host='ckan.example.com' in your trace log:

requests.exceptions.SSLError: HTTPSConnectionPool(host='ckan.example.com', port=443): Max retries exceeded with url: /dataset/12baf1af-1b92-4b37-834c-660ee75e19e5/resource/9e577061-19a0-47e4-ba5b-e0a4b3da4063/download/file.xlsx (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))

Is your ckan.site_url set correctly?

maxclac commented 3 years ago

No, I just replaced my actual url with this, so that I don't reveal it.

jqnatividad commented 3 years ago

Got the suggestion from https://stackoverflow.com/questions/33410577/python-requests-exceptions-sslerror-eof-occurred-in-violation-of-protocol.

It should help with SSL certificate issues, but like you said, you had SSL_VERIFY set to False.

Anyway, it's most likely a configuration issue on your end, as Datapusher works with SSL, so it will have to be a process of elimination.

First off, does the Datapusher work when you run CKAN without SSL?
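If turning SSL off on the CKAN side is not an option, one way to start the process of elimination is to try the TLS handshake directly from the machine that runs Datapusher. The following is only a minimal sketch using the Python standard library; `build_context` and `probe_tls` are hypothetical helper names, not part of Datapusher or CKAN.

```python
import socket
import ssl


def build_context(verify: bool = True) -> ssl.SSLContext:
    """Return a client-side TLS context, optionally with certificate checks off."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def probe_tls(host: str, port: int = 443, verify: bool = True,
              timeout: float = 10.0) -> dict:
    """Attempt a TLS handshake and report the outcome instead of raising."""
    context = build_context(verify)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                # Handshake succeeded: report what was negotiated.
                return {
                    "ok": True,
                    "protocol": tls.version(),
                    "cipher": tls.cipher()[0],
                }
    except (ssl.SSLError, OSError) as exc:
        # Covers handshake failures, refused connections, and timeouts.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


# Example (replace the placeholder host with the real CKAN hostname):
# print(probe_tls("ckan.example.com", verify=True))
# print(probe_tls("ckan.example.com", verify=False))
```

If both calls fail with the same error, certificate verification is probably not the cause, because the second call skips it. If only the call with `verify=True` fails, the certificate chain is the more likely suspect.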

G'luck!

maxclac commented 3 years ago

Thanks! I would rather not turn off SSL unless I really have to, as it is a CKAN instance running in production.

maxclac commented 3 years ago

I was thinking about using xloader instead, but it is not so straightforward.

markstuart commented 3 years ago

Hey @maxclac, we were having this same issue building datapusher in a Docker container. After a lot of trial and error, we tried changing the base image from ubuntu:20.04 to debian:buster and everything just started working.

I can only assume that there is some incompatibility between the SSL libraries provided by the operating system and the Python libraries used by datapusher.
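To compare two environments (for example the two base images), it may help to print which OpenSSL build Python is linked against and which TLS versions its default context allows. This is just a sketch with the standard library; `tls_environment` is a made-up helper name, and it does not prove where the incompatibility lies.

```python
import ssl
import sys


def tls_environment() -> dict:
    """Collect basic facts about the TLS stack this Python interpreter uses."""
    context = ssl.create_default_context()
    return {
        "python": sys.version.split()[0],
        # The OpenSSL library the ssl module was loaded with.
        "openssl": ssl.OPENSSL_VERSION,
        # Protocol bounds of the default client context.
        "minimum_tls": context.minimum_version.name,
        "maximum_tls": context.maximum_version.name,
    }


print(tls_environment())
```

Running this inside each container, with the same Python interpreter that datapusher uses, gives two dictionaries that can be compared side by side.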

Anyway, might help you resolve the issue. Good luck!

maxclac commented 3 years ago

Interesting! Thank you, I will have a look.

markstuart commented 3 years ago

We've been through a few rounds of trying to get this working nicely. If you have the ability to use a pre-built Docker image in your environment, I heartily recommend https://hub.docker.com/r/keitaro/ckan-datapusher. It worked nicely out of the box for me, and some settings can be overridden with environment variables.

maxclac commented 3 years ago

I see. Does that mean I can disable my current Datapusher and use a container from this Docker image instead? Am I understanding that right?

maxclac commented 2 years ago

Hi! Just an update, I am still having this problem and would still appreciate help.

maxclac commented 2 years ago

It works when I set SSL_VERIFY = False directly in /usr/lib/ckan/datapusher/src/datapusher/datapusher/jobs.py. That would mean that the setting ckan.datapusher.ssl_verify = False in my config file is ignored.
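One general pitfall that can make a setting like this look ignored, offered as an assumption and not as a confirmed diagnosis of Datapusher: a value read from an INI file or environment variable arrives as the string "False", and any non-empty string is truthy in Python. In addition, the `verify` argument of requests treats a string as a path to a CA bundle, not as a boolean. A minimal sketch of converting such a value explicitly, with `parse_bool` as a hypothetical helper:

```python
def parse_bool(value, default: bool = True) -> bool:
    """Convert a config value (bool, string, or None) to a real boolean."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in {"true", "1", "yes", "on"}:
        return True
    if text in {"false", "0", "no", "off"}:
        return False
    raise ValueError(f"not a boolean setting: {value!r}")


# The raw string is truthy, the parsed value is not.
raw = "False"
print(bool(raw), parse_bool(raw))
```

Whether the value is ever read from that config file at all is a separate question that this sketch does not answer.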