datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

icann scraping gets SSL certificate error #539

Open sbenthall opened 2 years ago

sbenthall commented 2 years ago
$ python bin/collect_mail.py -u https://mm.icann.org/pipermail/cc-humanrights/
Traceback (most recent call last):
  File "/usr/lib/python3.9/urllib/request.py", line 1346, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.9/http/client.py", line 1253, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.9/http/client.py", line 1299, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.9/http/client.py", line 1248, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.9/http/client.py", line 1008, in _send_output
    self.send(msg)
  File "/usr/lib/python3.9/http/client.py", line 948, in send
    self.connect()
  File "/usr/lib/python3.9/http/client.py", line 1422, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)

Problematic because this data is used in this notebook:

https://github.com/datactive/bigbang/blob/main/examples/experimental_notebooks/Corr%20between%20centrality%20and%20community%200.1.ipynb

nllz commented 2 years ago

This has to do with SSL verification, for which a locally installed SSL certificate is needed. There are several workarounds. We either need to install local certificates, which should be something like this:

import ssl
ssl._create_default_https_context = ssl._create_stdlib_context

or disable SSL verification using this code (which is probably the wrong thing to do):

import ssl
ssl._create_default_https_context = ssl._create_unverified_context
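
Both snippets patch a private, module-wide default, so they affect every HTTPS request in the process. A narrower option might be to build an SSL context explicitly and pass it only to the request that needs it. The following is a minimal sketch under assumptions: `build_context` and `fetch` are hypothetical helper names, not part of BigBang, and the `cafile` argument assumes you have a PEM bundle containing the missing issuer certificate. Whether `collect_mail.py` can accept such a context is not something this thread establishes.

```python
import ssl
import urllib.request
from typing import Optional


def build_context(cafile: Optional[str] = None, verify: bool = True) -> ssl.SSLContext:
    """Build an SSL context for a single request (hypothetical helper).

    cafile: optional path to a PEM bundle with extra trusted certificates.
    verify: set to False to skip verification (insecure, debugging only).
    """
    # create_default_context enables certificate and hostname checks.
    ctx = ssl.create_default_context(cafile=cafile)
    if not verify:
        # check_hostname must be switched off before verify_mode is relaxed.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url: str, context: ssl.SSLContext) -> bytes:
    """Fetch a URL using the given context instead of the global default."""
    with urllib.request.urlopen(url, context=context) as response:
        return response.read()
```

For example, `fetch("https://mm.icann.org/pipermail/cc-humanrights/", build_context(cafile="extra-ca.pem"))` would keep verification on while trusting the extra bundle (the file name is a placeholder). Also worth noting: in CPython's `ssl` module, `_create_stdlib_context` is, as far as I can tell, a backwards-compatibility alias for `_create_unverified_context`, so the first snippet above likely disables verification too rather than installing certificates.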

Christovis commented 2 years ago

Such info would be good to document here I think.

nllz commented 2 years ago

But the problem is with the code and the environment, not the data sources, right?

nllz commented 2 years ago

Hmmm, the problem does not exist if you remove the 's' from https... So this may have something to do with ICANN's mailman install config.
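
If dropping the 's' is used as a stopgap, it could be made explicit in code instead of editing URLs by hand. This is only a sketch: `to_http` and `is_cert_failure` are hypothetical helpers, not BigBang functions, and plain HTTP gives up transport security, so it probably only makes sense for public archive pages like these.

```python
import ssl
import urllib.error
from urllib.parse import urlsplit, urlunsplit


def to_http(url: str) -> str:
    """Return the URL with its scheme downgraded from https to http."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return url  # leave non-HTTPS URLs untouched
    return urlunsplit(parts._replace(scheme="http"))


def is_cert_failure(exc: Exception) -> bool:
    """True if exc is, or wraps, a certificate verification failure."""
    # urllib wraps the underlying SSL error in URLError.reason.
    reason = getattr(exc, "reason", exc)
    return isinstance(reason, ssl.SSLCertVerificationError)
```

A caller could catch `urllib.error.URLError`, check `is_cert_failure(exc)`, and only then retry with `to_http(url)`, so that other network errors are not silently masked.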