alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
MIT License
6.16k stars · 648 forks

ssl.SSLCertVerificationError: #70

Closed: rosarion closed this 2 years ago

rosarion commented 2 years ago

I followed all the instructions and ran the sample program using AutoScraper, as shown below:

```python
from autoscraper import AutoScraper

url = 'https://stackoverflow.com/questions/2081586/web-scraping-with-python'

# We can add one or multiple candidates here.
# You can also put urls here to retrieve urls.
wanted_list = ["What are metaclasses in Python?"]

scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)
```

But I get the following error:

```
============ RESTART: D:/PythonCode-1/Web Scraping/AutoSraper 001.py ===========
Traceback (most recent call last):
  File "C:\Python39\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "C:\Python39\lib\site-packages\urllib3\connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "C:\Python39\lib\site-packages\urllib3\connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "C:\Python39\lib\site-packages\urllib3\connection.py", line 416, in connect
    self.sock = ssl_wrap_socket(
  File "C:\Python39\lib\site-packages\urllib3\util\ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "C:\Python39\lib\site-packages\urllib3\util\ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "C:\Python39\lib\ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "C:\Python39\lib\ssl.py", line 1040, in _create
    self.do_handshake()
  File "C:\Python39\lib\ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python39\lib\site-packages\requests\adapters.py", line 439, in send
    resp = conn.urlopen(
  File "C:\Python39\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "C:\Python39\lib\site-packages\urllib3\util\retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='stackoverflow.com', port=443): Max retries exceeded with url: /questions/2081586/web-scraping-with-python (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/PythonCode-1/Web Scraping/AutoSraper 001.py", line 11, in <module>
    result = scraper.build(url, wanted_list )
  File "C:\Python39\lib\site-packages\autoscraper\auto_scraper.py", line 227, in build
    soup = self._get_soup(url=url, html=html, request_args=request_args)
  File "C:\Python39\lib\site-packages\autoscraper\auto_scraper.py", line 119, in _get_soup
    html = cls._fetch_html(url, request_args)
  File "C:\Python39\lib\site-packages\autoscraper\auto_scraper.py", line 105, in _fetch_html
    res = requests.get(url, headers=headers, **request_args)
  File "C:\Python39\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python39\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python39\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python39\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python39\lib\site-packages\requests\adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='stackoverflow.com', port=443): Max retries exceeded with url: /questions/2081586/web-scraping-with-python (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
```

bcarroll commented 2 years ago

The request is failing because the server's certificate has expired. If you want to ignore this (and other SSL-related errors), you can add `request_args={"verify": False}`.

Example:

```python
result = etsy_scraper.get_result_similar(url, group_by_alias=True, request_args={"verify": False})
```
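Applied to the `build()` call from the original report, the workaround might look like the sketch below. Note that `verify=False` disables certificate checking entirely (which accepts man-in-the-middle risk), and `requests` will print an `InsecureRequestWarning` on every call unless you silence it; the actual network call is left commented out here.

```python
import urllib3

# Disabling verification makes requests emit an InsecureRequestWarning
# on every call; silence it explicitly since we are opting in on purpose.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# The traceback above shows autoscraper forwards these keyword
# arguments verbatim to requests.get(), so "verify" maps straight
# onto requests' own verify parameter.
request_args = {"verify": False}

# Hypothetical usage (needs network access, so not executed here):
# from autoscraper import AutoScraper
# scraper = AutoScraper()
# result = scraper.build(url, wanted_list, request_args=request_args)

print(request_args)
```

A less risky alternative, if the failure is caused by an outdated local CA bundle rather than a genuinely bad certificate, is to upgrade the `certifi` package (`pip install -U certifi`) so verification can stay on.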