freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

Fix opinion scraper for the Texas Attorney General #427

Closed sentry-io[bot] closed 12 months ago

sentry-io[bot] commented 2 years ago

The Texas Attorney General website changed its page design and data structure, which broke the opinion scraper. https://texasattorneygeneral.gov/opinion/index-to-opinions

juriscraper.opinions.united_states.state.texag

Sentry Issue: COURTLISTENER-1YW

gaierror: [Errno -2] Name or service not known
  File "urllib3/connection.py", line 169, in _new_conn
    conn = connection.create_connection(
  File "urllib3/util/connection.py", line 73, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):

NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7ff800730a30>: Failed to establish a new connection: [Errno -2] Name or service not known
  File "urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "urllib3/connection.py", line 353, in connect
    conn = self._new_conn()
  File "urllib3/connection.py", line 181, in _new_conn
    raise NewConnectionError(

MaxRetryError: HTTPSConnectionPool(host='texasattorneygeneral.gov', port=443): Max retries exceeded with url: /opinion/index-to-opinions (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7ff800730a30>: Failed to establish a new connection: [Errno -2] Name or service not known'))
  File "requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))

ConnectionError: HTTPSConnectionPool(host='texasattorneygeneral.gov', port=443): Max retries exceeded with url: /opinion/index-to-opinions (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7ff800730a30>: Failed to establish a new connection: [Errno -2] Name or service not known'))
(5 additional frame(s) were not displayed)
...
  File "juriscraper/AbstractSite.py", line 351, in _request_url_get
    self.request["response"] = self.request["session"].get(
  File "requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
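
The traceback above bottoms out in `socket.getaddrinfo` raising `gaierror: [Errno -2] Name or service not known`, meaning the host name did not resolve when the request was made; the scraper never got as far as parsing a page. A minimal sketch of checking name resolution on its own, separately from any scraper logic, might look like the following. The helper `host_resolves` and its `resolver` parameter are hypothetical names introduced for illustration and are not part of juriscraper.

```python
import socket


def host_resolves(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address.

    Hypothetical helper for illustration only; not part of juriscraper.
    `resolver` defaults to socket.getaddrinfo and can be swapped out
    so the function can be exercised without network access.
    """
    try:
        # Same lookup that fails at the bottom of the traceback:
        # getaddrinfo(host, port, family, socket.SOCK_STREAM)
        results = resolver(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        # Raised for resolution failures such as
        # "[Errno -2] Name or service not known".
        return False
    return len(results) > 0
```

Calling `host_resolves("texasattorneygeneral.gov")` from the machine that runs the scraper would indicate whether the failure is a DNS problem on that machine or something in the scraper itself, such as the page redesign described in this issue.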

flooie commented 12 months ago

This was fixed previously.