freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

Louisiana Appeals Scraper #916

Closed: sentry-io[bot] closed this 1 month ago

sentry-io[bot] commented 10 months ago

ConnectTimeout: HTTPSConnectionPool(host='www.la-fcca.org', port=443): Max retries exceeded with url: /opiniongri...

Sentry Issue: COURTLISTENER-64R

TimeoutError: timed out
  File "urllib3/connection.py", line 203, in _new_conn
    sock = connection.create_connection(
  File "urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)

ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7f6e498725d0>, 'Connection to www.la-fcca.org timed out. (connect timeout=60)')
(1 additional frame(s) were not displayed)
...
  File "urllib3/connectionpool.py", line 492, in _make_request
    raise new_e
  File "urllib3/connectionpool.py", line 468, in _make_request
    self._validate_conn(conn)
  File "urllib3/connectionpool.py", line 1097, in _validate_conn
    conn.connect()
  File "urllib3/connection.py", line 611, in connect
    self.sock = sock = self._new_conn()
  File "urllib3/connection.py", line 212, in _new_conn
    raise ConnectTimeoutError(

MaxRetryError: HTTPSConnectionPool(host='www.la-fcca.org', port=443): Max retries exceeded with url: /opiniongrid/opinionpub.php?opinionpage_size=50 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f6e498725d0>, 'Connection to www.la-fcca.org timed out. (connect timeout=60)'))
  File "requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "urllib3/connectionpool.py", line 845, in urlopen
    retries = retries.increment(
  File "urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]

ConnectTimeout: HTTPSConnectionPool(host='www.la-fcca.org', port=443): Max retries exceeded with url: /opiniongrid/opinionpub.php?opinionpage_size=50 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f6e498725d0>, 'Connection to www.la-fcca.org timed out. (connect timeout=60)'))
(4 additional frame(s) were not displayed)
...
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 385, in handle
    self.parse_and_scrape_site(mod, options["full_crawl"])
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 348, in parse_and_scrape_site
    site = mod.Site().parse()
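
The traceback shows requests delegating to urllib3, whose `Retry.increment` raises `MaxRetryError` once its retry budget is spent. requests' default `HTTPAdapter` is created with zero retries, so the "Max retries exceeded" wording can appear after a single failed connection attempt (here with a 60-second connect timeout). Below is a minimal sketch of mounting an adapter that retries connection failures with exponential backoff. It assumes the scraper's session could be configured this way; the function name `build_session` and the retry and backoff values are hypothetical, not CourtListener settings.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(connect_retries: int = 3, backoff: float = 2.0) -> requests.Session:
    """Return a Session whose HTTPS adapter retries failed TCP connects with backoff.

    Hypothetical helper: the defaults are illustrative only.
    """
    retry = Retry(
        total=connect_retries,
        connect=connect_retries,  # errors raised while connecting, e.g. ConnectTimeoutError
        read=0,  # do not retry once the request may have reached the server
        backoff_factor=backoff,  # urllib3 sleeps exponentially longer between attempts
    )
    session = requests.Session()
    # Replaces the default "https://" adapter, which has zero retries.
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


# Usage (network call, not executed here): separate connect and read timeouts.
# session = build_session()
# response = session.get(
#     "https://www.la-fcca.org/opiniongrid/opinionpub.php?opinionpage_size=50",
#     timeout=(10, 60),
# )
```

Retrying only connect errors is the conservative choice: a request that never reached the server is safe to repeat, while `read=0` avoids re-sending a request the server may already be processing.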
flooie commented 10 months ago

This was one of our TLS/SSL adapter courts; it seems better now, but not fixed. Here is the graph of errors: you can see when we implemented a fix.

[Screenshot, 2024-02-02 at 5:04:58 PM: graph of errors over time]
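
The comment does not say what the TLS/SSL adapter for this court changes. One common shape for such an adapter, sketched here under that assumption, is a requests `HTTPAdapter` subclass that hands a custom `ssl.SSLContext` to urllib3's pool manager. The class name `CustomSSLContextAdapter`, the helper `make_session`, and the TLS 1.2 minimum are hypothetical placeholders, not juriscraper's actual code.

```python
import ssl
from typing import Optional

import requests
from requests.adapters import HTTPAdapter


class CustomSSLContextAdapter(HTTPAdapter):
    """Hypothetical adapter that passes a caller-supplied SSLContext to urllib3."""

    def __init__(self, ssl_context: Optional[ssl.SSLContext] = None, **kwargs) -> None:
        # Must be set before super().__init__(), which calls init_poolmanager().
        self.ssl_context = ssl_context or ssl.create_default_context()
        super().__init__(**kwargs)

    def init_poolmanager(self, *args, **kwargs):
        # Every HTTPS pool created by this adapter will use our context.
        kwargs["ssl_context"] = self.ssl_context
        return super().init_poolmanager(*args, **kwargs)


def make_session() -> requests.Session:
    """Hypothetical helper: use the custom context only for the court's host."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # example setting, adjust per server
    session = requests.Session()
    session.mount("https://www.la-fcca.org/", CustomSSLContextAdapter(ssl_context=ctx))
    return session
```

Mounting the adapter on the court's host prefix keeps every other site on requests' default adapter, because requests picks the longest matching prefix.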
flooie commented 9 months ago

@grossir we think we can close this, and maybe archive it until it occurs at a higher threshold, right? We are getting the data; it just seems to be a flaky site, no?
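
Archiving an issue "until it occurs at a higher threshold" is configured in Sentry itself. The sketch below only illustrates the idea behind such a rule, assuming a count-within-a-sliding-window policy; the class `ThresholdAlarm` and its parameters are hypothetical and do not reflect how Sentry implements it.

```python
from collections import deque


class ThresholdAlarm:
    """Hypothetical: signal only when `threshold` errors fall inside `window` seconds."""

    def __init__(self, threshold: int, window: float) -> None:
        self.threshold = threshold
        self.window = window
        self._events = deque()  # timestamps of recent errors, oldest first

    def record(self, timestamp: float) -> bool:
        """Record one error; assumes timestamps arrive in non-decreasing order."""
        self._events.append(timestamp)
        # Keep only errors in the half-open window (timestamp - window, timestamp].
        while self._events and self._events[0] <= timestamp - self.window:
            self._events.popleft()
        return len(self._events) >= self.threshold
```

A handful of isolated timeouts spread over hours never trips the alarm, while a burst of failures inside one window does, which matches the "flaky but still delivering data" behavior described here.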

grossir commented 9 months ago

Yes, let's archive it. We have current data on CL.

I wasn't able to reproduce the error; it seems like random connection errors from the server.

flooie commented 9 months ago

I want to close this, but we only return an unknown status... is that correct, @grossir?