freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License
341 stars, 98 forks

Docket ParserError: Unknown string format: DFTTERM #823

Open sentry-io[bot] opened 7 months ago

sentry-io[bot] commented 7 months ago

Sentry Issue: COURTLISTENER-5SY

_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/courtlistener/cl/recap/tasks.py", line 497, in parse_docket_text
    return report.data
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/juriscraper/pacer/docket_report.py", line 402, in data
    return super().data
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/juriscraper/pacer/docket_report.py", line 75, in data
    data["docket_entries"] = self.docket_entries
                             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/juriscraper/pacer/docket_report.py", line 1246, in docket_entries
    de["date_filed"] = convert_date_string(date_filed_str)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/juriscraper/lib...
ParserError: Unknown string format: DFTTERM
(9 additional frame(s) were not displayed)
...
  File "cl/recap/views.py", line 59, in perform_create
    await asyncio.shield(recap_upload_task)
  File "cl/recap/tasks.py", line 114, in process_recap_upload
    docket = await process_recap_docket(pq.pk)
  File "cl/recap/tasks.py", line 535, in process_recap_docket
    data = await asyncio.get_running_loop().run_in_executor(
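
The deepest visible frame is `convert_date_string(date_filed_str)` being handed the non-date token `DFTTERM`. One possible way to tolerate such values is sketched below. This is not juriscraper's code: `parse_date_or_none`, `build_entry`, and the list of accepted formats are hypothetical names and assumptions made for illustration, and the sketch uses only the standard library rather than the parser seen in the traceback.

```python
from datetime import date, datetime
from typing import Optional

# Assumption: the date formats worth accepting. Adjust to the real docket data.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def parse_date_or_none(raw: str) -> Optional[date]:
    """Return a date if `raw` matches a known format, otherwise None."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            # Not this format; try the next one.
            continue
    return None


def build_entry(date_filed_str: str) -> dict:
    """Hypothetical stand-in for the step that sets de["date_filed"]."""
    # A non-date token such as "DFTTERM" yields None instead of raising.
    return {"date_filed": parse_date_or_none(date_filed_str)}
```

Whether an unparseable value should become `None`, be skipped, or be logged is a design decision for the maintainers; the sketch only shows that the failure can be contained to a single entry instead of aborting the whole docket parse.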

albertisfu commented 2 months ago

Just a heads up that this issue has logged 4.7K events (so far) in the last few hours.

ERosendo commented 2 months ago

@albertisfu It seems like all the events are related to cases from the Colorado District Court (cod).

ERosendo commented 2 months ago

@albertisfu While checking the processing queue (just the first page for now), I noticed that all the recent uploads seem to be from the Colorado District Court (cod), and there are a bunch of queue entries for the same criminal case: 1:05-cr-00425-REB, USA v. Hall et al.

mlissner commented 2 months ago

Who knows why this is spiking today, but I just blocked the IP address responsible.

ERosendo commented 2 months ago

@mlissner The number of events shows no signs of slowing down. I'm concerned that if this continues, the recap user account will hit the rate limit, causing the extension to malfunction entirely.

mlissner commented 2 months ago

A valid concern. I thought we already gave that user an insane access limit, but apparently not. In any case, I found and squashed another IP address, and the events have stopped for the last seven minutes. Seems like it might have done the trick... for now.

mlissner commented 2 months ago

Blocked Tor access to the API. That did it. Hopefully if this person really needs Tor and needs to find me, they will reach me on Signal at mlissner.06.