freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

ValueError: Parsing the file size of a docket with attachments. #1039

Open sentry-io[bot] opened 1 month ago

sentry-io[bot] commented 1 month ago

This issue logged 8.6K events in a short period of time, but a review of the events shows that most of them come from a couple of cases from COD:

The error is raised while parsing a document's file size: int() receives an empty string.
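To illustrate the failure mode, here is a minimal sketch of a tolerant size parser. It is not juriscraper's actual code: the helper name `parse_file_size_bytes` and the assumption that the size arrives as a raw string (possibly empty, or with thousands separators) are hypothetical. The idea is to return None instead of letting `int('')` raise.

```python
import re
from typing import Optional


def parse_file_size_bytes(raw: Optional[str]) -> Optional[int]:
    """Hypothetical helper: convert a raw size string to an int.

    Returns None when the string is missing, empty, or not a plain
    number, instead of raising ValueError the way int('') does.
    """
    if raw is None:
        return None
    # Accept digits with optional thousands separators, e.g. "1,024".
    match = re.fullmatch(r"\s*(\d[\d,]*)\s*", raw)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


# The unguarded call reproduces the error message from the traceback.
try:
    int("")
except ValueError as exc:
    error_message = str(exc)

# The guarded helper tolerates the same input.
empty_result = parse_file_size_bytes("")
normal_result = parse_file_size_bytes("12345")
```

Whether a missing size should become None, 0, or cause the attachment to be skipped is a design decision for the maintainers; None is used here only to keep "unknown" distinct from a real zero-byte file.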

An example PQ

Sentry Issue: COURTLISTENER-5PS

_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/concurrent/futures/process.py", line 263, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/courtlistener/cl/recap/tasks.py", line 521, in parse_docket_text
    return report.data
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/juriscraper/pacer/docket_report.py", line 402, in data
    return super().data
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/juriscraper/pacer/docket_report.py", line 75, in data
    data["docket_entries"] = self.docket_entries
                             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/juriscraper/pacer/docket_report.py", line 1236, in docket_entries
    attachments = self._get_attachments(cells[2])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/juriscraper/pacer/docket_report.p...
ValueError: invalid literal for int() with base 10: ''
(9 additional frame(s) were not displayed)
...
  File "cl/recap/views.py", line 65, in perform_create
    await asyncio.shield(recap_upload_task)
  File "cl/recap/tasks.py", line 114, in process_recap_upload
    docket = await process_recap_docket(pq.pk)
  File "cl/recap/tasks.py", line 559, in process_recap_docket
    data = await asyncio.get_running_loop().run_in_executor(