freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License
364 stars, 109 forks

Attachment page: IndexError: list index out of range #748

Open sentry-io[bot] opened 1 year ago

sentry-io[bot] commented 1 year ago

Sentry Issue: COURTLISTENER-50B

IndexError: list index out of range
(12 additional frame(s) were not displayed)
...
  File "concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "cl/recap/views.py", line 56, in perform_create
    await process_recap_upload(pq)
  File "cl/recap/tasks.py", line 113, in process_recap_upload
    await process_recap_attachment(pq.pk)
  File "cl/recap/tasks.py", line 608, in process_recap_attachment
    att_data = get_data_from_att_report(text, pq.court_id)
  File "cl/recap/mergers.py", line 1326, in get_data_from_att_report
    att_data = att_page.data

ttys0dev commented 1 year ago

Any idea what attachment page this is?

mlissner commented 1 year ago

I can share it's this one:

{
    "id": 10947097,
    "court": "txnd",
    "docket": null,
    "docket_entry": null,
    "recap_document": null,
    "date_created": "2023-10-10T03:07:24.365750-07:00",
    "date_modified": "2023-10-10T03:07:24.645863-07:00",
    "pacer_case_id": "348334",
    "pacer_doc_id": "",
    "document_number": null,
    "attachment_number": null,
    "status": 4,
    "upload_type": 2,
    "error_message": "",
    "debug": false
}

But that doesn't tell you which docket entry it is. @albertisfu could help with that too?

albertisfu commented 1 year ago

Here is the attachment page related to this upload:

a8ccc7c33ff54ab0a55b39b5b6c83a5b.txt

ttys0dev commented 1 year ago

> Here is the attachment page related to this upload:

The attachment page HTML seems to have been corrupted by this Firefox add-on.

mlissner commented 1 year ago

Shoot, that's annoying, but makes for an easy fix. We have a list of extension strings we detect. Maybe we just add this one too:

https://github.com/freelawproject/juriscraper/blob/2d864fc5ff64402dffb02dac27588023aa7b1cbc/juriscraper/pacer/reports.py#L49-L53

?
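
For illustration, a minimal sketch of the idea suggested above: keep a list of strings that browser extensions are known to inject, and flag an uploaded page when any of them appears in its HTML. This is not the code in `reports.py`. The names `EXTENSION_MARKERS`, `find_extension_markers`, and `is_corrupted_by_extension` are hypothetical, and the marker strings are placeholders rather than the strings Juriscraper actually detects.

```python
from __future__ import annotations

# Hypothetical placeholders. The real list lives in juriscraper/pacer/reports.py
# and is not reproduced here.
EXTENSION_MARKERS = (
    "example-extension-injected-class",  # placeholder, not a real marker
    "data-example-addon",  # placeholder, not a real marker
)


def find_extension_markers(html: str, markers=EXTENSION_MARKERS) -> list[str]:
    """Return the markers present in the HTML, in the order they are listed."""
    lowered = html.lower()
    return [marker for marker in markers if marker.lower() in lowered]


def is_corrupted_by_extension(html: str, markers=EXTENSION_MARKERS) -> bool:
    """Return True if any known extension marker appears in the HTML."""
    return bool(find_extension_markers(html, markers))


clean = "<html><body><table><tr><td>1</td></tr></table></body></html>"
dirty = (
    "<html><body>"
    '<div class="example-extension-injected-class"></div>'
    "</body></html>"
)

print(is_corrupted_by_extension(clean))
print(is_corrupted_by_extension(dirty))
print(find_extension_markers(dirty))
```

With a check like this, supporting another add-on would mean appending one more string to the list. A caller could skip or reject the upload before parsing instead of failing later with an `IndexError`.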