sentry-io[bot] opened this issue 1 year ago
The URL leads to a "The issue you were looking for was not found." page in Sentry.
The email that causes this error is `pvfopi0rd9vie91iu3fq6k6d5v77nq1j61mvtg01` for court `azd`. It seems to be a single-event issue; I have found no other instances in Sentry.
I have pasted the full traceback at the end.
The problem is that the email has unusual formatting, which causes the `_get_case_name_plain` function in `juriscraper/pacer/email.py` to pick up an empty string as the case name, even though the case name exists.
So when `_get_short_description` splits the subject by the case name (`subject.split(case_name)`) to get the short description, the call becomes `str.split('')`, which raises a `ValueError`.
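A minimal reproduction of the failure, assuming only that `_get_case_name_plain()` returned `""` as described above. The subject string is a made-up placeholder, not the real email's subject:

```python
# Hypothetical subject line; the real email's subject is not shown here.
subject = "Crews v. DeSantis Order"
# What _get_case_name_plain() returned for this email.
case_name = ""

error_message = None
try:
    # Same call as in _get_short_description.
    subject.split(case_name)
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # empty separator
```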
juriscraper parses this as a `text/plain` email. When I forced it to parse it as `text/html`, it didn't error, but it still failed to pick up the case name.
The email looks like this [screenshot], but internally it is formatted like this (`grep -A4 -B2 Name: pvfopi0rd9vie91iu3fq6k6d5v77nq1j61mvtg01.eml`):

```
The following transaction was entered on 6/30/2023 at 4:15 PM MST and filed=
on 6/30/2023
Case Name:
Crews v. DeSantis
Case Number:
2:23-cv-00969-MTL<https://ecf.azd.uscourts.gov/cgi-bin/DktRpt.pl?1336577>
Filer:
--
<td style=3D"padding:.75pt .75pt .75pt .75pt">
<p class=3D"MsoNormal"><strong><span style=3D"font-family:"Calibri&quo=
t;,sans-serif">Case Name:</span></strong>
<o:p></o:p></p>
</td>
<td style=3D"padding:.75pt .75pt .75pt .75pt">
<p class=3D"MsoNormal">Crews v. DeSantis<o:p></o:p></p>
```
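The raw .eml appears to be quoted-printable encoded: `=3D` is an escaped `=`, and a trailing `=` is a soft line break, which is why `&quo=` and `t;` are split across two lines. A quick check with Python's standard `quopri` module, run on fragments of the grep output above (not the full message):

```python
import quopri

# Fragments of the raw .eml. "=3D" should decode to "=", and the
# trailing "=" on the second line is a soft line break to be removed.
raw = (
    b'<p class=3D"MsoNormal">Crews v. DeSantis<o:p></o:p></p>\n'
    b'"Calibri&quo=\n'
    b't;,sans-serif">Case Name:</span></strong>\n'
)

decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)
```

After decoding, the split `&quo=` / `t;` rejoins into `&quot;`, so the encoding itself is not what hides the case name.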
The newline between `Case Name:` and the value makes `regex = r"Case Name:(.*)"` pick up an empty string: `.` does not match a newline, and `*` allows zero characters.
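The empty match is easy to reproduce with the standard `re` module. The `body` string below is a simplified copy of the plain-text part from the grep output, not the full email:

```python
import re

# Simplified plain-text body: the value sits on the line after the label.
body = "Case Name:\nCrews v. DeSantis\nCase Number:\n2:23-cv-00969-MTL\n"

match = re.search(r"Case Name:(.*)", body)
# Without re.DOTALL, "." does not match "\n", and "*" accepts zero
# characters, so the group ends right before the newline.
print(repr(match.group(1)))  # ''
```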
This seems like a bug in itself; it should be `regex = r"Case Name:(.+)"`. But that alone would not fix this specific `ValueError`.
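For comparison, a sketch of how `(.+)` behaves on the same simplified body, next to a variant that skips whitespace after the label. The `\s*` pattern is a hypothetical alternative, not juriscraper's current code:

```python
import re

body = "Case Name:\nCrews v. DeSantis\nCase Number:\n2:23-cv-00969-MTL\n"

# Pattern suggested in this issue: require at least one character.
strict = re.search(r"Case Name:(.+)", body)
# Hypothetical alternative: skip whitespace, including the newline,
# between the label and the value before capturing.
lenient = re.search(r"Case Name:\s*(.+)", body)

print(strict)  # None
print(lenient.group(1))  # Crews v. DeSantis
```

With `(.+)` there is no match at all on this body, so the caller would still have to handle a missing case name. The `\s*` variant has its own caveat: if the value line were genuinely blank, it would skip ahead and capture the next label (`Case Number:`), so the captured value would still need a sanity check.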
Full traceback:
```
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 report.data

File ~/venvs/courtlistener/lib/python3.12/site-packages/juriscraper/pacer/email.py:76, in NotificationEmail.data(self)
     69 base = {
     70     "court_id": self.court_id,
     71 }
     72 if self.content_type == "text/plain":
     73     parsed = {
     74         "appellate": self._is_appellate(),
     75         "contains_attachments": self._contains_attachments_plain(),
---> 76         "dockets": self._get_dockets(),
     77         "email_recipients": self._get_email_recipients_plain(),
     78     }
     79 else:
     80     parsed = {
     81         "appellate": self._is_appellate(),
     82         "contains_attachments": self._contains_attachments(),
     83         "dockets": self._get_dockets(),
     84         "email_recipients": self._get_email_recipients(),
     85     }

File ~/venvs/courtlistener/lib/python3.12/site-packages/juriscraper/pacer/email.py:387, in NotificationEmail._get_dockets(self)
    381 if self.content_type == "text/plain":
    382     docket_number = self._get_docket_number_plain()
    383     docket = {
    384         "case_name": self._get_case_name_plain(),
    385         "docket_number": docket_number,
    386         "date_filed": None,
--> 387         "docket_entries": self._get_docket_entries(),
    388     }
    389     dockets.append(docket)
    390     # Cache the docket number for its later use.

File ~/venvs/courtlistener/lib/python3.12/site-packages/juriscraper/pacer/email.py:471, in NotificationEmail._get_docket_entries(self, current_node)
    464 case_url = self._get_case_anchor(current_node)
    466 if description is not None:
    467     entries = [
    468         {
    469             "date_filed": self._get_date_filed(),
    470             "description": description,
--> 471             "short_description": self._get_short_description(),
    472             "document_url": document_url,
    473             "document_number": document_number,
    474             "pacer_doc_id": None,
    475             "pacer_case_id": None,
    476             "pacer_seq_no": None,
    477             "pacer_magic_num": None,
    478         }
    479     ]
    480     if document_url is not None:
    481         entries[0]["pacer_doc_id"] = get_pacer_doc_id_from_doc1_url(
    482             document_url
    483         )

File ~/venvs/courtlistener/lib/python3.12/site-packages/juriscraper/pacer/email.py:618, in NotificationEmail._get_short_description(self)
    613 subject = clean_string(self.subject)
    614 for case_name in self.case_names:
    615     # cases_names is a list of strings that can contain one or multiple
    616     # elements in multi-docket NEF where the case_name referenced in the
    617     # subject might change. This find the right case_name match.
--> 618     subject_split_case_name = subject.split(case_name)
    619     if len(subject_split_case_name) > 1:
    620         break

ValueError: empty separator
```
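Independent of the regex, the split loop itself could guard against an empty separator. A minimal sketch, using a hypothetical standalone function `get_short_description(subject, case_names)` in place of the method, and assuming the short description is whatever follows the case name in the subject (the traceback does not show how juriscraper uses the split result):

```python
def get_short_description(subject: str, case_names: list[str]) -> str:
    """Hypothetical guarded version of the loop in _get_short_description."""
    for case_name in case_names:
        # str.split("") raises ValueError("empty separator"), so skip
        # blank case names instead of passing them to split().
        if not case_name:
            continue
        subject_split_case_name = subject.split(case_name)
        if len(subject_split_case_name) > 1:
            # Assumption: the text after the case name is the description.
            return subject_split_case_name[-1].strip()
    # No usable case name matched the subject.
    return ""


subject = "Activity in Case 2:23-cv-00969-MTL Crews v. DeSantis Order"  # made-up
print(get_short_description(subject, ["", "Crews v. DeSantis"]))  # Order
print(repr(get_short_description(subject, [""])))  # ''
```

Returning an empty string keeps `report.data` from crashing, but it would also hide emails like this one; logging or raising a clearer error are reasonable alternatives.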
By the way, I wrote a wiki entry on how to find these.
Thanks @grossir. Are you working on the fix or are you saying something more is needed to deal with this?
Thanks for the wiki page too. Super helpful. As we/you write more of these, we can also think about ways of linking them to the full wiki, so people can find them more easily.
Just gotta stay on top of these in case this is a new format we haven't seen yet.
Sentry Issue: COURTLISTENER-44Z