Open anseljh opened 1 year ago
Documenting a very helpful chat with @johnhawkinson:
The CL data model doesn't accommodate the subcases that multi-defendant criminal cases have.
In these cases, there is a main docket, and then N additional subcases, one for each of the N defendants. A document is filed in 1 to N of the subcases, but is always filed in the main case.
Often, but not always, the main docket will have the lowest PACER ID, and the subcases will have immediately-consecutive PACER IDs. However, this is not always the case, and should not be depended upon. For example, if defendants are added in a superseding indictment, their subcases' PACER IDs may be thousands off.
PACER instances have an endpoint that can help clear this up. It's a free query, and is at cgi-bin/possible_case_numbers.pl?(case_number).
https://ecf.nyed.uscourts.gov/cgi-bin/possible_case_numbers.pl?1:09-cr-00466 yields:
<request number='1:09-cr-00466'>
<case number='1:09-cr-466' id='294048' title='1:09-cr-00466-BMC-RLM USA v. Beltran-Leyva et al' defendant='0'
sortable='1:2009-cr-00466-BMC-RLM' />
<case number='1:09-cr-466-1' id='294049' title='1:09-cr-00466-BMC-RLM-1 Arturo Beltran-Leyva' defendant='1'
sortable='1:2009-cr-00466' />
<case number='1:09-cr-466-2' id='294050' title='1:09-cr-00466-BMC-RLM-2 Hector Beltran Leyva' defendant='2'
sortable='1:2009-cr-00466' />
<case number='1:09-cr-466-3' id='294051' title='1:09-cr-00466-BMC-RLM-3 Ignacio Coronel Villareal' defendant='3'
sortable='1:2009-cr-00466' />
<case number='1:09-cr-466-4' id='294052'
title='1:09-cr-00466-BMC-RLM-4 Joaquin Archivaldo Guzman Loera (closed 07/18/2019)' defendant='4'
sortable='1:2009-cr-00466' />
<case number='1:09-cr-466-5' id='294053' title='1:09-cr-00466-BMC-RLM-5 Ismael Zambada Garcia' defendant='5'
sortable='1:2009-cr-00466' />
<case number='1:09-cr-466-6' id='294054' title='1:09-cr-00466-BMC-RLM-6 Jesus Zambada-Garcia (closed 01/24/2013)'
defendant='6' sortable='1:2009-cr-00466' />
</request>
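A response like the one above can be read with a standard XML parser. The following is a minimal sketch, not Juriscraper's own parsing code: the function name split_cases and the abbreviated SAMPLE string are made up for illustration, and it assumes the entry with defendant='0' is the main docket, as described in this thread.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the response above (three of the seven entries).
SAMPLE = """<request number='1:09-cr-00466'>
<case number='1:09-cr-466' id='294048' title='1:09-cr-00466-BMC-RLM USA v. Beltran-Leyva et al' defendant='0'
sortable='1:2009-cr-00466-BMC-RLM' />
<case number='1:09-cr-466-1' id='294049' title='1:09-cr-00466-BMC-RLM-1 Arturo Beltran-Leyva' defendant='1'
sortable='1:2009-cr-00466' />
<case number='1:09-cr-466-4' id='294052'
title='1:09-cr-00466-BMC-RLM-4 Joaquin Archivaldo Guzman Loera (closed 07/18/2019)' defendant='4'
sortable='1:2009-cr-00466' />
</request>"""


def split_cases(xml_text):
    """Return (main_case, subcases), each case being a dict of <case> attributes.

    Hypothetical helper: treats the entry with defendant='0' as the main docket
    and does not rely on PACER IDs being ordered or consecutive.
    """
    root = ET.fromstring(xml_text)
    cases = [dict(el.attrib) for el in root.iter("case")]
    main = next((c for c in cases if c.get("defendant") == "0"), None)
    subcases = [c for c in cases if c.get("defendant") != "0"]
    return main, subcases


main, subcases = split_cases(SAMPLE)
print(main["id"], main["number"])  # 294048 1:09-cr-466
print([(c["defendant"], c["id"]) for c in subcases])  # [('1', '294049'), ('4', '294052')]
```

Selecting on the defendant attribute rather than on the lowest id avoids depending on PACER ID ordering, which the discussion above says is not reliable.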
The case number part of the possible_case_numbers.pl endpoint is case-sensitive.
Unless there's a reason to follow a specific defendant only, then it looks like the reasonable thing to do is:
1. Query the possible_case_numbers.pl endpoint on the PACER instance.
2. The case with defendant='0' will be the main docket with all the documents.
3. Follow the Docket with that PACER ID.
The case number part of the possible_case_numbers.pl endpoint is case-sensitive.
I guess we can make sure that Juriscraper just lower-cases any input folks give, though I don't think I've noticed this being a problem. It'd be a very easy fix though around here:
https://github.com/freelawproject/juriscraper/blob/main/juriscraper/pacer/hidden_api.py#L112-L118
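As a rough illustration of the lower-casing idea, here is a sketch of a helper that normalizes the case number before building the query URL. It is not the code at the linked lines: the function name possible_case_numbers_url is hypothetical, and the ecf.{court_id}.uscourts.gov host pattern is an assumption generalized from the nyed example earlier in this thread.

```python
def possible_case_numbers_url(court_id, docket_number):
    """Build the possible_case_numbers.pl URL for a court and case number.

    Hypothetical helper. The case number is stripped and lower-cased because
    the endpoint is reported to be case-sensitive. The host pattern is assumed
    from the nyed example and may not hold for every PACER instance.
    """
    normalized = docket_number.strip().lower()
    return (
        f"https://ecf.{court_id}.uscourts.gov"
        f"/cgi-bin/possible_case_numbers.pl?{normalized}"
    )


print(possible_case_numbers_url("nyed", "1:09-CR-00466"))
# https://ecf.nyed.uscourts.gov/cgi-bin/possible_case_numbers.pl?1:09-cr-00466
```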
Unless there's a reason to follow a specific defendant only, then it looks like the reasonable thing to do is: {detailed plan}
Yes, I think that's a solid course of action. Sorry that our data model doesn't support this. It's one of those things that came with the migration from the Princeton code, and I didn't catch it until later, and since then I've never been able to find time to deal with it. It doesn't help that most of our commercial clients are focused on civil cases.
I have notes somewhere about how to fix it, but I recall it being particularly thorny.
BCB1 followed a case it called US v. Joaquin Guzman, and recorded its case number in the E.D.N.Y. as 1:09-cr-00466-4. (BCB1 JSON line 92) It appears to be pertinent that this is a multi-defendant criminal case; hence the -4 at the end of the case number: Brad had been tracking defendant 4's permutation of the docket.
I don't think CourtListener expressly tracks these defendant numbers in its case/docket numbers. When I search for that exact case number, including the -4, I get no results. When I trim the -4 off, though, I get 7 results. What's further interesting here is that those 7 results have consecutive PACER IDs. Here is the mapping between CourtListener docket IDs and PACER IDs:
While the CL IDs are dispersed, the PACER IDs are consecutive, ranging from 294048 to 294054.
Looking at what the CL API returns in terms of docket numbers for these CL IDs, they are all the same:
Looking at the 7 dockets themselves on CL, a variety of documents have and have not been purchased in each.
How would I determine which one to follow? Perhaps the lowest PACER ID?
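For the search step described above, where the -4 has to be trimmed before CourtListener returns results, a small sketch of splitting a recorded case number into its base and defendant number might look like this. The function name and the regular expression are assumptions for illustration: the pattern only covers the plain office:yy-type-number form seen in this thread and does not handle numbers that include judge initials, such as 1:09-cr-00466-BMC-RLM-4.

```python
import re

# Assumed shape: office:yy-type-number, optionally followed by -<defendant>.
_DOCKET_RE = re.compile(r"^(\d+:\d{2}-[a-z]+-\d+)(?:-(\d+))?$", re.IGNORECASE)


def split_defendant_suffix(docket_number):
    """Return (base_docket_number, defendant_number or None).

    Hypothetical helper; raises ValueError for formats the pattern does not cover.
    """
    match = _DOCKET_RE.match(docket_number.strip())
    if match is None:
        raise ValueError(f"Unrecognized docket number: {docket_number!r}")
    base, defendant = match.groups()
    defendant_number = int(defendant) if defendant is not None else None
    return base, defendant_number


print(split_defendant_suffix("1:09-cr-00466-4"))  # ('1:09-cr-00466', 4)
print(split_defendant_suffix("1:09-cr-00466"))    # ('1:09-cr-00466', None)
```

The base number can then be used for the search, while the defendant number is kept alongside it in case a specific defendant needs to be followed later.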
Related question: What is the meaning of the source field on a RECAPDocket item, e.g., this:
There are quite a few of these cases in the BCB1 case list. The Guzman case appears to be illustrative, though.
Tagging @mlissner and @flooie for initial thoughts. Thanks!