freelawproject / recap

This repository is for filing issues on any RECAP-related effort.
https://free.law/recap/

RECAP should track internal docket sequence numbers (de_seqno) #247

Closed johnhawkinson closed 5 years ago

johnhawkinson commented 6 years ago

CMECF docket entries have several attributes. They include a Case ID, Docket Entry Number, DLS/doc1 number, and a sequence number (variously known as de_seqno or de_seq_num).

The de_seqno is a monotonically increasing number that appears to reflect the true ordering of both publicly visible and non-publicly visible events. It can be obtained by parsing the goDLS() function calls in the JavaScript of the docket report, and it also appears in URLs from RSS feeds and NEFs.
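As a rough sketch of the goDLS() parsing, something like the following could pull the sequence number out of a docket report link. The argument order (doc1 URL first, case ID second, de_seqno third) is an assumption based on what goDLS() calls appear to look like, the sample values in the usage line are made up, and `parse_godls` is a hypothetical helper name, not an existing function.

```python
import re

# Matches the argument list of a goDLS(...) call in an onclick attribute.
GODLS_RE = re.compile(r"goDLS\(([^)]*)\)")


def parse_godls(onclick):
    """Return the first three goDLS() arguments, or None if not parseable.

    ASSUMPTION: arguments are single-quoted strings in the order
    (doc1 URL, case ID, de_seqno, ...). Verify against real docket HTML.
    """
    match = GODLS_RE.search(onclick)
    if not match:
        return None
    args = re.findall(r"'([^']*)'", match.group(1))
    if len(args) < 3:
        return None
    return {"doc1_url": args[0], "de_caseid": args[1], "de_seqno": args[2]}


# Hypothetical example values, for illustration only.
sample = "goDLS('/doc1/01712427473','560453','8','','','1','','');return(false);"
print(parse_godls(sample))
```

Values are kept as strings here; converting de_seqno to an integer would be a reasonable next step once the format is confirmed.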

Tracking de_seqno is important to being able to handle docket entry number-less events (such as so-called "minute orders" that lack an attached document; note that some minute orders in some districts have docket entry numbers, but others in other districts do not). There's a strong desire to support such numberless entries expressed in #54 (duped in #219). I also talked about it in https://github.com/freelawproject/courtlistener/issues/772 where I noted that documents can be backdated and there's a desire to be able to display the true ordering, just as CMECF does.

Fundamentally, no database tracking CMECF is complete without also tracking de_seqno, since it's part of the data model.

Mike and I discussed this extensively between Nov. 2017 and Jan. 2018, but I failed to memorialize those discussions in an Issue. This is such an attempt.

Note that right now we have partial support for minute orders that do carry docket numbers (but not documents). We parse them in the docket report, so we see them, e.g., in https://www.courtlistener.com/docket/6296809/26/calderon-jimenez-v-cronen/ where docket 26 is ELECTRONIC NOTICE of Hearing.Hearing set for 5/8/2018 10:00 AM, and continuing day to day, if necessary, in Courtroom 10 before Judge Mark L. Wolf. See Order 25 Associated Cases: 1:18-cv-10225-MLW, 1:18-cv-10307-MLW, 1:18-cv-10310-MLW(Bono, Christine) (Entered: 04/05/2018) (note that "ELECTRONIC NOTICE" is code for "no attached document").

But we don't parse them in the RSS scraper, so, e.g.,

<item>
<title>1:18-cv-10225 Calderon Jimenez v. Cronen et al</title>
<link>https://ecf.mad.uscourts.gov/cgi-bin/DktRpt.pl?195705</link>
<description>[Order on Motion for Leave to Appear] </description>
<guid isPermaLink="true">https://ecf.mad.uscourts.gov/cgi-bin/DktRpt.pl?195705&#x26;221</guid>
<pubDate>Mon, 07 May 2018 15:44:51 GMT</pubDate>
</item>

appears in today's XML, and it happens to be docket entry 61, but that's not available from the XML (because there's nothing to link to).
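For the RSS case, a minimal sketch of pulling the numbers out of the guid above might look like this. It assumes the first query component is the case ID and the second (221 in the example) is the de_seqno, which seems to match the feed but should be confirmed; `parse_rss_guid` is a hypothetical helper name.

```python
import html
from urllib.parse import urlsplit


def parse_rss_guid(guid):
    """Split a DktRpt.pl guid into (case_id, de_seqno).

    ASSUMPTION: the query string is "<case_id>&<de_seqno>". The de_seqno
    is None when the second component is missing or not numeric.
    """
    # The feed escapes "&" as an XML character reference (&#x26;), so undo
    # that first in case the raw text is passed in.
    query = urlsplit(html.unescape(guid)).query
    parts = query.split("&")
    case_id = parts[0]
    de_seqno = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None
    return case_id, de_seqno


guid = "https://ecf.mad.uscourts.gov/cgi-bin/DktRpt.pl?195705&#x26;221"
print(parse_rss_guid(guid))
```

The `<link>` element, which carries only the case ID, yields a de_seqno of None under the same function.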

This is what the NEF looks like:

Notice of Electronic Filing

The following transaction was entered on 5/7/2018 at 11:44 AM EDT and filed on
5/7/2018
Case Name:       Calderon Jimenez v. Cronen et al
Case Number:     1:18-cv-10225-MLW
Filer:
Document Number: 61(No document attached)

Docket Text:
Judge Mark L. Wolf: ELECTRONIC ORDER entered Granting [59] Motion for Leave to
Appear Pro Hac Vice Added Colleen M. McCullough. Attorneys admitted Pro Hac
Vice must register for electronic filing if the attorney does not already have
an ECF account in this district. To register go to the Court website at
www.mad.uscourts.gov. Select Case Information, then Electronic Filing (CM/ECF)
and go to the CM/ECF Registration Form. (Franklin, Yvonne)

The bankruptcy (bulk) data dictionary, https://www.pacer.gov/documents/bulk_data.pdf, does define the de_seqno field:

| # | Data Item Name | CM/ECF Name | CM/ECF Report | Notes |
|---|---|---|---|---|
| 23 | DOCKET ENTRIES | | | |
| 24 | Document sequence number | de_seqno | n/a | Generated number that in combination with de_caseid (value is the same as cs_caseid) provides a unique key for the dktentry record. |
| 25 | Document number | de_document_num | Docket | Sequentially generated number |
| 26 | Docket entry filed date | de_date_filed | Docket | Date docket entry filed in court. |
| 27 | Docket text | dt_text | Docket | Docket text. |
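To illustrate the "unique key" note in the data dictionary, here is a small sketch using an in-memory SQLite table keyed on (de_caseid, de_seqno). The table layout, helper names, and sample rows are all hypothetical; this is not the CM/ECF or RECAP schema, just one way the key could work when the document number is missing or repeated.

```python
import sqlite3


def make_db():
    """Create an in-memory table keyed on (de_caseid, de_seqno)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE dktentry (
            de_caseid       INTEGER NOT NULL,
            de_seqno        INTEGER NOT NULL,
            de_document_num INTEGER,  -- NULL for numberless entries
            dt_text         TEXT,
            PRIMARY KEY (de_caseid, de_seqno)
        )
        """
    )
    return conn


def add_entry(conn, caseid, seqno, document_num, text):
    conn.execute(
        "INSERT INTO dktentry VALUES (?, ?, ?, ?)",
        (caseid, seqno, document_num, text),
    )


def entries_in_order(conn, caseid):
    """Return (de_seqno, de_document_num, dt_text) rows ordered by de_seqno."""
    return conn.execute(
        "SELECT de_seqno, de_document_num, dt_text FROM dktentry "
        "WHERE de_caseid = ? ORDER BY de_seqno",
        (caseid,),
    ).fetchall()


# Made-up rows: one numberless entry and two entries sharing a number.
conn = make_db()
add_entry(conn, 1, 5, 41, "first entry numbered 41")
add_entry(conn, 1, 7, 41, "second entry numbered 41")
add_entry(conn, 1, 6, None, "minute order, no document number")
print(entries_in_order(conn, 1))
```

Because the key does not involve de_document_num, numberless and duplicate-numbered entries can coexist and still sort in sequence order.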

Unfortunately, we don't have de_seqno for all history; we would only have it from the point we start saving it. So in order to use it for ordering, reporting, and linking documents, we would need to fake it up for past history. That implies modifying the schema to add both a de_seqno and a recap_sequence_number (RSN), where the RSN is the de_seqno where we have it, and something made up that fits the number space (possibly using fixed-point decimals to avoid later collisions) where we lack it.
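One possible reading of the recap_sequence_number idea, as a sketch only: walk the entries in whatever order we currently believe is right, use the real de_seqno where known, and otherwise assign a fractional value just above the last known one so it cannot collide with a real integer de_seqno seen later. The 0.001 step, the function name `assign_rsn`, and the input shape are all assumptions for illustration.

```python
from decimal import Decimal

# ASSUMPTION: three decimal places is enough resolution. A run of 1000 or
# more consecutive unknown entries would reach the next integer.
STEP = Decimal("0.001")


def assign_rsn(de_seqnos):
    """Map a list of de_seqno values (int or None) to RSN Decimals.

    Input must already be in the best-known docket order. Known values
    pass through unchanged; unknown ones get last_known + k * STEP.
    """
    rsns = []
    last_known = Decimal(0)
    offset = 0
    for seq in de_seqnos:
        if seq is not None:
            last_known = Decimal(seq)
            offset = 0
            rsns.append(last_known)
        else:
            offset += 1
            rsns.append(last_known + offset * STEP)
    return rsns


# Entries with unknown de_seqno are interleaved with known ones.
print(assign_rsn([None, 10, None, None, 13]))
```

This keeps made-up values strictly between integers, so a real de_seqno learned later can be stored without renumbering its neighbors. It does not address backdated entries whose true position differs from the displayed order.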


Sorry if this sounds somewhat incoherent. That's what happens when you try to summarize a months-old IM conversation quickly.

mlissner commented 6 years ago

This is a great summary, thank you.

mlissner commented 6 years ago

@johnhawkinson has discovered that Del Gallo v. Walsh (3:17-cv-30167), a case in Mass., has two documents numbered 41. Another reason to get this done.

mlissner commented 5 years ago

We're now collecting the de_seqno values and putting them in the DB. CLOSING (but reopen if needed).