Bankruptcy docket document IDs don't line up with attachment IDs (in PACER!)

mlissner commented 6 years ago

So if you look at entry 88 on this docket:

https://ecf.nysb.uscourts.gov/cgi-bin/DktRpt.pl?166160

You'll see that it has a link to:

https://ecf.nysb.uscourts.gov/doc1/12606715934

And so we'd assume its doc ID is 12606715934. But if you click that link and go to the attachment page, you'll get a list of seven documents. The first is described as "Main Document" and has a link of:

https://ecf.nysb.uscourts.gov/doc1/12616716066

And so you'd assume its ID is 12616716066.

BUG: This is not the same ID we have on our docket.

If you go down the attachment page, eventually you get to item six (!), which has a link of:

https://ecf.nysb.uscourts.gov/doc1/12616715934

This is...very weird. I've seen this in a few other places too, and I'm beginning to think it's universal. Our RECAP merge fails on all of these because it assumes the first item's ID can be used as a lookup for the docket entry to associate with. When that fails, it kind of gives up and declares it an orphan.
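One detail in the URLs above may be useful: the docket link (12606715934) and item six's link (12616715934) differ only in the fourth digit. A minimal sketch, assuming (from this single entry, not from any PACER documentation) that IDs differing only in that digit refer to the same document. The helper names are hypothetical:

```python
import re

# Hypothetical helper: extract the document ID from a PACER doc1 URL.
DOC1_RE = re.compile(r"/doc1/(\d+)")


def doc_id_from_url(url: str) -> str:
    match = DOC1_RE.search(url)
    if match is None:
        raise ValueError(f"not a doc1 URL: {url}")
    return match.group(1)


def normalize_doc_id(doc_id: str) -> str:
    """ASSUMPTION: the fourth digit is the only difference between a
    docket-page link and the document it stands for, so force it to '1'."""
    if len(doc_id) < 4:
        return doc_id
    return doc_id[:3] + "1" + doc_id[4:]


def same_document(a: str, b: str) -> bool:
    return normalize_doc_id(a) == normalize_doc_id(b)


docket_link = doc_id_from_url("https://ecf.nysb.uscourts.gov/doc1/12606715934")
main_doc = doc_id_from_url("https://ecf.nysb.uscourts.gov/doc1/12616716066")
attachment_6 = doc_id_from_url("https://ecf.nysb.uscourts.gov/doc1/12616715934")

print(same_document(docket_link, main_doc))      # False: the lookup that fails
print(same_document(docket_link, attachment_6))  # True
```

Whether this digit rule holds across other bankruptcy courts would need checking against more attachment pages.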

Here's a fun question though:

On the docket page in PACER, the main document appears to have an ID of '934.
On the attachment page in PACER, it has an ID of '066.
On CL, we don't separate these pages out. So... what do we show on our docket page?

I think the solution is that we show the same link as PACER until we get the attachment page and realize we can do better. Once that happens, we show the correct links to the correct things.
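That display rule can be sketched as follows. `DocketEntry` and `AttachmentRow` are hypothetical stand-ins, not CourtListener's actual models, and the sketch assumes the merged attachment page marks which row is the main document:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AttachmentRow:
    """One row of a PACER attachment page (hypothetical structure)."""
    description: str
    pacer_doc_id: str
    is_main: bool = False


@dataclass
class DocketEntry:
    """Hypothetical docket entry; not CourtListener's real model."""
    entry_number: int
    docket_link_doc_id: str  # the ID PACER's docket page links to
    attachment_rows: list[AttachmentRow] = field(default_factory=list)


def doc1_url(court_id: str, doc_id: str) -> str:
    return f"https://ecf.{court_id}.uscourts.gov/doc1/{doc_id}"


def main_document_url(entry: DocketEntry, court_id: str) -> str:
    """Mirror PACER's docket link until an attachment page has been merged,
    then link the main document by its own ID."""
    main: Optional[AttachmentRow] = next(
        (row for row in entry.attachment_rows if row.is_main), None
    )
    doc_id = main.pacer_doc_id if main is not None else entry.docket_link_doc_id
    return doc1_url(court_id, doc_id)


entry = DocketEntry(entry_number=88, docket_link_doc_id="12606715934")
print(main_document_url(entry, "nysb"))  # .../doc1/12606715934
entry.attachment_rows.append(
    AttachmentRow("Main Document", "12616716066", is_main=True)
)
print(main_document_url(entry, "nysb"))  # .../doc1/12616716066
```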

johnhawkinson commented 6 years ago

> Our RECAP merge fails on all of these because it assumes the first item's ID can be used as a lookup for the docket entry to associate with. ... I think the solution is that we show the same link as PACER until we get the attachment page and realize we can do better. Once that happens, we show the correct links to the correct things.

As is often the case, to me the answer is "make fewer assumptions." Don't assume there is necessarily any correlation between the DLS number of an attachment page and that of its first document. Generally they'll be the same, but not always.
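A sketch of a merge lookup that makes fewer assumptions: rather than trusting only the first row's ID, it tries the attachment page's own ID (if known) and then every row's ID, and orphans the upload only when nothing matches. The `known` index (docket-link ID to entry number) and both function names are hypothetical, not CourtListener's actual code:

```python
from typing import Optional

# Hypothetical index: docket-page document ID -> docket entry number.
KnownEntries = dict[str, int]


def first_row_lookup(known: KnownEntries, row_ids: list[str]) -> Optional[int]:
    """The fragile approach: assume the first row's ID is the entry's ID."""
    return known.get(row_ids[0]) if row_ids else None


def any_id_lookup(
    known: KnownEntries, page_doc_id: Optional[str], row_ids: list[str]
) -> Optional[int]:
    """Try the attachment page's own ID first, then every row's ID.
    Returns None (an orphan) only if no candidate matches."""
    candidates = ([page_doc_id] if page_doc_id else []) + row_ids
    for doc_id in candidates:
        if doc_id in known:
            return known[doc_id]
    return None


known = {"12606715934": 88}
rows = ["12616716066", "12616716556", "12616716237", "12616716463",
        "12616717099", "12616715934", "12616716376"]
print(first_row_lookup(known, rows))              # None: orphaned today
print(any_id_lookup(known, "12606715934", rows))  # 88
```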

It's interesting to note that on the attachment page, not only does the first document's number not match the link to the attachment page, but the DLS numbers are not consecutive, and are not even monotonic. Attachment 6 has the lowest number, and here are the DLS numbers relative to it:

Part	DLS	DLS minus 12616715934
1	12616716066	132
2	12616716556	622
3	12616716237	303
4	12616716463	529
5	12616717099	1165
6	12616715934	0
7	12616716376	442
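The offsets in the table can be recomputed directly from the seven DLS numbers, along with the consecutive/monotonic checks:

```python
# DLS numbers from the attachment page for entry 88, keyed by part (page order).
dls = {
    1: 12616716066,
    2: 12616716556,
    3: 12616716237,
    4: 12616716463,
    5: 12616717099,
    6: 12616715934,
    7: 12616716376,
}

lowest_part = min(dls, key=dls.get)  # part with the smallest DLS number
base = dls[lowest_part]
offsets = {part: n - base for part, n in dls.items()}

ordered = [dls[part] for part in sorted(dls)]
pairs = list(zip(ordered, ordered[1:]))
consecutive = all(b - a == 1 for a, b in pairs)
monotonic = all(b > a for a, b in pairs)

print(lowest_part)  # 6
print(offsets)      # {1: 132, 2: 622, 3: 303, 4: 529, 5: 1165, 6: 0, 7: 442}
print(consecutive, monotonic)  # False False
```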

One way this could happen is if the documents were replaced later. But then I'd expect the DLS numbers to be in bimodal clusters or something.

The other day I saw a "main document" that was called something like "Revised Main Document" or "Replaced Main Document." I failed to note what case it was in, and I think it may have been a restricted document (maybe in ecf.mad's 13-cv-30125?) but that may also be food for thought.

> This is...very weird. I've seen this in a few other places too, and I'm beginning to think it's universal.

I think I don't understand what "universal" means here?

> When that fails, it kind of gives up and declares it an orphan.

I don't know that this is the same problem as orphaned dockets, but it does seem like we need better orphan handling more broadly.

mlissner commented 6 years ago

> it does seem like we need better orphan handling more broadly.

Probably so. I think part of my sub- or semi-conscious thinking on orphans was that they're better than mismatches, so I erred on the side of orphaning instead of mismatching when I wasn't sure what to do. As we gain more experience, I'm feeling more confident about doing this correctly, and I think we can start doing better.

> I think I don't understand what "universal" means here?

Just that I think this may affect a LOT of attachment pages on bankruptcy, maybe even all of them, "universally."

Thanks for the rest of your analysis. Useful.

freelawproject/courtlistener #851