freelawproject / recap

This repository is for filing issues on any RECAP-related effort.
https://free.law/recap/

Duplicate docket entries (1 from rss?) #272

Open johnhawkinson opened 5 years ago

johnhawkinson commented 5 years ago

In this case, https://www.courtlistener.com/docket/6296809/calderon-jimenez-v-cronen/?page=2, it appears that RECAP has two entries for the same order. Presumably one came from RSS and one from the docket report, and they did not merge properly:

Mar 21, 2019; Order on Motion for Extension of Time

223; Mar 21, 2019; Judge Mark L. Wolf: ELECTRONIC ORDER entered granting 222 Motion for Extension of Time (Bono, Christine) (Entered: 03/21/2019); Main Document

"Huh."

mlissner commented 5 years ago

Hm, somehow we got that as an unnumbered entry, though it seems to be a numbered entry now. That's pretty weird.

johnhawkinson commented 5 years ago

Interim conclusion: we need to save the raw RSS for debugging. Maybe for other things too.
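
A minimal sketch of what saving the raw feed could look like, assuming the fetcher already has the unparsed response bytes in hand. The function name `save_raw_feed`, the directory layout, and the `court_id` argument are hypothetical and are not taken from the RECAP or CourtListener code:

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def save_raw_feed(raw: bytes, court_id: str, root: Path) -> Path:
    """Write the unparsed feed bytes to disk and return the file path.

    Hypothetical helper: one directory per court, one file per fetch.
    """
    # The UTC fetch time goes in the name so fetches sort chronologically.
    fetched_at = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # A short content hash makes byte-identical fetches easy to spot.
    digest = hashlib.sha256(raw).hexdigest()[:12]
    target_dir = root / court_id
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{fetched_at}-{digest}.xml"
    path.write_bytes(raw)
    return path
```

Storing the bytes untouched, before any parsing, would let a later investigation see exactly what the feed said at the time a suspect entry was created.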

mlissner commented 5 years ago

Yeah, wouldn't hurt. We can open a bug for it, but I have no idea when I'll get to it.

johnhawkinson commented 5 years ago

This is not a one-off. Today's example: https://www.courtlistener.com/docket/6151669/alasaad-v-duke/?filed_after=&filed_before=&entry_gte=&entry_lte=&order_by=desc

[Image: Screen Shot 2019-04-11 at 23 17 21]

johnhawkinson commented 5 years ago

And then there's this peculiar variant, where it looks like a "Notice of Consent - No Consent" was edited to "All parties have consented" but both appear in the RECAP docket report, one unnumbered. I wonder whether this means that if we had reparsed the RSS feed after the edit (e.g. w/i 24 hours), we would have gotten the proper information.

https://www.courtlistener.com/docket/6106337/dalessio-v-university-of-washington/?filed_after=&filed_before=&entry_gte=&entry_lte=&order_by=desc

[Image: Screen Shot 2019-06-11 at 08 03 05]

So the duplication is a bigger problem in the face of edits.
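
A rough sketch of how suspect pairs like these might be flagged for review, assuming the only reliable shared attribute is the filing date and that one entry of each pair is unnumbered. The `Entry` class and `find_suspect_pairs` are hypothetical stand-ins, not the actual CourtListener models or merge logic:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Entry:
    # Hypothetical stand-in for a docket entry; not the real model.
    date_filed: date
    entry_number: Optional[int]  # None for an unnumbered entry
    description: str


def find_suspect_pairs(entries):
    """Pair each unnumbered entry with every numbered entry filed that day."""
    by_date = defaultdict(list)
    for entry in entries:
        by_date[entry.date_filed].append(entry)

    pairs = []
    for same_day in by_date.values():
        numbered = [e for e in same_day if e.entry_number is not None]
        unnumbered = [e for e in same_day if e.entry_number is None]
        for u in unnumbered:
            for n in numbered:
                pairs.append((u, n))
    return pairs
```

This only produces candidates. Because the description text can differ after an edit, as in the consent example above, a date match alone is not enough to merge automatically, and each pair would still need a second check.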

ikeboy commented 4 years ago

Still an issue, e.g. https://www.courtlistener.com/docket/16554647/infinity-global-consulting-group-inc-v-tilray-inc/


johnhawkinson commented 4 years ago

Yes, this is, unfortunately, a regular problem. It's pretty confusing. We should prioritize fixing it, and then somehow manage to repair the months of busted data it has led to :(

mlissner commented 4 years ago

Is there a way to fix it? The only thing these items have in common, I think, is the date?

johnhawkinson commented 4 years ago

Oh, it's certainly fixable. This problem was introduced when changes were made with respect to de_seqno and ordering, or something like that. I thought we had a conversation around the change that introduced this bug, perhaps in the Slack. In any event, it was shortly prior to my opening this Issue, so probably Feb or March 2019.