freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Missing District Opinions from RECAP #4095

Open flooie opened 5 months ago

flooie commented 5 months ago

@mlissner

I checked about nine months of reports for the MAD court to see whether the missing opinion was an anomaly.

scrape_pacer_free_opinions Command

| Date Range | Missing / Total | Percentage Missing |
| --- | --- | --- |
| 06/01/23 -> 07/01/23 | 9 / 115 | 7.83% |
| 07/01/23 -> 08/01/23 | 8 / 111 | 7.21% |
| 08/01/23 -> 09/01/23 | 11 / 95 | 11.58% |
| 09/01/23 -> 10/01/23 | 11 / 142 | 7.75% |
| 10/01/23 -> 11/01/23 | 28 / 136 | 20.59% |
| 11/01/23 -> 12/01/23 | 12 / 110 | 10.91% |
| 12/01/23 -> 01/01/24 | 8 / 97 | 8.25% |
| 01/01/24 -> 02/01/24 | 15 / 124 | 12.10% |
| 02/01/24 -> 03/01/24 | 17 / 119 | 14.29% |
| Total | 119 / 1049 | 11.34% |

A significant share of the opinions reported by the scrape_pacer_free_opinions command (119 of 1049, or about 11%, for MAD over this period) do not appear to have been ingested into the RECAP db. So far I have only checked MAD.

A good number of these are civil cases (I would say the majority), and they often, though not exclusively, come from transferred cases. A fuller examination is still needed.
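The monthly percentages and the overall figure in the table above can be re-derived from the raw counts. This is a minimal sketch in Python that only redoes the arithmetic. The counts are copied from the table, and `pct_missing` is a hypothetical helper name, not something from the CourtListener codebase.

```python
# (missing, total) counts for MAD, copied from the table above.
counts = {
    "06/01/23 -> 07/01/23": (9, 115),
    "07/01/23 -> 08/01/23": (8, 111),
    "08/01/23 -> 09/01/23": (11, 95),
    "09/01/23 -> 10/01/23": (11, 142),
    "10/01/23 -> 11/01/23": (28, 136),
    "11/01/23 -> 12/01/23": (12, 110),
    "12/01/23 -> 01/01/24": (8, 97),
    "01/01/24 -> 02/01/24": (15, 124),
    "02/01/24 -> 03/01/24": (17, 119),
}


def pct_missing(missing, total):
    """Hypothetical helper: share of reported opinions not found, in percent."""
    return round(100 * missing / total, 2)


# Sum both columns so the overall rate is weighted by volume,
# rather than being an average of the monthly percentages.
total_missing = sum(m for m, _ in counts.values())
total_reported = sum(t for _, t in counts.values())

for period, (m, t) in counts.items():
    print(f"{period}  {m:>3} / {t:<4}  {pct_missing(m, t):.2f}%")
print(f"Total  {total_missing} / {total_reported}  "
      f"{pct_missing(total_missing, total_reported):.2f}%")
```

The October 2023 window stands out at roughly twice the overall rate, so it may be a useful place to start the fuller examination.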

mlissner commented 5 months ago

As a quick fix, perhaps we should just re-run the downloader starting from scratch, assuming that it skips things we already have in an efficient way. That wouldn't fix the root issue, but it'd plug some gaps.
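To make the "skips things we already have" idea concrete, here is a minimal sketch. It assumes that both the free-opinions report and the RECAP db can be reduced to collections of document identifiers. `find_missing` and the sample ids are hypothetical and do not reflect how the actual downloader is written.

```python
def find_missing(reported_ids, ingested_ids):
    """Return ids that are in the report but not in the db, in report order.

    Hypothetical helper: duplicates in the report are returned only once.
    """
    ingested = set(ingested_ids)  # set gives O(1) membership checks
    seen = set()
    missing = []
    for doc_id in reported_ids:
        if doc_id in ingested or doc_id in seen:
            continue  # already have it, or already queued it
        seen.add(doc_id)
        missing.append(doc_id)
    return missing


# Illustrative ids only; real identifiers would come from the report and the db.
reported = ["doc-1", "doc-2", "doc-3", "doc-2"]
ingested = ["doc-2"]
to_download = find_missing(reported, ingested)
print(to_download)
```

Only the ids in `to_download` would need to be fetched again, so a re-run from scratch costs little beyond the membership check for documents that are already present.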

flooie commented 5 months ago

We can certainly test it out for the last year in MAD and see if it catches everything.

mlissner commented 5 months ago

That sounds great.