freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

Separate `kan` and `kanctapp`, and correct `precedential_status` #1232

Open grossir opened 2 weeks ago

grossir commented 2 weeks ago

As noted in the parent issue #1222, the `kan` and `kanctapp` scrapers were making the same request to an unfiltered page that mixed both courts and both published and unpublished opinions.

The kan scraper ingested "Court of Appeals" opinions, which should be re-assigned to kanctapp.

I think the easiest fix is to delete everything ingested since the problem started and then backscrape that time period. There are up to 12 different types of mixups (for each of the 4 correct court-status combinations, there are 3 possible incorrect combinations), which makes manual correction harder.
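To make the count concrete, here is a minimal sketch that enumerates the mixups. The status labels `Published` and `Unpublished` and the variable names are assumptions for illustration; they are not taken from the scrapers themselves.

```python
from itertools import product

# Assumed labels for illustration only; the real scrapers may use other values.
COURTS = ("kan", "kanctapp")
STATUSES = ("Published", "Unpublished")

# The 4 correct (court, status) combinations.
combinations = list(product(COURTS, STATUSES))

# A mixup is any record whose stored combination differs from its correct one.
mixups = [
    (correct, stored)
    for correct in combinations
    for stored in combinations
    if stored != correct
]

print(len(combinations))  # 4 correct combinations
print(len(mixups))        # 4 * 3 = 12 possible mixups
```

Each of those 12 pairs would need its own correction rule, which is why deleting and backscraping looks simpler.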

What are the IDs of the incorrect dockets, clusters, and opinions? Alternatively, since when has this bug been occurring?