freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

Fill `colo` gaps #974

Open grossir opened 6 months ago

grossir commented 6 months ago

Part of #929

We will implement a dynamic backscraper for `colo` to help solve this.

colo

We have 0 documents between June 5, 2021 and January 30, 2022. I have identified some missing opinions for this period, but not many.

We also have 0 documents between June 21, 2022 and January 9, 2023, even though the source has documents for this period.

Note that `coloctapp` also has gaps, but it will need a different backscraper, most likely one that parses PDFs.
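For reference, date gaps like the ones above can be spotted with a small script. This is a minimal sketch, not code from juriscraper or CourtListener: `find_date_gaps`, the `min_gap_days` threshold, and the sample dates are all made up for illustration, and the real filing dates would have to be pulled from the database first.

```python
from datetime import date


def find_date_gaps(dates, min_gap_days=90):
    """Return (earlier, later, days) for each pair of consecutive
    dates that are more than `min_gap_days` apart.

    `min_gap_days` is an arbitrary threshold chosen for this sketch.
    """
    ordered = sorted(set(dates))
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        days = (later - earlier).days
        if days > min_gap_days:
            gaps.append((earlier, later, days))
    return gaps


# Hypothetical filing dates, only to show the shape of the output
sample_dates = [
    date(2021, 5, 24),
    date(2021, 6, 5),
    date(2022, 1, 30),
    date(2022, 2, 7),
]
gaps = find_date_gaps(sample_dates)
for earlier, later, days in gaps:
    print(f"No documents between {earlier} and {later} ({days} days)")
```

A gap reported this way only says that we hold nothing for the period; it still has to be checked against the source to confirm that opinions were actually published then.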

grossir commented 5 months ago

After some review, I think the source we are scraping may not be complete.

Taking 2023 as an example, I counted all documents from the "Dispositions" tab published in 2023. There are 33 rows, some of which are Orders (which we skip).

We have 28 opinions on CourtListener for that period. Comparing CL with the source, I have not identified a gap for 2023.

However, a complementary source from the Colorado Bar (listed on the court's page) has 58 results for 2023. I noticed this "gap" while checking the neutral citation serial numbers for missing values.
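The serial number check can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes neutral citations shaped like `2023 CO 12` (year, court code, serial), and `missing_serials`, `CITE_RE`, and the sample citations are hypothetical rather than part of juriscraper.

```python
import re
from collections import defaultdict

# Assumed citation shape: "<year> CO <serial>", e.g. "2023 CO 12"
CITE_RE = re.compile(r"\b(\d{4})\s+CO\s+(\d+)\b")


def missing_serials(citations):
    """Map each year to the serial numbers not seen between 1 and the
    highest serial observed for that year."""
    seen = defaultdict(set)
    for cite in citations:
        match = CITE_RE.search(cite)
        if match:
            seen[int(match.group(1))].add(int(match.group(2)))
    return {
        year: sorted(set(range(1, max(serials) + 1)) - serials)
        for year, serials in seen.items()
    }


# Hypothetical citations, only to show the shape of the output
sample = ["2023 CO 1", "2023 CO 2", "2023 CO 5", "2022 CO 3"]
missing = missing_serials(sample)
print(missing)
```

This can only flag serials below the highest one we already hold for a year, so opinions missing from the end of a year would go unnoticed without a second source to compare against.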

Indeed, the "Case announcements" page and PDFs contain opinions that do not appear on the page we scrape, which leads me to think the source we are scraping is a limited one. For example, the most recent opinion in our source is currently dated March 25, 2024, but the April 22, 2024 case announcements include 2 published opinions.

Perhaps our source is limited to what its title says, "Original Proceedings Pursuant to C.A.R. 21 in the Colorado Supreme Court", and so only lists a subset of published opinions.

@flooie

For 2022, the Colorado Bar lists 50 opinions, while we only have 18. This is as far back as we can compare, since the Colorado Bar source only has opinions from 2022 to the present.