freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License
378 stars, 111 forks

Fill `or` 2019 - 2023 gap #1225

Closed: grossir closed 1 month ago

grossir commented 1 month ago

Part of #929

We deleted the clusters in this time range because they were corrupted: none had linked opinions. We now have 0 documents between August 9th, 2019 and October 11th, 2023.

Expecting around 351 opinions

./manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.or --backscrape-start=2019/09/08 --backscrape-end=2023/10/12 --verbosity 3 --backscrape-wait 60

grossir commented 1 month ago

The backscraper was run; we got 211 opinions for the time period.

Note that we skip a bunch of documents that look like:

INFO Now downloading case page at: https://cdm17027.contentdm.oclc.org/digital/api/search/collection/p17027coll3/searchterm/20220823-20220906/field/dated/mode/exact/conn/and/maxRecords/200
INFO Skipping row 'Petitions for review, September 1, 2022'

INFO Backscraping for range 2022-03-26 2022-04-09
INFO Now downloading case page at: https://cdm17027.contentdm.oclc.org/digital/api/search/collection/p17027coll3/searchterm/20220326-20220409/field/dated/mode/exact/conn/and/maxRecords/200
INFO Skipping row 'Miscellaneous Supreme Court dispositions, April 7, 2022'
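
For readers following the log output above, here is a minimal sketch of how the date windows, search URLs, and skipped rows could fit together. It is not the scraper's actual code: the function names `windows`, `search_url`, and `should_skip`, the 14-day window length (inferred from the two ranges in the log), and the prefix-based skip rule are all assumptions made for illustration. Only the URL pattern and the two row titles are taken from the log lines.

```python
from datetime import date, timedelta

# Base URL copied from the log lines; everything else here is hypothetical.
BASE = "https://cdm17027.contentdm.oclc.org/digital/api/search/collection/p17027coll3"

# Assumed skip rule: row titles starting with these strings are not opinions.
SKIP_PREFIXES = (
    "Petitions for review",
    "Miscellaneous Supreme Court dispositions",
)


def windows(start, end, days=14):
    """Yield (window_start, window_end) pairs covering start..end.

    Consecutive windows share a boundary date in this sketch; the real
    backscraper may handle boundaries differently.
    """
    current = start
    while current < end:
        stop = min(current + timedelta(days=days), end)
        yield current, stop
        current = stop


def search_url(start, end):
    """Build a search URL in the shape seen in the log output."""
    term = f"{start:%Y%m%d}-{end:%Y%m%d}"
    return f"{BASE}/searchterm/{term}/field/dated/mode/exact/conn/and/maxRecords/200"


def should_skip(title):
    """Return True for row titles matching the assumed skip prefixes."""
    return title.startswith(SKIP_PREFIXES)


if __name__ == "__main__":
    for win_start, win_end in windows(date(2022, 3, 26), date(2022, 4, 9)):
        print(search_url(win_start, win_end))
    print(should_skip("Miscellaneous Supreme Court dispositions, April 7, 2022"))
```

If the skipped rows account for much of the gap between the expected and scraped counts, a rule like `should_skip` would be the first place to look.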