grossir closed this issue 1 month ago.
The backscraper was run; we now have 1533 opinions for the mentioned time period. Note that some duplicates were found, for example:
INFO Duplicate found on date: 2024-09-27, with lookup value: 7352946c76a21b1ce4f0bd0d215c0da3d9424a34
INFO Duplicate found on date: 2024-10-03, with lookup value: d1676b0110227c73ee3f02f6b693adc01da30493
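A minimal sketch of how duplicates like these can be detected, assuming the lookup value is a SHA-1 hex digest of the downloaded document bytes (it is 40 hex characters, which matches SHA-1). The `find_duplicates` helper and its (date, bytes) input are hypothetical and are not CourtListener's actual duplicate checker:

```python
import hashlib
from typing import Iterable, List, Tuple


def find_duplicates(docs: Iterable[Tuple[str, bytes]]) -> List[Tuple[str, str]]:
    """Return (date, sha1) pairs for documents whose content was already seen.

    Hypothetical helper: `docs` is an iterable of (date, content_bytes) pairs.
    """
    seen = set()
    duplicates = []
    for date, content in docs:
        # Identical bytes always produce the same 40-character hex digest.
        lookup_value = hashlib.sha1(content).hexdigest()
        if lookup_value in seen:
            duplicates.append((date, lookup_value))
        else:
            seen.add(lookup_value)
    return duplicates


docs = [
    ("2024-09-26", b"opinion A"),
    ("2024-09-27", b"opinion A"),  # same bytes as the previous entry
    ("2024-10-03", b"opinion B"),
]
for date, value in find_duplicates(docs):
    print(f"Duplicate on {date}: {value}")
```

Only the second and later copies are reported, so the first ingested copy of each document is kept.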
These 4 documents failed to download:
https://ojd.contentdm.oclc.org/digital/api/collection/p17027coll6/id/3584/download
https://ojd.contentdm.oclc.org/digital/api/collection/p17027coll6/id/7256/download
https://ojd.contentdm.oclc.org/digital/api/collection/p17027coll6/id/7442/download
https://ojd.contentdm.oclc.org/digital/api/collection/p17027coll6/id/4183/download
./manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.ortc --backscrape-start 2012/03/25 --backscrape-end 2012/03/27 --verbosity 3
./manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.ortc --backscrape-start 2012/12/30 --backscrape-end 2013/01/01 --verbosity 3
./manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.ortc --backscrape-start 2016/09/12 --backscrape-end 2016/09/14 --verbosity 3
./manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.ortc --backscrape-start 2018/02/01 --backscrape-end 2018/02/03 --verbosity 3
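Each command above re-scrapes a three-day window. A minimal sketch that builds such commands, assuming each window is a center date plus or minus one day (the center dates below are inferred from the windows, not stated in the thread). `backscrape_command` is a hypothetical helper, and it uses only the flags already shown above:

```python
from datetime import date, timedelta

COURT = "juriscraper.opinions.united_states.state.ortc"


def backscrape_command(center: date, days: int = 1) -> str:
    """Build a cl_back_scrape_opinions command for a window around `center`.

    Hypothetical helper: the window spans `days` days on each side of `center`.
    """
    start = center - timedelta(days=days)
    end = center + timedelta(days=days)
    return (
        f"./manage.py cl_back_scrape_opinions --courts {COURT} "
        f"--backscrape-start {start:%Y/%m/%d} --backscrape-end {end:%Y/%m/%d} "
        f"--verbosity 3"
    )


for center in [date(2012, 3, 26), date(2012, 12, 31), date(2016, 9, 13), date(2018, 2, 2)]:
    print(backscrape_command(center))
```

Using `timedelta` rather than adjusting day numbers by hand means windows that cross a month or year boundary, such as 2012/12/30 to 2013/01/01, come out correctly.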
Sentry Issue: COURTLISTENER-8BP
'https://ojd.contentdm.oclc.org/digital/api/collection/p17027coll6/id/3584/download' 'text/html' not in ['application/pdf']
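The Sentry error means the download URL returned `text/html` where only `application/pdf` was expected. A minimal sketch of that kind of check, assuming the response's Content-Type header is compared against an allowlist. `content_type_allowed` is a hypothetical name, not the scraper's actual function:

```python
from typing import Iterable, Optional


def content_type_allowed(header: Optional[str], allowed: Iterable[str]) -> bool:
    """Return True if the header's media type is in `allowed`.

    Parameters such as "; charset=utf-8" are dropped, and the comparison is
    case-insensitive because media type names are case-insensitive.
    """
    if not header:
        return False
    media_type = header.split(";", 1)[0].strip().lower()
    return media_type in {a.lower() for a in allowed}


print(content_type_allowed("application/pdf", ["application/pdf"]))
print(content_type_allowed("text/html; charset=utf-8", ["application/pdf"]))
```

Stripping parameters before comparing avoids rejecting a valid PDF response just because the server appends something like `; charset=binary`.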
Just ran the commands from the previous comment; no gaps left!
It would use the same scraper as the recently merged
or
scraper.
We have data for ortc up to December 14th, 2011, so we would ingest 1969 opinions from then to today, and we would get the current opinions on a regular basis.
Once the PR is merged, we need to tick the has_opinion_scraper flag: https://www.courtlistener.com/admin/search/court/ortc/change/
Command to backscrape