freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

Fill `vtsuperct` gaps #996

Closed grossir closed 3 months ago

grossir commented 7 months ago

Part of #929

Between June 5, 2020 and March 31, 2022, we have 4 documents. We are missing about 175 documents from the civil, criminal, family and environmental courts.

Between November 16, 2017 and October 25, 2019, we have 3 documents. We are missing around 375 documents from the civil, criminal, family and environmental courts. Note that these are not only opinions but also orders and decisions; the scraper does not filter them.

Between March 23, 2017 and August 18, 2017 we have 0 documents.

Between January 17, 2020 and May 15, 2020 we have 0 documents.

grossir commented 7 months ago

The Environmental and Civil sub-courts have the most data. I have condensed the date ranges to make the commands easier for Ramiro to run.

The commands to fill these gaps:

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_environmental.py --backscrape-start=01/16/2020 --backscrape-end=04/01/2022 --verbosity 3

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_environmental.py --backscrape-start=03/23/2017 --backscrape-end=10/26/2019 --verbosity 3

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_civil.py --backscrape-start=01/16/2020 --backscrape-end=04/01/2022 --verbosity 3

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_civil.py --backscrape-start=03/23/2017 --backscrape-end=10/26/2019 --verbosity 3

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_family.py --backscrape-start=01/16/2020 --backscrape-end=04/01/2022 --verbosity 3

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_family.py --backscrape-start=03/23/2017 --backscrape-end=10/26/2019 --verbosity 3

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_probate.py --backscrape-start=01/16/2020 --backscrape-end=04/01/2022 --verbosity 3

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_probate.py --backscrape-start=03/23/2017 --backscrape-end=10/26/2019 --verbosity 3

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vt_criminal.py --backscrape-start=01/16/2020 --backscrape-end=04/01/2022 --verbosity 3

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vt_criminal.py --backscrape-start=03/23/2017 --backscrape-end=10/26/2019 --verbosity 3
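
The ten commands above are the cross product of five scraper modules and two date ranges. As a minimal sketch, assuming only the module names and dates listed in this comment, they could be generated rather than typed by hand. `build_commands` is a hypothetical helper, not part of juriscraper or CourtListener, and the output matches the commands above apart from whitespace.

```python
from itertools import product

# Module names and date ranges copied from the commands in this comment.
PACKAGE = "juriscraper.opinions.united_states.state"
MODULES = [
    "vtsuperct_environmental",
    "vtsuperct_civil",
    "vtsuperct_family",
    "vtsuperct_probate",
    "vt_criminal",
]
DATE_RANGES = [
    ("01/16/2020", "04/01/2022"),
    ("03/23/2017", "10/26/2019"),
]


def build_commands(modules=MODULES, date_ranges=DATE_RANGES):
    """Return one backscrape invocation per (module, date range) pair.

    Hypothetical helper for illustration only.
    """
    commands = []
    for module, (start, end) in product(modules, date_ranges):
        commands.append(
            "./manage.py cl_back_scrape_opinions "
            f"--courts {PACKAGE}.{module}.py "
            f"--backscrape-start={start} --backscrape-end={end} "
            "--verbosity 3"
        )
    return commands


if __name__ == "__main__":
    print("\n".join(build_commands()))
```

Keeping the modules and ranges in two short lists makes it easier to check that no sub-court or date range was skipped.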

grossir commented 3 months ago

I had to add more runs for civil because some calls exceeded the page size of 25. Everything is now complete, and the counts are close to the estimates, apart from duplicates.

./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_civil --backscrape-start=07/02/2017 --backscrape-end=08/08/2017 --verbosity 3
./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_civil --backscrape-start=10/10/2017 --backscrape-end=11/01/2017 --verbosity 3
./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_civil --backscrape-start=01/16/2018 --backscrape-end=01/23/2018 --verbosity 3
./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_civil --backscrape-start=04/29/2018 --backscrape-end=05/25/2018 --verbosity 3
./manage.py cl_back_scrape_opinions  --courts juriscraper.opinions.united_states.state.vtsuperct_civil --backscrape-start=08/05/2018 --backscrape-end=08/10/2018 --verbosity 3
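
The narrower civil ranges above come from the page-size limit: when a date window matches 25 or more documents, presumably the results beyond the first page are not picked up, so the window has to be shrunk. A minimal sketch of one way to find such windows is to bisect a range until every piece fits on one page. `count_results` is a hypothetical callable standing in for whatever reports how many documents the court site returns for a window; it is not a juriscraper or CourtListener function, and this is not necessarily how the ranges above were chosen.

```python
from datetime import date, timedelta

PAGE_SIZE = 25  # page size mentioned in the comment above


def split_until_fits(start, end, count_results, page_size=PAGE_SIZE):
    """Split [start, end] into windows that each return fewer than page_size results.

    count_results(start, end) is a hypothetical callable (an assumption for
    this sketch). A count equal to page_size is treated as "possibly
    truncated", so such a window is split as well.
    """
    # A single-day window cannot be split further, so return it as-is.
    if start >= end or count_results(start, end) < page_size:
        return [(start, end)]
    middle = start + (end - start) // 2
    left = split_until_fits(start, middle, count_results, page_size)
    right = split_until_fits(
        middle + timedelta(days=1), end, count_results, page_size
    )
    return left + right


if __name__ == "__main__":
    # Fake data source for demonstration: one document per day, capped at a page.
    def fake_count(start, end):
        return min((end - start).days + 1, PAGE_SIZE)

    for window_start, window_end in split_until_fits(
        date(2018, 1, 1), date(2018, 3, 31), fake_count
    ):
        print(f"--backscrape-start={window_start:%m/%d/%Y} "
              f"--backscrape-end={window_end:%m/%d/%Y}")
```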