freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

Fill `ny` New York Court of Appeals gaps #943

Open grossir opened 5 months ago

grossir commented 5 months ago

This is part of #929

Missing around 100 documents (99 across the three gaps below):

Between April 27, 2018 and February 13, 2019 we have 1 document. We are missing 92 documents.

Between June 16, 2023 and October 18, 2023 we have 0 documents. We are missing 2 documents.

Between June 04, 2021 and October 06, 2021 we have 0 documents. We are missing 5 documents.

However, the current ny backscraper at juriscraper/opinions/united_states_backscrapers/state/ny.py points to https://iapps.courts.state.ny.us/lawReporting/Search?searchType=opinion, which does not seem to hold the same data as the scraper source, https://www.nycourts.gov/ctapps/Decisions/2024/Feb24/February24.html. The scraper source has a search interface for past decisions at https://iapps.courts.state.ny.us/lawReporting/CourtOfAppealsSearch

grossir commented 3 months ago

The backscraper PR also allows filling gaps for nyappterm, since the same class is used.

Between June 15th, 2020 and February 2nd, 2023 we have 5 documents in CourtListener (CL). The source has more than 900 documents for that period: 399 from Appellate Term, 1st Dept, and more than 500 from 2nd Dept.

grossir commented 2 months ago

Commands to fill the gaps

docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states_backscrapers.state.ny --backscrape --backscrape-start=04/26/2018 --backscrape-end=02/12/2019
docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states_backscrapers.state.ny --backscrape --backscrape-start=06/15/2023 --backscrape-end=10/19/2023
docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states_backscrapers.state.ny --backscrape --backscrape-start=06/04/2021 --backscrape-end=10/05/2021
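Since the three commands differ only in their date range, a small helper can generate them and catch a range whose end date falls before its start date. This is only a sketch: `build_backscrape_command` and `GAPS` are hypothetical names, not part of juriscraper or CourtListener. The container name, `manage.py` path, and flags are copied from the commands in this thread, and the `MM/DD/YYYY` date format is assumed from those same commands. The second range's end year is taken as 2023 to match the June to October 2023 gap reported in the issue.

```python
from datetime import datetime

# Copied from the commands in this thread.
MODULE = "juriscraper.opinions.united_states_backscrapers.state.ny"
# Assumption: dates are MM/DD/YYYY, as they appear in the commands above.
DATE_FORMAT = "%m/%d/%Y"

# Hypothetical list of (start, end) ranges, one per gap.
GAPS = [
    ("04/26/2018", "02/12/2019"),
    ("06/15/2023", "10/19/2023"),
    ("06/04/2021", "10/05/2021"),
]


def build_backscrape_command(start: str, end: str, module: str = MODULE) -> str:
    """Return the docker command string for one backscrape date range.

    Raises ValueError if a date does not parse or if `end` is before `start`.
    """
    start_date = datetime.strptime(start, DATE_FORMAT)
    end_date = datetime.strptime(end, DATE_FORMAT)
    if end_date < start_date:
        raise ValueError(f"end date {end} is before start date {start}")
    return (
        "docker exec -it cl-django python /opt/courtlistener/manage.py "
        f"cl_back_scrape_opinions --courts {module} --backscrape "
        f"--backscrape-start={start} --backscrape-end={end}"
    )


if __name__ == "__main__":
    # Print one command per gap; nothing is executed.
    for gap_start, gap_end in GAPS:
        print(build_backscrape_command(gap_start, gap_end))
```

The helper only builds and prints strings, so the output can be reviewed before anything is pasted into a shell. Passing `module` explicitly would let the same helper cover another backscraper module.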