freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

BVA Scraper Down #873

Status: Open. sentry-io[bot] opened this issue 9 months ago.

sentry-io[bot] commented 9 months ago

I fear this one may be down for the count.

HTTPError: 404 Client Error: Not Found for url: https://www.index.va.gov/search/va/bva_search.jsp?RPP=50&RS=...

Sentry Issue: COURTLISTENER-64N

HTTPError: 404 Client Error: Not Found for url: https://www.index.va.gov/search/va/bva_search.jsp?RPP=50&RS=1&DB=2024&DB=2023&DB=2022&DB=2021&DB=2020&DB=2019&DB=2018&DB=2017&DB=2016&DB=2015&DB=2014&DB=2013&DB=2012&DB=2011&DB=2010&DB=2009&DB=2008&DB=2007&DB=2006&DB=2005&DB=2004&DB=2003&DB=2002&DB=2001&DB=2000&DB=1999&DB=1998&DB=1997
(2 additional frame(s) were not displayed)
...
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 385, in handle
    self.parse_and_scrape_site(mod, options["full_crawl"])
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 348, in parse_and_scrape_site
    site = mod.Site().parse()
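For reference, the failing request is a plain GET whose query string repeats the `DB` key once per year (2024 back to 1997), alongside `RPP=50` (results per page) and `RS=1` (result start). The sketch below rebuilds that exact URL with the standard library. The helper name `build_bva_search_url` is hypothetical and this is not juriscraper's actual code. It only makes the shape of the dead endpoint easy to compare against any replacement.

```python
from urllib.parse import urlencode

# Endpoint taken from the 404 error above; it no longer resolves.
BASE_URL = "https://www.index.va.gov/search/va/bva_search.jsp"


def build_bva_search_url(results_per_page: int, start: int,
                         first_year: int, last_year: int) -> str:
    """Hypothetical helper that reproduces the query string from the traceback.

    Passing a list of (key, value) pairs to urlencode lets the DB key
    appear once per year instead of being collapsed like a dict key would be.
    """
    params = [("RPP", results_per_page), ("RS", start)]
    # Years are emitted newest first, matching the order in the failing URL.
    params += [("DB", year) for year in range(last_year, first_year - 1, -1)]
    return f"{BASE_URL}?{urlencode(params)}"


print(build_bva_search_url(50, 1, 1997, 2024))
```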

grossir commented 9 months ago

The link we used to scrape is dead.

There is a search page, but it requires a text query, which may be unreliable: https://search.usa.gov/search/docs?affiliate=bvadecisions&sort_by=&query=%222024%22
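A minimal sketch of how that search link is put together, using only the three parameters visible in the URL above (`affiliate=bvadecisions`, an empty `sort_by`, and `query`). The function name is hypothetical, and no other parameters of the search.usa.gov service are assumed.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://search.usa.gov/search/docs"


def build_search_url(query: str) -> str:
    """Hypothetical helper: rebuild the BVA search link for a given text query."""
    # affiliate and sort_by are copied from the link above; sort_by is empty there.
    params = {"affiliate": "bvadecisions", "sort_by": "", "query": query}
    # urlencode percent-encodes the double quotes as %22.
    return f"{SEARCH_URL}?{urlencode(params)}"


print(build_search_url('"2024"'))
```

Because this is a full-text search, a query such as `"2024"` presumably matches any document containing that string rather than every decision issued in 2024, which is one reason it may not be a dependable way to enumerate new opinions.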

Also, there is a sitemap whose nested sitemaps link to all opinions from 1992 to 2022. It could be used for a backscrape, in case the data is of interest: https://www.va.gov/sitemap_bva.xml
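If a backscrape from the sitemap is pursued, the parsing side could look roughly like the sketch below. It assumes, without having verified it, that `sitemap_bva.xml` follows the standard sitemaps.org protocol: a `<sitemapindex>` whose `<sitemap><loc>` entries point to nested sitemaps, each a `<urlset>` of `<url><loc>` page links. The function name and the sample URLs are hypothetical, and fetching is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

# Namespace from the sitemaps.org protocol; assumed, not confirmed, for the BVA sitemap.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (nested sitemap URLs, page URLs) found in one sitemap document.

    A <sitemapindex> yields nested sitemaps; a <urlset> yields page links.
    """
    root = ET.fromstring(xml_text)
    nested = [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]
    pages = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]
    return nested, pages


# Hypothetical sample documents standing in for the real sitemap files.
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/bva-1992.xml</loc></sitemap>
</sitemapindex>"""
urlset_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/decision-1.txt</loc></url>
  <url><loc>https://example.com/decision-2.txt</loc></url>
</urlset>"""

print(parse_sitemap(index_xml))   # (['https://example.com/bva-1992.xml'], [])
print(parse_sitemap(urlset_xml))  # ([], ['https://example.com/decision-1.txt', 'https://example.com/decision-2.txt'])
```

A backscraper would fetch the top-level index, then fetch and parse each nested sitemap in turn, collecting the page URLs as candidate opinions.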

We could also try contacting the person in charge of Open Data for BVA. Her contact information is on the dataset's About page: https://www.data.va.gov/dataset/Board-of-Veterans-Appeals-Decisions/rw54-s8nj/about_data

flooie commented 9 months ago

Thanks @grossir - I found the same. I reached out to the stated contact for this dataset and have turned off the scraper by setting has_opinion_scraper to false for now.
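To illustrate what turning off the scraper with a flag means in practice, here is a toy sketch. It is not CourtListener's Court model or the actual selection logic in `cl_scrape_opinions`. The `Court` dataclass, the `courts_to_scrape` helper, and the court ids are all hypothetical; only the `has_opinion_scraper` flag name comes from the comment above.

```python
from dataclasses import dataclass


@dataclass
class Court:
    # Hypothetical stand-in for a court record; only the flag from the comment is modeled.
    court_id: str
    has_opinion_scraper: bool


def courts_to_scrape(courts: list[Court]) -> list[str]:
    """Return ids of courts whose opinion scraper is still enabled."""
    return [c.court_id for c in courts if c.has_opinion_scraper]


# Illustrative ids only: the disabled entry stands in for the BVA scraper.
courts = [Court("bva", False), Court("example_court", True)]
print(courts_to_scrape(courts))  # ['example_court']
```

Gating on a flag rather than deleting the scraper keeps the module around, so it can be re-enabled if the VA restores a usable endpoint.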