freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

`mass` and `massappct` are blocking our scraping server #1048

Open grossir opened 1 week ago

grossir commented 1 week ago

For mass, our most recent opinion is from May 29th, 2024, and there is more recent data in the source.

For massappct, we are missing the Precedential opinions published on the mass.gov site; we have the Non-Precedential ones that we scrape from 128 archive.

The dates of our most recent opinions match the dates Sentry started registering these events

Sentry Issue: COURTLISTENER-7EP

HTTPError: 403 Client Error: Forbidden for url: https://www.mass.gov/service-details/new-opinions
(1 additional frame(s) were not displayed)
...
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 387, in handle
    self.parse_and_scrape_site(mod, options)
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 350, in parse_and_scrape_site
    site = mod.Site().parse()

Sentry Issue: COURTLISTENER-7EN
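
To check whether a given machine is still being refused, the 403 from the event above can be probed outside the scraper. This is only a minimal sketch, not how juriscraper or CourtListener fetch pages: it swaps in the standard-library `urllib` in place of the HTTP client used in production, and `is_forbidden` and its `opener` parameter are hypothetical names introduced here for illustration.

```python
import urllib.error
import urllib.request


def is_forbidden(url, opener=urllib.request.urlopen, timeout=30):
    """Return True if fetching `url` fails with HTTP 403, False if it succeeds.

    `opener` is a hypothetical injection point so the check can be exercised
    without network access. Any other HTTP error is re-raised unchanged.
    """
    try:
        # Only the status matters here, so the body is never read.
        with opener(url, timeout=timeout):
            return False
    except urllib.error.HTTPError as exc:
        if exc.code == 403:
            return True
        raise
```

Calling `is_forbidden("https://www.mass.gov/service-details/new-opinions")` from the scraping server and from another network would show whether the refusal is specific to one address. The result may also depend on request headers, so a `False` from a different client does not by itself rule out a block on the scraper.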

flooie commented 4 days ago

I contacted the active clerk of the SJC this morning by phone and email. I will report back once there is progress on this front.