freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

Vermont judiciary is blocking the "Juriscraper" user agent #1140

Closed: grossir closed this issue 2 weeks ago

grossir commented 2 weeks ago

This affects the vt, vtsuperct_* and vt_criminal scrapers.

In a quick test, changing the user agent resolved the 403 error.
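
The quick test can be reproduced roughly as follows. This is a minimal sketch using only the Python standard library, not Juriscraper's own request code. The `ALTERNATE_UA` string and the helper names `build_request` and `status_for` are assumptions made for illustration; only the "Juriscraper" user agent and the site's host come from this issue.

```python
import urllib.error
import urllib.request

# Listing page on the affected site (query string from the traceback omitted).
URL = "https://www.vermontjudiciary.org/opinions-decisions"

# Hypothetical replacement user agent; any value the site accepts would do.
ALTERNATE_UA = "Mozilla/5.0 (compatible; example-scraper)"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server answers with for this user agent."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urlopen raises on 4xx/5xx; the status code is still the useful part.
        return exc.code
```

Comparing `status_for(URL, "Juriscraper")` with `status_for(URL, ALTERNATE_UA)` shows whether the block depends on the user agent alone. The result depends on the server's current configuration.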

Sentry Issue: COURTLISTENER-82Q

HTTPError: 403 Client Error: Forbidden for url: https://www.vermontjudiciary.org/opinions-decisions?facet_from_date=&facet_to_date=&f%5B0%5D=court_division_opinions_library_%3A6
(1 additional frame(s) were not displayed)
...
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 437, in handle
    self.parse_and_scrape_site(mod, options)
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 400, in parse_and_scrape_site
    site = mod.Site().parse()

sentry-io[bot] commented 2 weeks ago

Sentry Issue: COURTLISTENER-82Q

sentry-io[bot] commented 2 weeks ago

Sentry Issue: COURTLISTENER-82P

sentry-io[bot] commented 2 weeks ago

Sentry Issue: COURTLISTENER-82N

sentry-io[bot] commented 2 weeks ago

Sentry Issue: COURTLISTENER-82M

sentry-io[bot] commented 2 weeks ago

Sentry Issue: COURTLISTENER-82K

sentry-io[bot] commented 2 weeks ago

Sentry Issue: COURTLISTENER-82J

mlissner commented 2 weeks ago

Impressive. The spam that showed up here an hour ago is gone. Go GitHub.

grossir commented 2 weeks ago

This is working again. We have the latest opinions: https://www.courtlistener.com/?q=court_id%3Avt&type=o&order_by=dateFiled%20desc&stat_Published=on