freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

Scraper down (coloctapp) #1062

Closed: sentry-io[bot] closed 2 weeks ago

sentry-io[bot] commented 3 weeks ago

IndexError: list index out of range

Sentry Issue: COURTLISTENER-7SD

IndexError: list index out of range
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 390, in handle
    self.parse_and_scrape_site(mod, options)
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 353, in parse_and_scrape_site
    site = mod.Site().parse()

grossir commented 3 weeks ago

It seems the old colo and coloctapp sites are no longer accessible, and the courts have moved their opinions to https://research.coloradojudicial.gov/, for which we already have a PR in progress: #1011

flooie commented 2 weeks ago

So, do we just need to move this to the new site?

grossir commented 2 weeks ago

Yes, it's merged already, but we haven't released juriscraper or updated the CL dependencies yet, so this issue is still open.

grossir commented 2 weeks ago

So, this keeps failing with the UnexpectedContentType error: the site is now returning HTML. I developed this scraper some months ago and assumed the format was still the same; it was not.

Sentry Issue: COURTLISTENER-71F
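
For anyone following along, here is a minimal sketch of the kind of check that produces this sort of failure. It is not juriscraper's actual code: the `UnexpectedContentType` class below is a hypothetical stand-in with the same name, `check_content_type` is an invented helper, and the `application/json` expectation is only an assumption for illustration.

```python
class UnexpectedContentType(Exception):
    """Hypothetical stand-in; not juriscraper's actual exception class."""


def check_content_type(headers: dict, expected: str) -> None:
    """Raise if the response's media type differs from the expected one."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    received = normalized.get("content-type", "")
    # Drop parameters such as "; charset=utf-8" and compare the media type only.
    media_type = received.split(";", 1)[0].strip().lower()
    if media_type != expected.lower():
        label = media_type or "no Content-Type header"
        raise UnexpectedContentType(f"expected {expected}, got {label}")
```

With a check like this, a site that switches its response to `text/html` fails loudly at fetch time instead of producing a confusing parse error later.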

sentry-io[bot] commented 2 weeks ago

Sentry Issue: COURTLISTENER-71F

grossir commented 2 weeks ago

Both colo and coloctapp are working now, and we have fresh data.

Some things that may be problems, which we should track in separate issues and solve, if needed, before running the backscrapers:

@flooie