freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

`bia` breaks with amended opinions #1023

Open sentry-io[bot] opened 4 months ago

sentry-io[bot] commented 4 months ago

Sentry Issue: COURTLISTENER-74W

ValueError: too many values to unpack (expected 2)
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 387, in handle
    self.parse_and_scrape_site(mod, options)
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 350, in parse_and_scrape_site
    site = mod.Site().parse()
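
The traceback above stops at `parse()`, so the failing line inside the scraper is not shown. As a purely illustrative sketch, this message typically appears when a two-name unpacking receives more than two items. The helper names and the "amended" title string below are hypothetical, not taken from the `bia` scraper.

```python
def naive_split(text):
    """Unpack into exactly two names; fails if the text has extra separators."""
    name, citation = text.split(", ")
    return name, citation


def tolerant_split(text):
    """Split on the first comma only, keeping everything after it together."""
    name, _, rest = text.partition(",")
    return name.strip(), rest.strip()


regular = "GARCIA, 28 I&N Dec. 693 (BIA 2023)"
# Hypothetical shape of an amended title; the real string is not shown in the issue.
amended = "GARCIA, 28 I&N Dec. 693 (BIA 2023), amended"

print(naive_split(regular))      # two parts, unpacks fine
print(tolerant_split(amended))   # extra comma stays inside the second element

try:
    naive_split(amended)
except ValueError as exc:
    print(f"naive_split failed: {exc}")
```

Splitting only once is one possible way to tolerate extra fragments. Whether it suits the real scraper depends on what the amended rows actually contain.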

Filed by @grossir

Also, I noticed that the scraper only ever checks the first row, because of the XPath selectors it uses. That is usually enough, but it leaves gaps when several opinions are published between two scrapes.
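
The first-row limitation can be illustrated with a minimal sketch. The table markup and row order below are hypothetical, since the real page structure is not shown in this issue. It uses the standard library's `xml.etree.ElementTree` rather than the parser the scraper actually uses, only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the volume page.
SAMPLE = """
<table>
  <tr><td>CANCINOS-MANCIO</td><td>28 I&amp;N Dec. 708 (BIA 2023)</td></tr>
  <tr><td>GARCIA</td><td>28 I&amp;N Dec. 693 (BIA 2023)</td></tr>
</table>
""".strip()


def first_row_only(root):
    """Mimic a selector pinned to the first row: at most one case is returned."""
    row = root.find(".//tr[1]")
    if row is None:
        return []
    return [[td.text for td in row.findall("td")]]


def all_rows(root):
    """Iterate over every row so cases published between scrapes are not skipped."""
    return [[td.text for td in row.findall("td")] for row in root.iterfind(".//tr")]


root = ET.fromstring(SAMPLE)
print(len(first_row_only(root)), "case(s) from the first-row selector")
print(len(all_rows(root)), "case(s) when iterating all rows")
```

Iterating every row relies on downstream deduplication to skip cases that were already scraped. That behaviour is assumed here rather than confirmed by the issue.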

Currently missing from https://www.justice.gov/eoir/volume-28:

- CANCINOS-MANCIO, 28 I&N Dec. 708 (BIA 2023)
- GARCIA, 28 I&N Dec. 693 (BIA 2023)

CL: https://www.courtlistener.com/?q=court_id%3Abia&type=o&order_by=dateFiled%20desc&stat_Published=on

There may be others missing as well.