freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

Fix `extract_from_text` for `mass` backscraper, and re-run it #1234

Open grossir opened 2 weeks ago

grossir commented 2 weeks ago

Related to https://github.com/freelawproject/juriscraper/issues/984. The backscraper is to be re-run after the bug is fixed and https://github.com/freelawproject/courtlistener/pull/4520 is merged.

Sentry Issue: COURTLISTENER-8KH

AttributeError: 'list' object has no attribute 'xpath'
  File "cl/scrapers/tasks.py", line 183, in extract_doc_content
    update_document_from_text(opinion, juriscraper_module)
  File "cl/scrapers/tasks.py", line 64, in update_document_from_text
    metadata_dict = site.extract_from_text(opinion.plain_text or opinion.html)
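
For reference, a minimal sketch of this class of failure and one way to avoid it. It is not the actual `mass` scraper code: the failing frame inside juriscraper is not shown in the traceback above, so `reproduce_error`, `DOCKET_RE`, the `SJC-` pattern, and the `{"Docket": {...}}` return shape are all assumptions made for illustration. The idea is that `extract_from_text` receives only a string, so it should not depend on parsed HTML state that may be a list rather than a single element.

```python
import re

# Hypothetical pattern, for illustration only; the real scraper may match differently.
DOCKET_RE = re.compile(r"\bSJC-\d+\b")


def reproduce_error() -> str:
    """Call .xpath on a list, which raises the AttributeError seen in Sentry."""
    # Assumption: some attribute holds a list of pages instead of one parsed element.
    parsed_pages = ["<html>page 1</html>", "<html>page 2</html>"]
    try:
        parsed_pages.xpath("//p")
    except AttributeError as exc:
        return str(exc)
    return ""


def extract_from_text(scraped_text: str) -> dict:
    """Text-only extraction: reads the string argument and nothing else."""
    match = DOCKET_RE.search(scraped_text or "")
    if match is None:
        # Nothing found: return an empty dict so the caller has nothing to update.
        return {}
    # Assumed return shape: metadata nested under a model name.
    return {"Docket": {"docket_number": match.group(0)}}
```

Called with `opinion.plain_text or opinion.html`, as in the traceback, a function of this shape never touches `.xpath`, so it behaves the same whether or not the scraper was instantiated for a backscrape.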