freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License

`la` scraper skipping some opinions #1195

Closed grossir closed 2 weeks ago

grossir commented 1 month ago

The XPath this scraper uses causes it to skip some opinions:

        path = (
            "//a["
            "contains(., 'v.') or "
            "contains(., 'IN RE') or "
            "contains(., 'IN THE') or "
            "contains(., 'vs.') or "
            "contains(., 'VS.')"
            "]"
        )

Although this happens rarely (I could only find the case below in 2024), we should make the XPath more flexible.

For example, the 2nd opinion is skipped: (image)
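To make the failure mode concrete, here is a minimal standard-library sketch that applies the same case-sensitive substring checks as the XPath predicate to a handful of anchor texts. The HTML fragment and case titles are invented for illustration (the title that was actually skipped is only visible in the screenshot), and `AnchorTextCollector` and `matches_current_filter` are hypothetical helpers, not part of juriscraper.

```python
from html.parser import HTMLParser

# Substrings the current XPath predicate looks for (all case-sensitive).
MARKERS = ("v.", "IN RE", "IN THE", "vs.", "VS.")


class AnchorTextCollector(HTMLParser):
    """Collect the text content of every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._buffer = None  # None while we are outside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.anchors.append("".join(self._buffer).strip())
            self._buffer = None


def matches_current_filter(text):
    # Mirrors: contains(., 'v.') or contains(., 'IN RE') or ...
    return any(marker in text for marker in MARKERS)


# Hypothetical page fragment; these titles are assumptions, not real data.
SAMPLE_HTML = """
<a href="a.pdf">STATE OF LOUISIANA VS. JOHN DOE</a>
<a href="b.pdf">SUCCESSION OF JANE DOE</a>
<a href="c.pdf">Smith v. Jones</a>
<a href="d.pdf">State of Louisiana V. Richard Roe</a>
"""

collector = AnchorTextCollector()
collector.feed(SAMPLE_HTML)

# Anchors the current filter would drop.
skipped = [t for t in collector.anchors if not matches_current_filter(t)]
print(skipped)
```

In this sketch, any title without one of the five markers is dropped, and so is a title whose marker differs only in case (`V.` instead of `v.`). Whether either of these is what happened with the 2024 opinion is not something the sketch can confirm.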

grossir commented 2 weeks ago

We got the opinion from 2024 that was missing.