freelawproject / juriscraper

An API to scrape American court websites for metadata.
https://free.law/juriscraper/
BSD 2-Clause "Simplified" License
341 stars, 98 forks

Enhance Wisconsin Supreme Court scraper #815

Open: flooie opened this issue 7 months ago

flooie commented 7 months ago

Wisconsin scraper is not down.

It must have been a one-off event. On a second look, though, we should add PDF parsing to the scraped opinions: citation information, including a neutral citation, is embedded at the top of each PDF.

flooie commented 7 months ago

We should add the following method to extract citations for both the Wisconsin Supreme Court and the Court of Appeals:

def extract_from_text(self, scraped_text: str) -> Dict[str, Any]:
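A rough sketch of what the body of such a method could look like, written here as a standalone function so it runs on its own. It assumes the neutral citation looks like `2023 WI 50` (Supreme Court) or `2023 WI App 12` (Court of Appeals) and sits within the first 1,000 characters of the extracted text. The returned dictionary shape, the `NEUTRAL_CITE` name, and the 1,000-character window are illustrative assumptions, not the project's confirmed schema.

```python
import re
from typing import Any, Dict

# Assumed neutral citation format: "<year> WI <number>" or "<year> WI App <number>".
NEUTRAL_CITE = re.compile(
    r"\b(?P<volume>(?:19|20)\d{2})\s+(?P<reporter>WI(?:\s+App)?)\s+(?P<page>\d+)\b"
)


def extract_from_text(scraped_text: str) -> Dict[str, Any]:
    """Pull a neutral citation from the top of the extracted PDF text.

    Returns an empty dict when no citation is found. The key names in the
    returned dict are hypothetical and would need to match the real schema.
    """
    # Only look near the top of the document, where the citation is expected.
    match = NEUTRAL_CITE.search(scraped_text[:1000])
    if not match:
        return {}

    # Collapse any line breaks or extra spaces inside "WI App".
    reporter = " ".join(match.group("reporter").split())
    return {
        "Citation": {
            "volume": match.group("volume"),
            "reporter": reporter,
            "page": match.group("page"),
        }
    }
```

Restricting the search to the top of the text is meant to avoid picking up citations to other Wisconsin opinions that appear later in the body.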

flooie commented 6 months ago

Wisconsin should be converted from OpinionSite to OpinionSiteLinear. Additionally, all PDFs appear to have a clean neutral citation at the start of the document.


@grossir - this is a good opportunity to update this scraper and add a useful feature. Citations are very important, but not all courts publish a citation with their opinions. Often citations are generated by for-profit companies months or even years later.

Whenever we have the opportunity, we should grab an official citation.