arderyp / scotuswebcites

United States Supreme Court web citation discovery, presentation, and validation
GNU General Public License v3.0

Fix scraper #1

Closed arderyp closed 8 years ago

arderyp commented 8 years ago

The scraper broke due to the site's format change. They've added a new "Revised" column to all of the tables.
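One way to make the parsing less fragile is to key each cell by its column header instead of its position, so an added column cannot shift the other fields. This is only a minimal sketch: `row_to_record` is a hypothetical helper, and every column name other than "Revised" is an assumption for illustration, not the site's actual layout.

```python
def row_to_record(headers, cells):
    """Map table cells to lowercase header names (hypothetical helper)."""
    normalized = [h.strip().lower() for h in headers]
    # Pad short rows so a missing trailing cell becomes None instead of
    # silently dropping the key.
    padded = list(cells) + [None] * (len(normalized) - len(cells))
    return dict(zip(normalized, padded))


# Assumed column names; only "Revised" comes from the issue text.
headers = ["Date", "Docket", "Name", "Revised"]
record = row_to_record(headers, ["6/1/16", "15-1", "A v. B"])
```

With this approach, rows that have no revision simply yield `None` under the `revised` key.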

Beyond fixing the data parsing, will also need to add two new fields to the opinions object: revised (nullable) and revised_pdf_url. Will also need to download and scrape the revised PDFs and cross-compare their citations with those from the original opinions. No need to track duplicate citations. The citations probably won't ever differ. I believe the revised documents mostly include typo cleanup and similar cosmetic changes.
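A rough sketch of the plan above, under stated assumptions: the `Opinion` class, its `name`, `pdf_url`, and `citations` fields, and the `merge_citations` helper are hypothetical stand-ins rather than the project's actual model. Only `revised` and `revised_pdf_url` come from this issue, and storing `revised` as a date string is a guess.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Opinion:
    # name, pdf_url, and citations are assumed fields for illustration.
    name: str
    pdf_url: str
    revised: Optional[str] = None          # nullable; assumed to hold a date string
    revised_pdf_url: Optional[str] = None  # set only when a revised PDF exists
    citations: List[str] = field(default_factory=list)


def merge_citations(original, revised):
    """Combine citations from both PDFs, keeping each one once in first-seen order."""
    seen = set()
    merged = []
    for url in list(original) + list(revised):
        if url not in seen:
            seen.add(url)
            merged.append(url)
    return merged
```

Merging this way means a citation that appears in both the original and the revised PDF is recorded once, while one that appears only in the revision is still picked up.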