datamade / court-scrapers

MIT License
Create nightly scraping Makefile #21

antidipyramid closed 9 months ago

antidipyramid commented 9 months ago

Overview

See title.

This PR adds two scripts: scripts/nightly_civil_start.sql and scripts/nightly_chancery_start.sql. Each script outputs the serial number (e.g. 00001) of the last sequential case number in the database for a particular court and subdivision. The nightly scrape starts from this number to pick up any new cases that have been uploaded to the site since the previous run.
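To illustrate the idea behind the start scripts, here is a minimal Python sketch using an in-memory SQLite database. It is not the SQL from this PR: the table name `court_case`, the column `case_number`, the `2023CH00012`-style number format, and the helper `last_serial` are all hypothetical, and treating "last sequential" as the maximum serial is an assumption.

```python
import sqlite3


def last_serial(conn, prefix, width=5):
    """Return the highest serial among case numbers starting with `prefix`.

    Assumes (hypothetically) that a case number is `prefix` followed by a
    zero-padded serial, e.g. "2023CH00012". Returns "00000" when the
    database holds no matching cases yet.
    """
    row = conn.execute(
        # SUBSTR is 1-indexed, so the serial starts at len(prefix) + 1.
        "SELECT MAX(CAST(SUBSTR(case_number, ?) AS INTEGER)) "
        "FROM court_case WHERE case_number LIKE ?",
        (len(prefix) + 1, prefix + "%"),
    ).fetchone()
    serial = row[0] or 0  # MAX over zero rows yields NULL (None)
    return str(serial).zfill(width)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE court_case (case_number TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO court_case VALUES (?)",
        [("2023CH00001",), ("2023CH00012",), ("2023M100007",)],
    )
    print(last_serial(conn, "2023CH"))
```

A nightly run could then pass the returned serial to the spider as its starting point, so only cases filed after the last stored one are requested.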

Closes #15

Notes

The yearly scrape

Previously, the Makefile tried to scrape every possible record from the chancery and civil courts in a single run, working through the case numbers sequentially, year by year. I removed this approach in this PR, but we may want to bring it back to build the initial database.

Testing Instructions

For ease of local testing, you can reduce the number of .jl case files required in the Makefile and lower the end parameters for the subdivisions stored in courtscraper/spiders/civil.py and courtscraper/spiders/chancery.py.