CivicTechAtlanta / georgia-courtbot

Helping people remember to attend court to help break the cycle of fines and jail time

Add CSV as an output format. #10

Closed abrie closed 2 years ago

abrie commented 2 years ago

This PR adds CSV output as an option to the scraper. CSV is well suited for importing into a database.

When running the scraper, specify the output format using the '--output' argument:

python dekalb_scraper.py --output {json,csv}
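For readers who have not opened the diff, here is a minimal sketch of how an '--output' switch like this can be wired up with argparse and the csv module. It is not the code from this PR: the function names `parse_args` and `render`, the `json` default, and the sample row values are all assumptions made for illustration.

```python
import argparse
import csv
import io
import json


def parse_args(argv=None):
    # Hypothetical CLI definition; the json default is an assumption.
    parser = argparse.ArgumentParser(description="Illustrative scraper CLI")
    parser.add_argument("--output", choices=["json", "csv"], default="json")
    return parser.parse_args(argv)


def render(rows, fmt):
    """Serialize a list of dicts as JSON or CSV text."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if not rows:
        return ""
    buf = io.StringIO()
    # Use the first row's keys as the CSV header.
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    args = parse_args()
    # Placeholder data, not real scraper output.
    sample = [{"CASE_ID": "1", "JUDICIAL_OFFICER": "Placeholder"}]
    print(render(sample, args.output), end="")
```

Restricting the argument with `choices` means argparse rejects an unsupported format before any scraping starts.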

Here is an example that populates the 'cases' table in an sqlite3 database:

python dekalb_scraper.py --output csv | sqlite3 database.sqlite3 ".import --csv /dev/stdin cases"
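If the sqlite3 shell is not available, roughly the same import can be done from Python's standard library. This is a sketch under assumptions, not part of the PR: the helper name `import_csv` is hypothetical, every column is declared as TEXT, and the first CSV row is treated as the header.

```python
import csv
import io
import sqlite3


def import_csv(conn, csv_text, table="cases"):
    """Load CSV text into a table, creating it from the header row if needed."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return 0  # empty input, nothing to import

    def quote(name):
        # Quote identifiers so unusual column names stay valid SQL.
        return '"' + name.replace('"', '""') + '"'

    columns = ", ".join(quote(name) + " TEXT" for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote(table)} ({columns})")
    rows = list(reader)
    conn.executemany(f"INSERT INTO {quote(table)} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)
```

For example, `import_csv(sqlite3.connect("database.sqlite3"), sys.stdin.read())` (with `import sys`) would read the scraper's CSV from a pipe.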

bbrewington commented 2 years ago

Could you check to see if the fields shown in issue https://github.com/codeforatlanta/georgia-courtbot/issues/5 are captured?

I noticed we're grabbing CASE_ID but not CASE_NUMBER. The latter is important because that's what someone is going to enter in the SMS / web form sign-up.

I thought JUDICIAL_OFFICER would also be useful, since we're using it for (pagination?), in case we need to re-run a chunk of requests at some point.

bbrewington commented 2 years ago

Ran a test locally outputting to CSV, and it worked great:

python3 dekalb_scraper.py --output csv > dekalb_scrape_202201170827.csv

^ uploaded that file to Google Sheets...will share link in Slack

bbrewington commented 2 years ago

Going to handle the extra-fields request above in a separate PR.