datamade / court-scrapers

Write ETL process to ingest scraped probate information into the courts database #7

Closed: hancush closed this issue 2 years ago

hancush commented 2 years ago

The scrape will yield nested JSON objects. We will ultimately want to take these objects and break them apart into the various tables in the database. Write a Makefile to perform these transformations.

hancush commented 2 years ago

Let's tackle this in steps.

First, write a Makefile that takes the scraped JSON files and creates three flat CSV files:

  1. One with case information, roughly corresponding to the court_case table

     [Image: Screen Shot 2022-05-10 at 1 41 51 PM]

  2. One with defendant information, roughly corresponding to the defendant table

     [Image: Screen Shot 2022-05-10 at 1 42 07 PM]

  3. One with a record for each action, roughly corresponding to the docket_event table

     [Image: Screen Shot 2022-05-10 at 1 42 20 PM]

Exact files subject to change based on feedback from @fatima3558 and @fgregg. Namely, how does probate court work, and does it map onto the criminal division definitions of cases, defendants, and docket events? Are there charges or sentences? Who is the "participant" listed on case activities? Are there other tables we should populate?
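
For illustration only, here is a rough Python sketch of the kind of split described above: one nested case object becomes one case row plus a list of defendant rows and a list of event rows. The field names (`case_number`, `filing_date`, `defendants`, `events`) and the sample values are assumptions made up for this example; the real scraper output may be shaped differently.

```python
# Hypothetical scraped record; the real scraper's field names may differ.
SAMPLE = {
    "case_number": "2022P000001",
    "filing_date": "2022-01-03",
    "defendants": [{"name": "Example Person"}],
    "events": [
        {"date": "2022-01-03", "description": "Petition filed"},
        {"date": "2022-02-01", "description": "Hearing set"},
    ],
}


def split_record(record):
    """Break one nested case object into flat rows for three tables."""
    case_number = record["case_number"]
    # Scalar fields stay on the case row; nested values are handled separately.
    case_row = {
        key: value
        for key, value in record.items()
        if not isinstance(value, (list, dict))
    }
    # Each nested item becomes its own row, keyed back to the case.
    defendant_rows = [
        {"case_number": case_number, **defendant}
        for defendant in record.get("defendants", [])
    ]
    event_rows = [
        {"case_number": case_number, **event}
        for event in record.get("events", [])
    ]
    return case_row, defendant_rows, event_rows


case_row, defendant_rows, event_rows = split_record(SAMPLE)
```

Carrying `case_number` onto every child row is one simple way to keep the flat files joinable; whether that is the right key for the courts database is an open question here.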

fgregg commented 2 years ago

So, I think a three- or four-table structure to start with

hancush commented 2 years ago

Ok, @fatima3558, so we'll go with three tables:

  1. estate_case with info about estates and cases
  2. estate_docket_event for info about actions for each case
  3. parties, for people (not judges) associated with each case

Can you take the first shot at grouping the scraped information into these files?
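
As a possible starting point, here is a small Python sketch of producing the parties file: it drops judges and serializes the remaining people as CSV. The `role` field, the column names, and the sample people are hypothetical and only stand in for whatever the scrape actually provides.

```python
import csv
import io

# Hypothetical people attached to one case; "role" is an assumed field.
people = [
    {"case_number": "2022P000001", "name": "Example Petitioner", "role": "petitioner"},
    {"case_number": "2022P000001", "name": "Example Judge", "role": "judge"},
]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Judges are kept out of the parties file, per the table description above.
parties = [person for person in people if person["role"] != "judge"]
parties_csv = rows_to_csv(parties, ["case_number", "name", "role"])
```

The same `rows_to_csv` helper could be reused for estate_case and estate_docket_event once their columns are settled.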