Write ETL process to ingest scraped probate information into the courts database

datamade / court-scrapers

MIT License


Write ETL process to ingest scraped probate information into the courts database #7

Closed hancush closed 2 years ago

hancush commented 2 years ago

The scrape will yield nested JSON objects. We will ultimately want to take these objects and break them apart into the various tables in the database. Write a Makefile to perform these transformations.

hancush commented 2 years ago

Let's tackle this in steps.

First, write a Makefile that takes the scraped JSON files and creates three flat CSV files:

One with case information, roughly corresponding to the court_case table


One with defendant information, roughly corresponding to the defendant table


One with a record for each action, roughly corresponding to the docket_event table


Exact files subject to change based on feedback from @fatima3558 and @fgregg. Namely, how does probate court work, and does it map onto the criminal division definitions of cases, defendants, and docket events? Are there charges or sentences? Who is the "participant" listed on case activities? Are there other tables we should populate?

fgregg commented 2 years ago

Dockets are very similar.
The parties are a little different. You will always have the estate, which is probably a single table. Then you can have multiple claimants on that estate. I'd start by making an estate table and a "parties" table, and then we can sort out what kinds of parties there are. (The estate info might also be able to just live in an overall case table.)

So, I think a three- or four-table structure to start with:

dockets
case
estate (which might be able to be combined with case)
parties

hancush commented 2 years ago

Ok, @fatima3558, so we'll go with three tables:

estate_case with info about estates and cases
estate_docket_event for info about actions for each case
parties, for people (not judges) associated with each case

Can you take the first shot at grouping the scraped information into these files?
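
As a starting point for that grouping, here is a minimal Python sketch of splitting nested case objects into the three flat files. The thread does not show the scraper's output, so every field name below (`case_number`, `estate_of`, `filing_date`, `parties`, `events`) and the sample record are assumptions for illustration only. The sketch treats every non-list field as case-level information and copies the case number onto each child row so the files can be joined later.

```python
import csv
import io

# Hypothetical scraped record: the real scraper's field names may differ.
SAMPLE_CASES = [
    {
        "case_number": "2022P000001",
        "estate_of": "DOE, JANE",
        "filing_date": "2022-01-03",
        "parties": [
            {"name": "DOE, JOHN", "role": "claimant"},
        ],
        "events": [
            {"date": "2022-01-03", "description": "PETITION FILED"},
            {"date": "2022-02-01", "description": "ORDER ENTERED"},
        ],
    }
]


def flatten(cases):
    """Split nested case objects into three lists of flat rows.

    Returns (estate_cases, docket_events, parties). Child rows carry the
    parent's case_number so the files can be joined after loading.
    """
    estate_cases, docket_events, parties = [], [], []
    for case in cases:
        case_number = case["case_number"]
        # Keep only scalar fields on the case row; nested lists go elsewhere.
        estate_cases.append(
            {key: value for key, value in case.items() if not isinstance(value, list)}
        )
        for event in case.get("events", []):
            docket_events.append({"case_number": case_number, **event})
        for party in case.get("parties", []):
            parties.append({"case_number": case_number, **party})
    return estate_cases, docket_events, parties


def to_csv(rows):
    """Serialize a list of dicts to CSV text, using the union of all keys as the header."""
    if not rows:
        return ""
    # dict.fromkeys preserves first-seen order while removing duplicates.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    estate_cases, docket_events, parties = flatten(SAMPLE_CASES)
    print(to_csv(estate_cases))
    print(to_csv(docket_events))
    print(to_csv(parties))
```

A Makefile target for each output file could call a script along these lines, with the scraped JSON as its prerequisite. The column lists would still need to be checked against the actual table definitions.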