MattTriano / analytics_data_where_house

An analytics engineering sandbox focusing on real estate prices in Cook County, IL
https://docs.analytics-data-where-house.dev/
GNU Affero General Public License v3.0

Prototype tiger pipeline #161

Closed MattTriano closed 1 year ago

MattTriano commented 1 year ago

Closes #160

While working through the implementation, I considered funneling all TIGER dataset vintages of a given entity class (e.g. tracts, block_groups, counties, roads, rails, landmarks, waterways, etc.) into a single entity-class-specific data_raw table, but ultimately opted to create one table per entity-class-vintage (at least in the data_raw schema). I know the columns in the tables for at least the main geographic units changed between the 2010 census and the ACS deliveries from 2011-2019 (maybe the change started in 2014? I don't recall off the top of my head), and since I'm using an ELT workflow, I'd rather not start Transforming data before it's initially Loaded. Maybe down the line I'll funnel all entity-vintages into an entity-specific table, but I'm not sure yet.
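
To make the one-table-per-entity-class-vintage layout concrete, here is a minimal sketch of how such table names could be generated. The `tiger_` prefix, the `<prefix>_<entity>_<vintage>` naming pattern, and both helper functions are assumptions for illustration only; they are not taken from this PR's code.

```python
import re


def raw_table_name(entity_class: str, vintage: int, prefix: str = "tiger") -> str:
    """Build a hypothetical table name for one entity-class-vintage.

    The naming pattern <prefix>_<entity>_<vintage> is an assumption,
    not the convention actually used in the repo.
    """
    # Normalize the entity class to a lowercase, underscore-separated slug.
    slug = re.sub(r"[^a-z0-9]+", "_", entity_class.strip().lower()).strip("_")
    if not slug:
        raise ValueError("entity_class must contain at least one alphanumeric character")
    # Loose sanity bound on the vintage year (assumed range, adjust as needed).
    if not 1990 <= vintage <= 2100:
        raise ValueError(f"unexpected vintage: {vintage}")
    return f"{prefix}_{slug}_{vintage}"


def raw_table_names(entity_classes, vintages, schema: str = "data_raw"):
    """List schema-qualified names, one per (entity class, vintage) pair."""
    return [
        f"{schema}.{raw_table_name(entity, vintage)}"
        for entity in entity_classes
        for vintage in vintages
    ]


if __name__ == "__main__":
    for name in raw_table_names(["tracts", "block_groups"], [2010, 2019]):
        print(name)
```

Keeping each vintage in its own table means a column change between vintages never forces a transformation at load time; a combined entity-level table could later be built downstream from these per-vintage tables.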

In general, I'm not sure how I want to handle the Census's vintaging scheme (with regard to how data is organized in tables in the warehouse). I think I'll just have to try out an implementation, discover the pain points, and re-implement if those points are sufficiently painful.