NYCPlanning / ceqr-app-data-archive

(DEPRECATED) data pipelines for CEQR app, managed by data engineering
https://github.com/NYCPlanning/ceqr-app-data

Create sca_e_projections by boro #56

Closed AmandaDoyle closed 4 years ago

AmandaDoyle commented 4 years ago

Aggregate HS data to boro level

  1. Create a new folder named sca_e_projections_by_boro under the ceqr_app_data/ceqr/recipes directory
  2. Create config.json, README.md, build.py, and requirements.txt
  3. Use sca_e_projections.2019 as the input data
  4. Target schema (@bfreeds, please confirm the schema for this CEQR table):
    CREATE TABLE sca_e_projections_by_boro."2019" (
    school_year integer,
    borocode integer,
    hs integer
    );
  5. Build its ETL pipeline using sca_e_projections as an example
    • Convert the input table from a wide table to a long table using the pandas melt function
    • Calculate the hs projections by summing projected = [9, 10, 11, 12]; don't use projected = '9-12 Total' from sca_e_projections.2019, which has incorrect values
    • Write a Python function or dictionary to map each district to its matching borocode
    • Aggregate the data to boro level
  6. Export the output to EDM_DATA
  7. Push your code to another branch
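
For reference, here is a minimal sketch of the transformation in step 5, not the actual build.py. It assumes the input has a numeric `district` column, a `projected` column holding grade labels as strings, and one column per school year; those column names, the year labels, and the sample numbers are all made up for illustration. It also assumes the districts are NYC community school districts, with the usual borough split (1-6 Manhattan, 7-12 Bronx, 13-23 and 32 Brooklyn, 24-30 Queens, 31 Staten Island). Check the real sca_e_projections.2019 columns before reusing any of this.

```python
import pandas as pd

# Hypothetical wide input: one row per (district, projected grade),
# one column per school year. All values here are invented.
wide = pd.DataFrame({
    "district": [1] * 5 + [2] * 5 + [7] * 5,
    "projected": ["9", "10", "11", "12", "9-12 Total"] * 3,
    "2019": [100, 90, 80, 70, 999, 10, 10, 10, 10, 999, 50, 40, 30, 20, 999],
    "2020": [110, 95, 85, 75, 999, 20, 20, 20, 20, 999, 55, 45, 35, 25, 999],
})

# Grades summed for the hs total; '9-12 Total' is deliberately excluded.
HS_GRADES = ["9", "10", "11", "12"]


def district_to_borocode(district: int) -> int:
    """Map a community school district number to a borocode (assumed split)."""
    if 1 <= district <= 6:
        return 1  # Manhattan
    if 7 <= district <= 12:
        return 2  # Bronx
    if 13 <= district <= 23 or district == 32:
        return 3  # Brooklyn
    if 24 <= district <= 30:
        return 4  # Queens
    if district == 31:
        return 5  # Staten Island
    raise ValueError(f"unknown district: {district}")


def build_by_boro(df: pd.DataFrame) -> pd.DataFrame:
    # Wide to long: one row per (district, grade, school year).
    long = df.melt(
        id_vars=["district", "projected"],
        var_name="school_year",
        value_name="enrollment",
    )
    # Keep only the individual grades 9 through 12.
    hs = long[long["projected"].isin(HS_GRADES)].copy()
    hs["borocode"] = hs["district"].map(district_to_borocode)
    # Aggregate to boro level, one row per (school_year, borocode).
    out = (
        hs.groupby(["school_year", "borocode"], as_index=False)["enrollment"]
        .sum()
        .rename(columns={"enrollment": "hs"})
    )
    out["school_year"] = out["school_year"].astype(int)
    return out[["school_year", "borocode", "hs"]]


result = build_by_boro(wide)
print(result)
```

The output columns follow the draft schema in step 4 (school_year, borocode, hs); renaming them to match whichever schema gets confirmed would be a small follow-up.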

bfreeds commented 4 years ago

@AmandaDoyle following the schemas for 2017 and 2018, it would be:

CREATE TABLE sca_enrollment_projections_by_boro."2019" (
    year varchar,
    borough text,
    hs integer
);

baolingz commented 4 years ago

@bfreeds @AmandaDoyle We've refactored the ETL for 2018 and 2019 tables based on the above schema using SCA's source data.

bfreeds commented 4 years ago

@baolingz copy, thank you! Apologies for my delayed response. GitHub's new notifications management (which is waaay more usable) helped me see that I was pinged on this issue.