Create sca_e_projections by boro

AmandaDoyle commented 4 years ago

Aggregate HS data to boro level

Create a new folder named sca_e_projections_by_boro under the ceqr_app_date/ceqr/recipes directory
Create config.json, README.md, build.py, and requirements.txt
Use sca_e_projections.2019 as the input data

Target schema (@bfreeds, please confirm the schema for this CEQR table):

CREATE TABLE sca_e_projections_by_boro."2019" (
    school_year integer,
    borocode integer,
    hs integer
);

Build its ETL pipeline using sca_e_projections as an example
- convert the input table from a wide table to a long table using the pandas melt function
- calculate the hs projections by summing projected = [9, 10, 11, 12]; don't use projected = '9-12 Total' from sca_e_projections.2019, which has incorrect values
- write a python function or dictionary to map each district to its matching borocode
- aggregate the data to boro level
Export the output to EDM_DATA
Push your code to another branch
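
The transformation steps above could be sketched roughly as follows. This is only an illustration under assumptions, not the recipe's actual build.py: the column names (district, projected, and one column per school year), the sample values, the `enrollment` value column, and the helper names `district_to_borocode` and `aggregate_hs_by_boro` are all hypothetical. The district ranges follow the standard NYC community school district layout (1-6 Manhattan, 7-12 Bronx, 13-23 and 32 Brooklyn, 24-30 Queens, 31 Staten Island).

```python
import pandas as pd

# Grades that make up the high school total (assumed to be stored as strings).
HS_GRADES = ["9", "10", "11", "12"]


def district_to_borocode(district: int) -> int:
    """Map an NYC school district number to its borocode."""
    if 1 <= district <= 6:
        return 1  # Manhattan
    if 7 <= district <= 12:
        return 2  # Bronx
    if 13 <= district <= 23 or district == 32:
        return 3  # Brooklyn
    if 24 <= district <= 30:
        return 4  # Queens
    if district == 31:
        return 5  # Staten Island
    raise ValueError(f"unknown school district: {district}")


def aggregate_hs_by_boro(wide: pd.DataFrame) -> pd.DataFrame:
    """Melt the wide table, keep grades 9-12, and sum enrollment per boro."""
    return (
        # Wide to long: one row per (district, projected grade, school year).
        wide.melt(
            id_vars=["district", "projected"],
            var_name="school_year",
            value_name="enrollment",
        )
        # Drop '9-12 Total' and any non-HS rows; the total is recomputed below.
        .loc[lambda df: df["projected"].isin(HS_GRADES)]
        .assign(
            borocode=lambda df: df["district"].map(district_to_borocode),
            school_year=lambda df: df["school_year"].astype(int),
        )
        .groupby(["school_year", "borocode"], as_index=False)["enrollment"]
        .sum()
        .rename(columns={"enrollment": "hs"})
    )


# Made-up sample input (hypothetical layout); the '9-12 Total' row is ignored.
wide = pd.DataFrame({
    "district": [1, 1, 1, 1, 7, 7, 7, 7, 1],
    "projected": ["9", "10", "11", "12", "9", "10", "11", "12", "9-12 Total"],
    "2019": [100, 90, 80, 70, 200, 190, 180, 170, 999],
    "2020": [110, 100, 90, 80, 210, 200, 190, 180, 999],
})

by_boro = aggregate_hs_by_boro(wide)
print(by_boro)
```

Filtering on the individual grades before grouping means the hs column is always rebuilt from grades 9 through 12, so the incorrect '9-12 Total' values never reach the output.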

bfreeds commented 4 years ago

@AmandaDoyle following the schemas for 2017 and 2018, it would be:

CREATE TABLE sca_enrollment_projections_by_boro."2019" (
    year varchar,
    borough text,
    hs integer
);
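
For illustration, a boro-level result keyed by an integer year and borocode could be reshaped to this year / borough / hs layout along these lines. This is a sketch under assumptions: the input column names (school_year, borocode, hs), the helper name `to_borough_schema`, and especially the borough labels are hypothetical, since the exact text values used in the 2017 and 2018 tables are not shown here.

```python
import pandas as pd

# Assumed borough labels; the 2017/2018 tables may spell these differently.
BOROUGH_NAMES = {
    1: "Manhattan",
    2: "Bronx",
    3: "Brooklyn",
    4: "Queens",
    5: "Staten Island",
}


def to_borough_schema(by_boro: pd.DataFrame) -> pd.DataFrame:
    """Return a frame with columns year (str), borough (str), hs (int)."""
    return pd.DataFrame({
        "year": by_boro["school_year"].astype(str),
        "borough": by_boro["borocode"].map(BOROUGH_NAMES),
        "hs": by_boro["hs"].astype(int),
    })


# Made-up sample input in the integer-keyed layout.
example = pd.DataFrame({
    "school_year": [2019, 2019],
    "borocode": [1, 2],
    "hs": [340, 740],
})

converted = to_borough_schema(example)
print(converted)
```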

baolingz commented 4 years ago

@bfreeds @AmandaDoyle We've refactored the ETL for 2018 and 2019 tables based on the above schema using SCA's source data.

bfreeds commented 4 years ago

@baolingz copy, thank you! Apologies for my delayed response; GitHub's new notifications management (which is waaay more usable) helped me see that I was pinged on this issue.

NYCPlanning / ceqr-app-data-archive

Create sca_e_projections by boro #56