NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License

ELI households #55

Open AmandaDoyle opened 2 years ago

AmandaDoyle commented 2 years ago

Description: Number of households that are <30% AMI

Logic:

Source(s): ACS PUMS

Year(s):

Geographies:

Race and ethnicity breakdown: No

Denominator: total households

SME: HPD/DCP Population Division

Questions:

Roadblocks:

td928 commented 2 years ago

2022.2.4 Te's Work Journal

Module testing remains to be done.

Testing of the new module cannot be completed because the PUMSEconomics indicator weight calculation is broken for some reason. This needs further investigation before concluding that the new assign modules work.

Check out this branch for the income band assignment functionality.

Solution (to dos)

This will involve creating a new ingest function (similar to the one below) which, after the person-level PUMS is ingested, transforms the dataframe to keep only one member of each household as its representative. Since household income (HINCP) and number of persons in family (NPF) are identical across members of a household, this should be fine. However, after some more consideration, I think the better approach could be simply passing a boolean household flag to the functions involved. I detail my thoughts on this further below.
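The assumption that HINCP and NPF agree across household members can be checked before any rows are dropped. A minimal sketch, assuming the person-level records carry the ACS PUMS household identifier SERIALNO; the toy values are made up for illustration and are not real PUMS data.

```python
import pandas as pd

# Toy person-level records; values are illustrative only.
persons = pd.DataFrame(
    {
        "SERIALNO": ["A", "A", "A", "B", "B"],
        "SPORDER": [1, 2, 3, 1, 2],
        "HINCP": [52000, 52000, 52000, 18000, 18000],
        "NPF": [3, 3, 3, 2, 2],
    }
)

# Count distinct values of each household-level column within a household.
# dropna=False treats a missing value as a value, so a household whose
# column is entirely missing still counts as consistent.
distinct = persons.groupby("SERIALNO")[["HINCP", "NPF"]].nunique(dropna=False)

# True when every household has exactly one value per column.
household_cols_consistent = bool((distinct == 1).all().all())

# Keep one representative row per household.
households = persons.drop_duplicates(subset="SERIALNO", keep="first")
```

If the check ever comes back False, the representative-row approach would silently pick one of several conflicting values, so it is worth running once on the real extract.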

Alternative Solution (to dos)

Before I go into more detail on how to make the specific changes, I want to list all the functions involved in creating the household workflow. Under ingest, load_data.py and make_cache_fn.py. Then under ingest/PUMS/, PUMS_data.py. The upstream functionality that calls load_data will also need to be updated for the entire workflow to run: under aggregate/PUMS, aggregate_PUMS.py and count_PUMS_economics.py.

load_data.py

When initializing the load_PUMS function, a new boolean flag 'household' would be passed to the function. The flag would then be passed on to both get_cache_fn and the place where the ingestor is created by calling the PUMS_Data initializer.

https://github.com/NYCPlanning/db-equitable-development-tool/blob/b388e26885e544def087a24a6f44157074078678/ingest/load_data.py#L31-L39

Preferably, the boolean flag would also be used to create a household string that can be inserted in front of the log message. What I have in mind: hh = '' if not self.household else 'household ', followed by logger.info(f"{hh}PUMS data with {PUMS_data.shape[0]} records loaded, ready for aggregation").
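The log prefix idea can be sketched in isolation. The helper name loaded_message is hypothetical and only exists to make the behavior easy to check; in load_data.py the same two lines would presumably sit inline next to the existing logger call.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def loaded_message(n_records: int, household: bool = False) -> str:
    """Build the log line, prefixing 'household ' when the flag is set.

    Hypothetical helper for illustration; not part of the repo.
    """
    hh = "household " if household else ""
    return f"{hh}PUMS data with {n_records} records loaded, ready for aggregation"


logger.info(loaded_message(1200, household=True))
```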

make_cache_fn.py

This would be a simple change: the boolean household flag would be passed in, and an additional string would be added to mark the cache as household data.
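A rough sketch of what the cache naming change could look like. The function name cache_filename, the base argument, and the .pkl extension are all assumptions for illustration; the actual signature in make_cache_fn.py may differ.

```python
def cache_filename(base: str, household: bool = False) -> str:
    """Return a cache file name, marking household-level extracts.

    Hypothetical helper; the real make_cache_fn.py may build names differently.
    """
    suffix = "_household" if household else ""
    return f"{base}{suffix}.pkl"


person_cache = cache_filename("PUMS_economics")
household_cache = cache_filename("PUMS_economics", household=True)
```

Keeping the marker in the file name means a person-level cache is never read back by mistake when the household flag is on.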

PUMS_Data

I think this is a sensible place to accommodate the need for ingesting household-level data. After receiving the household boolean flag from load_data's call, only the populate_data_frame function would need to be modified, which I think could take place after this line:

https://github.com/NYCPlanning/db-equitable-development-tool/blob/b388e26885e544def087a24a6f44157074078678/ingest/PUMS/PUMS_data.py#L79

A simple data = data.loc[data["SPORDER"] == '1'] should do the job of leaving just one person per household in the dataframe.
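A small sketch of that filter on toy data. One thing to watch: whether SPORDER arrives as text or as an integer depends on how the ingest step parses it, so this version casts before comparing. The column values here are made up.

```python
import pandas as pd

# Toy person-level records; SPORDER is stored as text here on purpose.
data = pd.DataFrame(
    {
        "SERIALNO": ["A", "A", "B", "C", "C", "C"],
        "SPORDER": ["1", "2", "1", "1", "2", "3"],
        "HINCP": [52000, 52000, 18000, 97000, 97000, 97000],
    }
)

# Cast before comparing so the filter works for both text and integer SPORDER.
is_first_person = data["SPORDER"].astype(int) == 1
households = data.loc[is_first_person].reset_index(drop=True)

# Sanity check: exactly one row per household remains.
assert households["SERIALNO"].is_unique
```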

Upstream work

aggregate_PUMS.py

The only change needed here is adding the household boolean flag as an additional parameter to the PUMSAggregator initialization and passing the flag in the call to the load_PUMS function.

https://github.com/NYCPlanning/db-equitable-development-tool/blob/b388e26885e544def087a24a6f44157074078678/aggregate/PUMS/aggregate_PUMS.py#L50-L60

Ideally, a household string marker should also be added to the log info to make this more explicit, as in the ingestion step above:

https://github.com/NYCPlanning/db-equitable-development-tool/blob/b388e26885e544def087a24a6f44157074078678/aggregate/PUMS/aggregate_PUMS.py#L73-L75

count_PUMS_economics.py

Last but not least, add the household flag as a parameter to the countPUMSEconomics class, and also to the PUMCount initialization, which kicks off the aggregator class.

https://github.com/NYCPlanning/db-equitable-development-tool/blob/b388e26885e544def087a24a6f44157074078678/aggregate/PUMS/count_PUMS_economics.py#L22-L33
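Taken together, the change is a single flag threaded from the counting class through the aggregator down to the loader. A minimal sketch of that plumbing; every name below (load_pums_sketch, AggregatorSketch, EconomicsCountSketch) is a stand-in invented for illustration and does not mirror the repo's real signatures.

```python
def load_pums_sketch(household: bool = False) -> dict:
    """Stand-in for the loader: records which unit of analysis was requested."""
    return {"unit": "household" if household else "person"}


class AggregatorSketch:
    """Stand-in for the aggregator: accepts the flag and forwards it to the loader."""

    def __init__(self, household: bool = False) -> None:
        self.household = household
        self.data = load_pums_sketch(household=self.household)


class EconomicsCountSketch(AggregatorSketch):
    """Stand-in for the economics counter: passes the flag up unchanged."""

    def __init__(self, household: bool = False) -> None:
        super().__init__(household=household)


counter = EconomicsCountSketch(household=True)
default_counter = EconomicsCountSketch()
```

Defaulting the flag to False at every level keeps the existing person-level indicators working without any change to their callers.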

Weight for household

If the above solution can be implemented, the last outstanding question is the treatment of weights at the household level. Based on the manual posted in this calculate count script, I am led to believe that household weights are handled adequately by the current calculate_counts.py. The reference manual Sasha mentioned is here
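For reference, ACS PUMS carries a household weight (WGTP) separate from the person weight (PWGTP), so a household-level count should sum WGTP over one row per household. A minimal sketch of that estimate on toy data; the income_band column and its labels are hypothetical, and whether calculate_counts.py already picks the right weight column is the open question above.

```python
import pandas as pd

# One row per household; weights and bands are made up for illustration.
households = pd.DataFrame(
    {
        "SERIALNO": ["A", "B", "C", "D"],
        "WGTP": [20, 35, 15, 30],
        "income_band": ["ELI", "VLI", "ELI", "MI"],
    }
)

# Denominator: estimated total households is the sum of household weights.
total_households = households["WGTP"].sum()

# Numerator: estimated ELI households, again summing household weights.
is_eli = households["income_band"] == "ELI"
eli_households = households.loc[is_eli, "WGTP"].sum()

eli_share = eli_households / total_households
```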