NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License
0 stars 0 forks source link

42 rent stablized three mainteinance #211

Closed td928 closed 2 years ago

td928 commented 2 years ago

Rent Stablisation and Three or More Mainteinances

Based on the new spreadsheet from HPD for new rent stablisation and rework the three maintenance indicator.

Functions

So the two functions for the two indicator and both take in geography as an input to produce the results. This work is really similar to the other housing security work. The only difference is where the column name change happens to accommodates the specificities of the source data.

PUMA_helpers

small changes to the puma helpers function to handle null cases in puma column

Review

Unfortunately this results do diverge quite significantly from the completed works from @SashaWeinstein based on the borough level results in that branch. I do want observe while the results are off by some factor between 2 to 3 but the trend in neighborhood seems to hold in Sasha's results

td928 commented 2 years ago

Sam just sent me the denominator files. Working on adding them now. 2017 HVS Denominator.xlsx

Just want to note the original spreadsheet shared by Chris for denominator is missing the puma field in one of those tabs. It makes it considerably more complicated to deal with the programmatically to create the output we want so I am changing the original spreadsheet by adding the puma in the second tab. Just want to flag this for everyone.

SashaWeinstein commented 2 years ago

I would just assume my results are wrong? Is it worth digging into?

AmandaDoyle commented 2 years ago

I would just assume my results are wrong? Is it worth digging into?

We're not short of work so leave it for now, but it will be something we'll likely troubleshoot together once it's confirmed that we will need to switch data sources from ACS to HVS and/or more than one year of data will need to be included in EDDT.

SashaWeinstein commented 2 years ago

Ok neat. I know I said I was interested to see if my results were correct but at this point I'm more interested in getting the EDDT deliverables out the door and working through a new data product

td928 commented 2 years ago

Sorry for keep adding to this work. I realize the column needs reordering and also change the function name to something more explicit. It should be ready for final review now.

It turns out in order to incorporate the denominator spreadsheet there is significant new work added and refactoring. Here are what changes were made.

extract_geography_dataframe

since I could not come up with a way to pre-join the denominator sheet and indicator sheet together before extract the results for different geography level. I decided that I would need to create two dataframes from the two sheets and rejoined them together after extraction are completed. Another thing to note is the denom flag makes sure the right columns to drop depending processing which dataframe since denominator do not have Percent SE.

indicator functions

Now the two indcator functions simply rejoin the denom and indicator dataframes created and then start doing the final column change. Since the columns number are small. There is no custom function created for the purpose but simply spell out in the ingestion process.

@SashaWeinstein @AmandaDoyle @mbh329

SashaWeinstein commented 2 years ago

It appears that the three or more maintenance deficiencies get it's denom info from the "2017 HVS Occupied Units" tab in the "2017 HVS Denominator.xlsx" file. That sheet doesn't say anything about maintenance deficiencies, it's tab names and headers talk about occupied units. Are sure this is the right source data for denoms of this indicator?

mbh329 commented 2 years ago

@td928 @SashaWeinstein the data matrix mentions that occupied units should be the denominator

mbh329 commented 2 years ago

I'd like to take a look at this again on monday with a fresh set of eyes

mbh329 commented 2 years ago

Theres a couple other issues going on, did the cv and moe columns get mixed up?

td928 commented 2 years ago

Hey @SashaWeinstein @mbh329 Yours comments should be all addressed. I am sorry that linting just still not working on my machine so the linting changes from your side went back to its original state. I will try to find some time to sort that out and hopefully have linting going forward.

This new work should actually be simpler. The unnecessary columns are dropped at the beginning of the ingestion and cleanning process so the denom flag is now removed and now the moe and cv should be correct. Then the columns are recast as appropriately numerical so no more quotes around the fields.

Please let me know if those changes make sense and address your concerns. Thank you!

td928 commented 2 years ago

@SashaWeinstein Should be good to go now. Let me know what you think. Thanks!

SashaWeinstein commented 2 years ago

wait sorry shoot one more thing. order does matter. It should go count then count moe then count cv

SashaWeinstein commented 2 years ago

hey sorry I didn't mean to push b80ef4e, undid changes with second commit

td928 commented 2 years ago

Okay so no generalizable function was written because trying to handle both the reordering for denom and indicator itself it challenging with different measurements with the existing Frankenstein approach to the creating indicator in the first place.