Closed td928 closed 2 years ago
Sam just sent me the denominator files. Working on adding them now. 2017 HVS Denominator.xlsx
Just want to note the original spreadsheet shared by Chris for denominator is missing the puma field in one of those tabs. It makes it considerably more complicated to deal with the programmatically to create the output we want so I am changing the original spreadsheet by adding the puma in the second tab. Just want to flag this for everyone.
I would just assume my results are wrong? Is it worth digging into?
I would just assume my results are wrong? Is it worth digging into?
We're not short of work so leave it for now, but it will be something we'll likely troubleshoot together once it's confirmed that we will need to switch data sources from ACS to HVS and/or more than one year of data will need to be included in EDDT.
Ok neat. I know I said I was interested to see if my results were correct but at this point I'm more interested in getting the EDDT deliverables out the door and working through a new data product
Sorry for keep adding to this work. I realize the column needs reordering and also change the function name to something more explicit. It should be ready for final review now.
It turns out in order to incorporate the denominator spreadsheet there is significant new work added and refactoring. Here are what changes were made.
since I could not come up with a way to pre-join the denominator sheet and indicator sheet together before extract the results for different geography level. I decided that I would need to create two dataframes from the two sheets and rejoined them together after extraction are completed. Another thing to note is the denom
flag makes sure the right columns to drop depending processing which dataframe since denominator do not have Percent SE.
Now the two indcator functions simply rejoin the denom and indicator dataframes created and then start doing the final column change. Since the columns number are small. There is no custom function created for the purpose but simply spell out in the ingestion process.
@SashaWeinstein @AmandaDoyle @mbh329
It appears that the three or more maintenance deficiencies get it's denom info from the "2017 HVS Occupied Units" tab in the "2017 HVS Denominator.xlsx" file. That sheet doesn't say anything about maintenance deficiencies, it's tab names and headers talk about occupied units. Are sure this is the right source data for denoms of this indicator?
@td928 @SashaWeinstein the data matrix mentions that occupied units should be the denominator
I'd like to take a look at this again on monday with a fresh set of eyes
Theres a couple other issues going on, did the cv and moe columns get mixed up?
Hey @SashaWeinstein @mbh329 Yours comments should be all addressed. I am sorry that linting just still not working on my machine so the linting changes from your side went back to its original state. I will try to find some time to sort that out and hopefully have linting going forward.
This new work should actually be simpler. The unnecessary columns are dropped at the beginning of the ingestion and cleanning process so the denom flag is now removed and now the moe and cv should be correct. Then the columns are recast as appropriately numerical so no more quotes around the fields.
Please let me know if those changes make sense and address your concerns. Thank you!
@SashaWeinstein Should be good to go now. Let me know what you think. Thanks!
wait sorry shoot one more thing. order does matter. It should go count then count moe then count cv
hey sorry I didn't mean to push b80ef4e, undid changes with second commit
Okay so no generalizable function was written because trying to handle both the reordering for denom and indicator itself it challenging with different measurements with the existing Frankenstein approach to the creating indicator in the first place.
Rent Stablisation and Three or More Mainteinances
Based on the new spreadsheet from HPD for new rent stablisation and rework the three maintenance indicator.
Functions
So the two functions for the two indicator and both take in geography as an input to produce the results. This work is really similar to the other housing security work. The only difference is where the column name change happens to accommodates the specificities of the source data.
PUMA_helpers
small changes to the puma helpers function to handle null cases in puma column
Review
Unfortunately this results do diverge quite significantly from the completed works from @SashaWeinstein based on the borough level results in that branch. I do want observe while the results are off by some factor between 2 to 3 but the trend in neighborhood seems to hold in Sasha's results