I had an opportunity to review the results of Phase 1 with the client (see Banking Dashboard Phase 1 Summary). He was very pleased with the work done thus far and has given us the insight we need to move forward with Phase 2.
In this issue, you will update the helper functions so that they include the following data transformations / cleaning exercises.
Data transformations / cleaning
Update the data ingestion helper functions as described below.
General (all data sources, where applicable, except for the institutions data)
Standardize the census tract format across data sources. Use the format_census_tract helper function at the end of the helper_fcns.py script to convert from the format that has implied decimal points.
For a few data sources, I added some logic for filtering down to Texas (and to our 3 counties of interest where applicable). I would like you to move this logic into the body of the helper function. We want the output of the helper function to be ready for analysis and not require additional filtering.
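As a rough illustration of moving the geographic filter into a helper function, here is a minimal sketch. The function name filter_to_study_area and the column names state_code and county_code are assumptions for illustration only; the real column names differ by data source. The FIPS codes are the standard ones for Texas (48) and for Collin (085), Dallas (113), and Tarrant (439) counties.

```python
import pandas as pd

TEXAS_STATE_FIPS = "48"
# County FIPS codes: Collin = 085, Dallas = 113, Tarrant = 439
COUNTY_FIPS_OF_INTEREST = {"085", "113", "439"}


def filter_to_study_area(df, state_col="state_code", county_col="county_code"):
    """Keep only rows in Texas and in the 3 counties of interest.

    Hypothetical helper: the column names are placeholders and should be
    replaced with the names used by each data source.
    """
    # Zero-pad so that integer-typed codes (48, 85) match the FIPS strings.
    state = df[state_col].astype(str).str.zfill(2)
    county = df[county_col].astype(str).str.zfill(3)
    mask = (state == TEXAS_STATE_FIPS) & county.isin(COUNTY_FIPS_OF_INTEREST)
    return df.loc[mask].reset_index(drop=True)
```

Calling something like this as the last step inside each ingestion helper would mean the returned DataFrame needs no further filtering. For data sources that have no county field, the county condition would have to be dropped or replaced.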
SBA data
Filter to approval dates in 2022 so that the SBA data is consistent with the HMDA data.
Use the borrower address as the location of the small business. Filter down to borrower addresses in Texas and, specifically, the 3 counties of interest (Collin, Dallas, and Tarrant).
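The two SBA filters above could be combined along these lines. This is only a sketch: the function name filter_sba and the column names approval_date, borrower_state, and borrower_county are hypothetical and need to be mapped to the actual SBA columns. If the file has no borrower county field, one would have to be derived from the borrower address first.

```python
import pandas as pd

COUNTIES_OF_INTEREST = {"COLLIN", "DALLAS", "TARRANT"}


def filter_sba(df, date_col="approval_date", state_col="borrower_state",
               county_col="borrower_county", year=2022):
    """Keep loans approved in `year` whose borrower is in the study area.

    Hypothetical helper: all column names are assumptions.
    """
    # Unparseable dates become NaT and are excluded by the year comparison.
    approval = pd.to_datetime(df[date_col], errors="coerce")
    in_year = approval.dt.year == year
    # Normalize case and whitespace before comparing text fields.
    in_texas = df[state_col].astype(str).str.strip().str.upper() == "TX"
    county = df[county_col].astype(str).str.strip().str.upper()
    in_counties = county.isin(COUNTIES_OF_INTEREST)
    return df.loc[in_year & in_texas & in_counties].reset_index(drop=True)
```

Checking the state as well as the county name guards against same-named counties in other states.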
CRA data
The income group field in cra2021_Discl_D6.dat should be recoded using the data dictionary.
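One possible shape for the recoding step is sketched below. The function name recode_income_group and the column name income_group are assumptions, and the code-to-label mapping is passed in rather than hard-coded, because the actual codes and labels must be taken from the data dictionary.

```python
import pandas as pd


def recode_income_group(df, code_to_label, col="income_group"):
    """Replace income group codes with labels from the data dictionary.

    Hypothetical helper: `code_to_label` is a dict built from the data
    dictionary, and `col` is a placeholder column name.
    """
    codes = df[col].astype(str).str.strip()
    # Fail loudly on codes missing from the mapping instead of
    # silently producing NaN values.
    unknown = set(codes) - set(code_to_label)
    if unknown:
        raise ValueError(f"Codes not in data dictionary: {sorted(unknown)}")
    out = df.copy()
    out[col] = codes.map(code_to_label)
    return out
```

Raising on unmapped codes is a design choice for this sketch; mapping them to a catch-all label would also work if the data dictionary defines one.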