[x] Create a new Jupyter notebook called tabular-summaries.ipynb
[x] Insert cells that import the pre-processed datasets.
[x] After loading the HMDA data, create a new field called Race_Ethnicity that uses applicant_race_1 and applicant_ethnicity_1 to map applicants to the following categories: (1) White non-Hispanic, (2) White Hispanic, (3) Black, (4) Asian, and (5) Other, where 3,4,5 are each determined by race only (ignoring ethnicity)
[x] Insert cells that use pandas / aggregation operations to prepare summary tables with the following fields.
Notes
In the following, whenever loans are mentioned, note that we are talking about loan originations only. We are focusing on where the loans were actually approved.
Other fields will be added to these tables later. This is just to get us started for now.
Bold fields are the ones we should group by
No need to rename any fields for now. You can pull them as-is from the tables.
Table: Branch-level details
Data sources: FDIC locations
Fields:
Bank (branch?) name
Bank address
Bank city
Bank state
Bank zip code
Bank census tract ID
Bank county
Table: Census tract-level summary
Data sources: FFIEC Census, HMDA, SBA, joined by census tract ID
Possible join logic:
[HMDA left joined on FFIEC Census], and [SBA left joined on FFIEC Census/HMDA join]
Create intermediate tables that compute sums of unique identifiers within each census tract, and then left join those tables to FFIEC Census data
Note: For HMDA race/ethnicity, for now, focus on applicant_race_1 and applicant_ethnicity_1
Fields:
County name (Dallas, Collin, or Tarrant)
Census tract ID
Total persons (from FFIEC Census data)
Total families (from FFIEC Census data)
Total households (from FFIEC Census data)
Census tract income level -- low, moderate, or high? For now, just use the numeric code. We can infer the low/moderate/high value later.
How many mortgage loans offered in this census tract? (count unique HMDAlei)
How many small business loans offered in this census tract? (count unique SBA Borrower name <-- the version with the index prefix)
Percent mortgage loan originations that went to White non-Hispanic applicants <-- use the new Race_Ethnicity field created above
Percent mortgage loan originations that went to White Hispanic applicants
Percent mortgage loan originations that went to Black applicants
Percent mortgage loan originations that went to Asian applicants
Percent mortgage loan originations that went to Other Race / Ethnicity applicants
Table: County-level summary[IGNORE FOR NOW]
Data sources: FDIC Census, HMDA, SBA, joined by census tract ID
Fields:
Tasks:
tabular-summaries.ipynb
Race_Ethnicity
that usesapplicant_race_1
andapplicant_ethnicity_1
to map applicants to the following categories: (1) White non-Hispanic, (2) White Hispanic, (3) Black, (4) Asian, and (5) Other, where 3,4,5 are each determined by race only (ignoring ethnicity)Notes
Table: Branch-level details Data sources: FDIC locations Fields:
Table: Census tract-level summary Data sources: FFIEC Census, HMDA, SBA, joined by census tract ID Possible join logic:
Note: For HMDA race/ethnicity, for now, focus on
applicant_race_1
andapplicant_ethnicity_1
Fields:lei
)Borrower name
<-- the version with the index prefix)Race_Ethnicity
field created aboveTable: County-level summary [IGNORE FOR NOW] Data sources: FDIC Census, HMDA, SBA, joined by census tract ID Fields: