Create Analysis Notebook 3

Create example code and instructions to segment a single mortgage product using the filters below:

single-family
first-lien
owner-occupied
conventional
home purchase

This code should be provided as a function that accepts:

extensions to the WHERE clause (for example: geography, action type, lender)
table name
database name
schema name
host name

Each filter should accept either a single value or a list-like input.

This function should have an option that allows the user to write the query results to a pipe-delimited file with a .txt extension.
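
A minimal sketch of how the query-building part of such a function might look. The column names and code values in `BASE_FILTERS` are assumptions based on the 2004-2017 HMDA LAR layout, and `build_query`, `as_list`, and the default schema `public` are hypothetical names chosen for illustration; check all of them against the actual table before use.

```python
from collections.abc import Iterable

# Assumed column names and code values (2004-2017 HMDA LAR layout).
# Verify against the actual table; the columns may be stored as text or integers.
BASE_FILTERS = {
    "property_type": "1",    # one-to-four family dwelling (single-family)
    "lien_status": "1",      # first lien
    "owner_occupancy": "1",  # owner-occupied as a principal dwelling
    "loan_type": "1",        # conventional
    "loan_purpose": "1",     # home purchase
}


def as_list(value):
    """Wrap a single value in a list; convert list-like inputs to a list."""
    if isinstance(value, (str, bytes)) or not isinstance(value, Iterable):
        return [value]
    return list(value)


def build_query(table, schema="public", extra_filters=None):
    """Return (sql, params) for the base product plus optional WHERE extensions.

    extra_filters maps a column name to a single value or a list-like of values,
    for example {"state_code": ["06", "48"], "action_taken": "1"}.
    Table, schema, and column names are interpolated directly, so they must come
    from trusted code, not from user input. Values are passed as parameters.
    """
    filters = dict(BASE_FILTERS)
    filters.update(extra_filters or {})

    clauses, params = [], []
    for column, value in filters.items():
        values = as_list(value)
        placeholders = ", ".join(["%s"] * len(values))
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)

    sql = f"SELECT * FROM {schema}.{table} WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query("lar", extra_filters={"state_code": ["06", "48"]})
print(sql)
print(params)
```

The database name and host name would be passed to the connection rather than to the query, for example `psycopg2.connect(dbname=..., host=...)` if the database is PostgreSQL (an assumption). The results could then be read with `pandas.read_sql(sql, conn, params=params)` and, when the write option is enabled, saved with `df.to_csv(path, sep="|", index=False)` using a path that ends in .txt.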

In the instructions inside the Jupyter notebook, discuss:

what these filters mean and how they affect the mortgage product
why a homogeneous product is important to analysis
the presence of action type in the HMDA data and how that affects analysis

Produce the following outputs:

flat file with a pipe delimiter and .txt extension
Pandas dataframe (shown inline)
SQL script (located in the SQL folder)
analysis of a subset of HMDA data showing comparisons of product types in two different states over time. The comparison should use 2004-2017 data that was written to a file and reloaded. This analysis should account for action taken type and use Pandas to generate an aggregate measure of the data.
one or more example visualizations of the data, for example average originated loan amounts for several MSAs from 2004-2017.
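
A rough sketch of the reload-and-aggregate step, assuming the extract was already written as a pipe-delimited file. The rows below are invented toy values used only to make the example runnable, not real HMDA data, and the column names (`as_of_year`, `state_code`, `action_taken`, `loan_amount_000s`) are assumptions about the table layout.

```python
import io

import pandas as pd

# Toy rows standing in for the reloaded .txt extract. Values are invented
# for illustration only; column names are assumed.
sample = io.StringIO(
    "as_of_year|state_code|action_taken|loan_amount_000s\n"
    "2004|06|1|300\n"
    "2004|06|3|250\n"
    "2004|48|1|150\n"
    "2005|06|1|340\n"
    "2005|48|1|160\n"
    "2005|48|1|180\n"
)

# dtype=str keeps leading zeros in codes such as the state FIPS code "06".
df = pd.read_csv(sample, sep="|", dtype=str)
df["loan_amount_000s"] = pd.to_numeric(df["loan_amount_000s"])

# Keep originations only (action_taken "1") so that denied or withdrawn
# applications do not distort the average originated loan amount.
originated = df[df["action_taken"] == "1"]

# Average originated loan amount by year (rows) and state (columns).
avg_by_state_year = (
    originated.groupby(["as_of_year", "state_code"])["loan_amount_000s"]
    .mean()
    .unstack("state_code")
)
print(avg_by_state_year)
```

With a real extract, `io.StringIO(...)` would be replaced by the path to the .txt file. If matplotlib is installed, `avg_by_state_year.plot()` would draw one line per state over time, which is one possible starting point for the visualization.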

The goal of this example is to demonstrate how to get a dataset of a homogeneous mortgage product, save the dataset to disk, load the data into Pandas, produce aggregate metrics, and graph them in a meaningful way.

cfpb / HMDA_Data_Science_Kit

Create Analysis Notebook 3 #40