cfpb / HMDA_Data_Science_Kit

Creative Commons Zero v1.0 Universal

Repository Purpose and Scope:

The primary goal of this repository is to provide data users with tools to enable them to produce accurate analytics results. Additionally, this repository provides an overview of HMDA resources, publications, and guidelines for proper use. This repository does not provide statutory interpretation or compliance assistance.

What Is HMDA?

HMDA refers to the Home Mortgage Disclosure Act of 1975. HMDA requires many financial institutions to maintain, report, and publicly disclose loan-level information about mortgages. HMDA was originally enacted by Congress in 1975 and is implemented by Regulation C.

Congress amended HMDA in 2010 and the Bureau finalized a rule implementing changes to HMDA in 2015. Most of the rule's provisions affect data collected in 2018 and reported in 2019. However, beginning with data collected in 2017, depository institutions that originated fewer than 25 covered closed-end mortgages in either of the preceding two years are not required to report.

Senate bill S.2155 modified some reporting requirements for the 2018 data collection. These changes will be outlined in upcoming publications.

What is the purpose of HMDA?

HMDA Datasets

Three raw data files are published annually under HMDA authority. File formats (and schemas) vary by data source. The National Archives (NARA) uses a .DAT format, the FFIEC site maintained by the Federal Reserve Board (FRB) uses a .CSV format (with Census data appended), and the FFIEC site maintained by the CFPB uses a pipe-delimited .TXT format (with Census data appended). Links to HMDA datasets are available in this file.
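Because the delimiter depends on where a file came from, a loader has to pick it per file. The following is a minimal sketch, assuming the .CSV files are comma-delimited and the .TXT files are pipe-delimited as described above. The function name `read_rows` and the extension-to-delimiter mapping are illustrative and not part of this repository; .DAT files are deliberately not handled because their layout is not described here.

```python
import csv
from pathlib import Path

# Assumption: the delimiter can be inferred from the file extension.
# .dat files are left out because their layout is not described in this README.
DELIMITERS = {".csv": ",", ".txt": "|"}


def read_rows(path, limit=5):
    """Return up to `limit` rows of a delimited file as lists of strings."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in DELIMITERS:
        raise ValueError("no delimiter known for {!r} files".format(suffix))
    rows = []
    with path.open(newline="") as handle:
        reader = csv.reader(handle, delimiter=DELIMITERS[suffix])
        for row in reader:
            rows.append(row)
            if len(rows) >= limit:
                break
    return rows
```

Reading only a few rows first is a cheap way to confirm the delimiter and column count before committing to a full load of a multi-gigabyte file.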

These datasets include:

Additional material discussing changes to the underlying data (such as the benchmark for the rate spread variable) will be added in the future. Check the Platform FAQ Pages for common tips.

HMDA Data Browser:

The HMDA Data Browser provides filtering and download capability for the LAR datasets.

Integration of Census Data with HMDA

HMDA data are often joined to Census data on the county FIPS code to show context for the mortgage data at the geographic level. The FFIEC joins the following to the HMDA LAR data: area population, minority population percentage, FFIEC median family income, tract to MSA/MD median family income percentage, number of owner-occupied units, and number of 1-4 family units. These data are joined at the tract level to provide context for mortgage activity in the relevant geography. The base data for this join are made available by the FFIEC on this website. The year of the Census data corresponds to the HMDA collection year.
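A tract-level join like the one described above can be sketched as a dictionary lookup keyed on the 11-digit tract identifier (2-digit state FIPS, 3-digit county FIPS, 6-digit tract code). This is an illustration only: the field names `state`, `county`, `tract`, and `population` are hypothetical and do not match the FFIEC or HMDA column names, and the real join is done by the FFIEC, not by this code.

```python
def tract_key(state, county, tract):
    """Build an 11-digit tract identifier from string state, county, and tract codes."""
    # Tract codes sometimes carry a decimal point (e.g. "0101.10"); drop it.
    return "{:0>2}{:0>3}{:0>6}".format(state, county, tract.replace(".", ""))


def join_census(lar_rows, census_rows):
    """Left-join LAR rows to Census rows on the tract identifier.

    Both inputs are lists of dicts using the hypothetical field names
    "state", "county", "tract"; Census rows also carry "population".
    """
    census_by_tract = {}
    for row in census_rows:
        census_by_tract[tract_key(row["state"], row["county"], row["tract"])] = row

    joined = []
    for row in lar_rows:
        match = census_by_tract.get(
            tract_key(row["state"], row["county"], row["tract"]), {}
        )
        merged = dict(row)
        # Unmatched LAR rows are kept, with None for the Census field.
        merged["tract_population"] = match.get("population")
        joined.append(merged)
    return joined
```

Keeping unmatched LAR rows (a left join) makes missing or malformed geography visible in the result instead of silently dropping records.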

The HMDA-Census repository contains code that can be used to download FFIEC Census flat files and OMB MSA Delineation files and create joined cuts of those datasets for use with HMDA.

HMDA Data Documentation

FIGs

Compliance Guides

Regulatory Implementation Resources

Transactional Coverage Charts

Institutional Coverage Charts

Reportable Data

Other

Working With HMDA Data

The HMDA data are complex and care must be taken to ensure that analytics results are accurate.

HMDA Publications

For a list of HMDA publications, see here

Basic Requirements and Instructions

Requirements

The resources in this repository assume that a database has been installed and is functioning properly. The SQL code is written for PostgreSQL; other SQL dialects may require modifications to the code.

The Python resources assume a working installation of Python 3.5 or greater. These instructions and code resources use python3 to invoke Python scripts. If only one version of Python is installed, this command may need to be changed to python, without the 3.
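To confirm which interpreter a given command invokes, the version can be checked from inside Python. The snippet below is a small sketch, not a script shipped with this repository; the function name `meets_minimum` is hypothetical.

```python
import sys

MINIMUM_VERSION = (3, 5)  # the minimum stated in this README


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print("Python {}.{}: {}".format(sys.version_info[0], sys.version_info[1], status))
```

Saving this to a file and running it with both python and python3 shows which of the two commands points at a suitable interpreter.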

This repository has a requirements.txt file that can be used to install the Python libraries used in the repository:

Downloading and Unzipping Data

To begin using the HMDA data you will first need to download the data. A list of data resources is available in HMDA data links.

These data can be downloaded manually from the links listed or the following script can be run from the HMDA_Data_Science_Kit directory:

Fair warning: these are very large files (each LAR zip file is ~500 MB) and will take a substantial amount of time to download.

The script can download HMDA ultimate data files for LAR, Transmittal Sheet, and Panel for the years 2004 through 2022.

Running the script without flags will download all LAR, Transmittal Sheet, and Panel files that are not present.
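The "download only what is not present" behavior can be illustrated as a set difference between expected and existing files. This is a sketch of the idea, not the download script's actual logic: the file naming pattern and the dataset labels below are hypothetical and will not match the names the script really uses.

```python
from pathlib import Path

YEARS = range(2004, 2023)         # 2004 through 2022, as stated above
DATASETS = ("lar", "ts", "panel")  # hypothetical labels for the three file types


def expected_files(years=YEARS, datasets=DATASETS):
    """List expected archive names using a hypothetical naming pattern."""
    return ["{}_{}.zip".format(dataset, year) for dataset in datasets for year in years]


def missing_files(directory, names):
    """Return the names from `names` that do not exist in `directory`."""
    directory = Path(directory)
    return [name for name in names if not (directory / name).exists()]
```

Only the names returned by `missing_files` would need to be fetched, which is what makes repeated runs inexpensive.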

The script accepts the following option flags:

Usage examples

Download Troubleshooting

Sometimes files from the National Archives fail to download correctly. A sign that this has happened is a file with the correct name (such as LAR_20013.zip) that has a file size of about 4 KB. In these cases the file must be deleted and redownloaded. One way to do this is:
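As a hedged alternative in Python (not the repository's own command), failed downloads can be found by looking for .zip files that are suspiciously small or that are not valid zip archives. The 10 KB threshold and the function names are assumptions chosen for illustration.

```python
import zipfile
from pathlib import Path

# Assumption: real archives are far larger than this, so anything at or
# below the threshold is treated as a failed download.
SUSPECT_SIZE_BYTES = 10 * 1024


def find_failed_downloads(directory):
    """Return .zip paths that are too small or are not valid zip archives."""
    suspects = []
    for path in sorted(Path(directory).glob("*.zip")):
        too_small = path.stat().st_size <= SUSPECT_SIZE_BYTES
        if too_small or not zipfile.is_zipfile(str(path)):
            suspects.append(path)
    return suspects


def delete_failed_downloads(directory):
    """Delete the suspect files and return the paths that were removed."""
    removed = find_failed_downloads(directory)
    for path in removed:
        path.unlink()
    return removed
```

After the suspect files are deleted, rerunning the download step fetches them again, since only files that are not present are downloaded.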

A note on 2015-2016 data: the FFIEC website where these files are held cannot be scraped using the utility in this codebase. We recommend that you manually download the panel, TS, and LAR files for those years from the links in the code. All code calling 2015-2016 data will simply be skipped if the files are not present.

Unzipping Compressed Files

All of the LAR files, and several of the Panel and Transmittal Sheet files, download as zip files. These files must be unzipped before the data can be loaded. To do so, run the following script:

The above script will unzip all the zipped files and standardize the names of the files.
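For readers who prefer to unzip from Python, the extraction step can be sketched with the standard library. This is not the repository's unzip script: it only extracts archives and does not standardize file names, and the function name `unzip_all` and its arguments are hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_all(source_dir, target_dir):
    """Extract every .zip in `source_dir` into `target_dir`.

    Returns the member names that were extracted. Unlike the repository's
    script, this sketch does not rename the extracted files.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(source_dir).glob("*.zip")):
        with zipfile.ZipFile(str(archive)) as zf:
            zf.extractall(str(target))
            extracted.extend(zf.namelist())
    return extracted
```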

Alternatively, the LAR, Panel, and Transmittal Sheet files can be unzipped as groups using the following commands:

Creating Postgres Tables and Loading Data

The default installation of Postgres should create both a Postgres role (superuser account) and a Postgres database. By default, the load scripts use these for login. If the role or the database is not present, a user and/or database will need to be specified when running the load scripts. Examples are provided later in this section. Fair warning: creating the Postgres tables will take a very long time.

Available option flags for the load scripts are as follows:

The script below creates an HMDA database on an existing Postgres installation, creates the hmda_public schema, creates tables, and loads data:

To load subsets of the HMDA data (LAR, Transmittal Sheet, or Panel), use the scripts below. These scripts will create a database named 'hmda' if one does not exist. They will also create an hmda_public schema in which all the data tables will be created and populated.

Using Option Flags

All of the load scripts support the same option flags. The examples below use the create_hmda_db.sh script, but any script can be substituted.

To specify a username:

To specify a password:

To specify a database:

To specify a username and password:

The SQL scripts provided in HMDA_Data_Science_Kit/load_scripts/SQL require an update to the path for the data sources before they can be used. The placeholder is {data_path}. This placeholder is replaced with the full path to the HMDA data when any of the load scripts are run. For example, '{data_path}HMDA_Data_Science_Kit/data/lar/lar_ult_2004.dat' on a Mac will become '/Users//HMDA_Data_Science_Kit/data/lar/lar_ult_2004.dat'.

This change can be undone by running the following:
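The placeholder substitution and its reversal amount to a plain string replacement, sketched below. This is an illustration rather than the load scripts' actual code: the function names, the example path `/home/analyst`, and the table name in the usage note are hypothetical.

```python
PLACEHOLDER = "{data_path}"


def _with_trailing_slash(data_path):
    """Ensure the path ends in a slash so it joins cleanly with the file path."""
    return data_path if data_path.endswith("/") else data_path + "/"


def fill_data_path(sql_text, data_path):
    """Replace the {data_path} placeholder with a concrete directory path."""
    # str.replace is used instead of str.format so that any other braces
    # in the SQL text are left untouched.
    return sql_text.replace(PLACEHOLDER, _with_trailing_slash(data_path))


def restore_placeholder(sql_text, data_path):
    """Undo fill_data_path by putting the placeholder back."""
    return sql_text.replace(_with_trailing_slash(data_path), PLACEHOLDER)
```

For example, applying `fill_data_path` with `/home/analyst` to a statement such as `COPY some_table FROM '{data_path}HMDA_Data_Science_Kit/data/lar/lar_ult_2004.dat';` yields a path starting with `/home/analyst/HMDA_Data_Science_Kit/`, and `restore_placeholder` with the same path returns the original text.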

Quickstart

To download all supported HMDA data, unzip any zipped data, and add the data to a Postgres database, you'll run the following commands in order:

bash download_scripts/download_hmda.sh
bash download_scripts/unzip_all.sh
bash load_scripts/create_and_load_hmda.sh

After the download step, check that all files downloaded successfully. See Download Troubleshooting above for more information.

Contributing

We work in the open to maximize transparency and encourage third-party contributions. If you want to contribute, please read and abide by the terms of the License for this project.

We use GitHub issues in this repository to track features, bugs, and enhancements to the software. Pull requests are welcome.

Open source licensing info

  1. TERMS
  2. LICENSE
  3. CFPB Source Code Policy