DATA 205 Capstone Project
The primary objective of this project is to expand the existing collection of analysis examples within the Data Science Kit.
This project is carried out in collaboration with the Capstone Project of the Montgomery Community College Fall 2024 internship program.
Course Summary: DATA 205
DATA 205 is designed to give students both a grounding in fundamental data science concepts and hands-on experience working as data scientists. Beyond conventional learning models, the course creates a multi-professional setting in which students learn alongside practicing professionals, introducing them to practical approaches to working with big data.
A defining feature of DATA 205 is its project-based learning approach. Students are not just consumers of information but stakeholders in data science decisions. By working on tasks with real-world data, they learn to analyze data using a range of methods, evaluate the ethical use of data, and produce reproducible research results.
Capstone Project Overview
- Work with a mentor and data to create a “data product”
- Special Projects
  - Opportunity to work with an external organization
- Internal Projects
  - Defined in collaboration with the instructor
Dataset requirements
- Must access data from an original data source
  - Not from Kaggle or another aggregator unless the student can trace the data back to its primary source and access it there
- Must choose and use data in a manner consistent with any licensing or usage restrictions
- Must use real-world data (not synthetic data)
- Must work with at least 2 quantitative (numerical) and 2 qualitative (categorical) variables
- Must not be data the student worked with in a previous DATA course
  - There may be exceptions; instructor approval is required
- Additional data requirements will be discussed as students refine their project ideas
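The variable-count requirement can be checked quickly once a dataset is loaded. The sketch below is a minimal, hypothetical helper in Python (standard library only), assuming rows are dictionaries of strings such as those produced by `csv.DictReader`; it labels a column quantitative when every non-missing value parses as a number. The missing-value markers, function names, and sample rows are illustrative assumptions, not part of the course requirements.

```python
# Hypothetical helper: classify columns and check the "2 quantitative + 2 qualitative" rule.
# Assumption: rows are dicts of strings (e.g., from csv.DictReader).

# Illustrative markers treated as missing; adjust to the dataset's documentation.
MISSING = {"", "NA", "Exempt"}


def is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def classify_columns(rows: list[dict[str, str]]) -> dict[str, str]:
    """Label each column 'quantitative' if all non-missing values are numeric, else 'qualitative'.

    A column with no non-missing values is labeled 'qualitative'.
    """
    if not rows:
        return {}
    labels = {}
    for column in rows[0]:
        values = [row.get(column) or "" for row in rows]
        present = [v for v in values if v.strip() not in MISSING]
        numeric = bool(present) and all(is_number(v) for v in present)
        labels[column] = "quantitative" if numeric else "qualitative"
    return labels


def meets_variable_requirement(
    rows: list[dict[str, str]], min_quant: int = 2, min_qual: int = 2
) -> bool:
    """True if the data has at least `min_quant` quantitative and `min_qual` qualitative columns."""
    labels = list(classify_columns(rows).values())
    return labels.count("quantitative") >= min_quant and labels.count("qualitative") >= min_qual


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        {"loan_amount": "305000", "income": "85", "derived_sex": "Female", "derived_race": "White"},
        {"loan_amount": "215000", "income": "NA", "derived_sex": "Male", "derived_race": "Asian"},
    ]
    print(classify_columns(sample))
    print(meets_variable_requirement(sample))  # True
```

Numeric codes such as county FIPS codes parse as numbers but are categorical, so the automatic labels are only a first pass and should be reviewed by hand.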
Project Expectations
- Choice of Topics & Datasets
- Elevator Pitch
- Data Ingestion, Cleaning, & Wrangling
- Exploratory Data Analysis
- Statistical Analysis
- Data Visualizations
- Data Product / Data Stories
- GitHub Presence!
Working with the HMDA Dataset
The Consumer Financial Protection Bureau (CFPB) states that its "core mission is to ensure that when Americans apply for a mortgage, choose a credit card, or use any other financial product, the markets work for them."
For mortgages, this means the CFPB protects borrowers in the same way it protects consumers of other financial products.
The Home Mortgage Disclosure Act (HMDA) of 1975 requires lenders to maintain and publicly report information about mortgage loans. These data are valuable for analyzing lending patterns and are instrumental in detecting possible lending discrimination.
The CFPB's mortgage work includes the following four initiatives:
- Monitoring for fraud: The CFPB monitors mortgage lenders (banks, mortgage companies, and others) to ensure they are not committing fraud, such as providing false information or charging unreasonable fees.
- Improving disclosure: The CFPB helps consumers accurately understand their mortgage contracts by promoting clear, easy-to-understand disclosure of mortgage information.
- Supporting borrowers who have difficulty repaying their loans: The CFPB provides appropriate advice and support to these borrowers.
- Rules for mortgages: The CFPB writes rules that regulate lenders' actions.
Why the CFPB Is Tackling Mortgage Issues
The collapse of the housing bubble and the 2008 financial crisis, marked by the failure of Lehman Brothers, exposed widespread unfair treatment of mortgage borrowers and led many people to lose their homes. In response, the CFPB aims to eliminate unfair and opaque practices in the mortgage market and to create an environment in which consumers can take out mortgages with confidence.
CFPB's Mortgage Initiatives and New Insights through DynamicData Analysis
The CFPB's mortgage initiatives are central to consumer protection. Analyzing them with the DynamicData tool, using data from 2018 to 2022, may yield deeper insights.
Expected Results from DynamicData Analysis
Analyzing data related to the four initiatives above with DynamicData may help answer the following questions.
Research Questions
- Changes in fraud patterns:
  a. Which types of fraud occur most frequently, and how do these trends change from year to year?
  b. Which types of lenders are most frequently involved in fraud?
  c. Is fraud concentrated in particular regions or among borrowers with particular attributes?
- Effects of improved disclosure:
  a. Has consumers' understanding improved as a result of better disclosure?
  b. Has the number of consumer complaints decreased?
  c. Has the number of contracts based on incorrect information decreased?
- Effectiveness of support for borrowers experiencing repayment difficulties:
  a. What share of borrowers experiencing repayment difficulties use support programs?
  b. What could be done to improve the effectiveness of these programs?
  c. Have cases of debt restructuring and bankruptcy decreased?
- Understanding trends through time-series analysis:
  a. How do these patterns relate to trends in the overall mortgage market?
  b. How do economic fluctuations and policy changes affect fraud and consumer harm?
  c. How have consumer awareness and behavior changed?
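Several of these questions ask how a quantity changes from year to year over the 2018 to 2022 window. A minimal sketch in Python, assuming each loan-level record is a dictionary with an `activity_year` field (the HMDA public loan-level files include one); the optional `where` filter, the function names, and the sample records are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

Record = dict[str, str]


def yearly_counts(
    records: Iterable[Record],
    start: int = 2018,
    end: int = 2022,
    where: Callable[[Record], bool] = lambda r: True,
) -> dict[int, int]:
    """Count records per activity_year in [start, end] that satisfy `where`."""
    counts = Counter(int(r["activity_year"]) for r in records if where(r))
    return {year: counts.get(year, 0) for year in range(start, end + 1)}


def year_over_year_change(counts: dict[int, int]) -> dict[int, Optional[float]]:
    """Percent change versus the previous year; None when the previous count is zero."""
    years = sorted(counts)
    changes: dict[int, Optional[float]] = {}
    for prev, curr in zip(years, years[1:]):
        if counts[prev] == 0:
            changes[curr] = None
        else:
            changes[curr] = round(100 * (counts[curr] - counts[prev]) / counts[prev], 1)
    return changes


if __name__ == "__main__":
    # Made-up records for illustration only; action_taken "3" means the application was denied.
    records = (
        [{"activity_year": "2018", "action_taken": "3"}] * 4
        + [{"activity_year": "2019", "action_taken": "3"}] * 5
        + [{"activity_year": "2019", "action_taken": "1"}] * 2
    )
    denials = yearly_counts(records, where=lambda r: r["action_taken"] == "3")
    print(denials)  # {2018: 4, 2019: 5, 2020: 0, 2021: 0, 2022: 0}
    print(year_over_year_change(denials))  # {2019: 25.0, 2020: -100.0, 2021: None, 2022: None}
```

Raw counts also move with total application volume, so comparing years usually works better with rates (for example, denials as a share of applications) than with counts alone.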
What can be expected from the analysis results?
- More effective consumer protection measures: Identifying fraud hotspots and areas where consumers are especially harmed can help target protection efforts.
- Healthy development of the industry as a whole: Curbing fraudulent activity and restoring consumer confidence can contribute to the sound development of the mortgage market.
- Contributions to policy making: The analysis results can inform mortgage-related policy.
Points to note when analyzing
- Data quality: Check the data for accuracy, completeness, and consistency.
- Analysis method: Select methods that match the analysis objectives.
- Identifying causal relationships: Avoid confusing correlation with causation.
- External factors: Consider the impact of external factors such as economic fluctuations and policy changes.
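The data-quality check can start with two simple measurements: how often each column is missing, and whether coded columns contain values outside their documented code set. A minimal sketch in Python, assuming rows read with `csv.DictReader` and treating empty strings, "NA", and "Exempt" as missing (markers that appear in HMDA public files; confirm against the data dictionary for the year in use). The function names and sample rows are hypothetical.

```python
import csv
import io

# Assumed missing-value markers; verify against the dataset's documentation.
MISSING_MARKERS = {"", "NA", "Exempt"}


def missing_share(rows: list[dict[str, str]]) -> dict[str, float]:
    """Fraction of rows in which each column is empty or holds a missing marker."""
    if not rows:
        return {}
    total = len(rows)
    return {
        column: sum(1 for row in rows if (row.get(column) or "").strip() in MISSING_MARKERS) / total
        for column in rows[0]
    }


def unexpected_values(rows: list[dict[str, str]], column: str, allowed: set[str]) -> list[str]:
    """Sorted distinct values in `column` that fall outside the allowed code set."""
    return sorted({row.get(column) or "" for row in rows} - allowed)


if __name__ == "__main__":
    # Made-up CSV text standing in for a downloaded file.
    sample = "action_taken,income\n1,85\n3,NA\n9,Exempt\n1,120\n"
    rows = list(csv.DictReader(io.StringIO(sample)))
    print(missing_share(rows))  # {'action_taken': 0.0, 'income': 0.5}
    # HMDA action_taken codes run from 1 to 8, so "9" is flagged.
    print(unexpected_values(rows, "action_taken", {str(c) for c in range(1, 9)}))  # ['9']
```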
Key Variables for Analysis
To analyze possible discrimination patterns in the DMV area (the District of Columbia, Maryland, and Virginia), I plan to focus on the following HMDA variables:
- Income: Can show whether lending decisions differ by applicant income.
- Sex: Can help reveal sex bias in lending.
- Ethnicity: Can help identify racial or ethnic discrimination in mortgage lending.
- Place (geographic unit): Allows comparison of lending data across areas of the DMV region, likely at the census tract or county level.
- Age: Can help show whether age-biased mortgage lending exists.
- Lender (financial institution): Comparing data by lender reveals differences in behavior across lending institutions.
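One way to compare outcomes across these variables is a denial rate per group. A minimal sketch in Python, assuming HMDA loan-level rows with the public-file fields `state_code`, `action_taken`, and `income` (reported in thousands of dollars), plus a grouping field such as `derived_sex`, `derived_ethnicity`, `applicant_age`, `county_code`, or `lei`. It counts only applications that reached a decision (`action_taken` 1 = originated, 2 = approved but not accepted, 3 = denied); that denominator, the income bands, the function names, and the sample rows are illustrative choices. A raw gap between groups does not by itself show discrimination, because factors such as income and loan amount are not controlled for.

```python
from collections import defaultdict
from typing import Callable, Union

Row = dict[str, str]

DMV_STATES = {"DC", "MD", "VA"}
DECIDED = {"1", "2", "3"}  # originated, approved but not accepted, denied
DENIED = "3"


def income_band(income: str) -> str:
    """Bucket income (thousands of dollars) into coarse, arbitrarily chosen bands."""
    try:
        value = float(income)
    except (TypeError, ValueError):
        return "Missing"
    if value < 50:
        return "<50k"
    if value < 100:
        return "50k-100k"
    return "100k+"


def denial_rates(
    rows: list[Row],
    group_by: Union[str, Callable[[Row], str]],
    states: set[str] = DMV_STATES,
    min_apps: int = 1,
) -> dict[str, float]:
    """Denial rate per group among decided applications in `states`.

    `group_by` is a column name or a function mapping a row to a group label.
    Groups with fewer than `min_apps` decided applications are dropped.
    """
    decided = defaultdict(int)
    denied = defaultdict(int)
    for row in rows:
        # Skip rows outside the chosen states and applications without a decision.
        if row["state_code"] not in states or row["action_taken"] not in DECIDED:
            continue
        key = group_by(row) if callable(group_by) else row[group_by]
        decided[key] += 1
        if row["action_taken"] == DENIED:
            denied[key] += 1
    return {key: round(denied[key] / n, 3) for key, n in decided.items() if n >= min_apps}


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"state_code": "MD", "derived_sex": "Female", "action_taken": "3", "income": "40"},
        {"state_code": "MD", "derived_sex": "Female", "action_taken": "1", "income": "60"},
        {"state_code": "VA", "derived_sex": "Male", "action_taken": "1", "income": "150"},
        {"state_code": "DC", "derived_sex": "Male", "action_taken": "4", "income": "90"},  # withdrawn: skipped
        {"state_code": "PA", "derived_sex": "Male", "action_taken": "3", "income": "NA"},  # outside DMV: skipped
    ]
    print(denial_rates(rows, "derived_sex"))  # {'Female': 0.5, 'Male': 0.0}
    print(denial_rates(rows, lambda r: income_band(r["income"])))  # {'<50k': 1.0, '50k-100k': 0.0, '100k+': 0.0}
```

Passing `group_by="county_code"` or `group_by="lei"` gives the same comparison by place or by lender, and raising `min_apps` keeps very small groups from producing unstable rates.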
Summary
Analyzing the CFPB's mortgage initiatives with DynamicData is an important step toward strengthening consumer protection and supporting the sound development of the mortgage market. The results may help shape more effective consumer protection measures and increase transparency in the mortgage market.