Vasi012 / PP5-Predictive-Analysis

Milestone Project for Predictive Analytics Specialisation at Code Institute: Predicting House Pricing

For Sale

code institute logo

House Pricing (PP5)

This project is one of the 5 milestone projects within the Full Stack Developer course provided by Code Institute. It is the last project of the course and represents the Predictive Analytics path that I have chosen. The initial idea for this project was therefore 'working with data'.

In this project you will be taken step by step through everything that happens, from data cleaning to feature engineering. The content has been written to make you feel welcome and to give you a clear understanding of each individual step: what I did and how I did it.

At any point, if you get confused, please refer back to this README file, where you will find a lot of important information that has been used within the project.

Mockup

The live application can be found here.

Dataset Content

What is Kaggle?

Within this project, I have created a fictional user story. However, the predictive analytics could be applied to a real project in the workplace, or used if you live in Ames.

The dataset contains 1,460 rows (1,461 including the header row with the name of each variable) of housing records from Ames, a city in Iowa, US.

For the house profile provided in this dataset, I have created the table below, which lists the variables provided by the dataset, the meaning of each one, and the units used to measure it.

If at any point in the project you don't understand one of the variables used in the analysis, please refer to the table below.


| Variable | Meaning | Units |
| --- | --- | --- |
| 1stFlrSF | First floor square feet | 334 - 4692 (min - max, sq. ft.) |
| 2ndFlrSF | Second floor square feet | 0 - 2065 (min - max, sq. ft.) |
| BedroomAbvGr | Bedrooms above grade (does NOT include basement bedrooms) | 0 - 8 (min - max, bedrooms) |
| BsmtExposure | Refers to walkout or garden level walls | Gd: Good Exposure; Av: Average Exposure; Mn: Minimum Exposure; No: No Exposure; None: No Basement |
| BsmtFinType1 | Rating of basement finished area | GLQ: Good Living Quarters; ALQ: Average Living Quarters; BLQ: Below Average Living Quarters; Rec: Average Rec Room; LwQ: Low Quality; Unf: Unfinished; None: No Basement |
| BsmtFinSF1 | Type 1 finished square feet | 0 - 5644 (min - max, sq. ft.) |
| BsmtUnfSF | Unfinished square feet of basement area | 0 - 2336 (min - max, sq. ft.) |
| TotalBsmtSF | Total square feet of basement area | 0 - 6110 (min - max, sq. ft.) |
| GarageArea | Size of garage in square feet | 0 - 1418 (min - max, sq. ft.) |
| GarageFinish | Interior finish of the garage | Fin: Finished; RFn: Rough Finished; Unf: Unfinished; None: No Garage |
| GarageYrBlt | Year garage was built | 1900 - 2010 (min - max, year built) |
| GrLivArea | Above grade (ground) living area square feet | 334 - 5642 (min - max, sq. ft.) |
| KitchenQual | Kitchen quality | Ex: Excellent; Gd: Good; TA: Typical/Average; Fa: Fair; Po: Poor |
| LotArea | Lot size in square feet | 1300 - 215245 (min - max, sq. ft.) |
| LotFrontage | Linear feet of street connected to property | 21 - 313 (min - max, lin. ft.) |
| MasVnrArea | Masonry veneer area in square feet | 0 - 1600 (min - max, sq. ft.) |
| EnclosedPorch | Enclosed porch area in square feet | 0 - 286 (min - max, sq. ft.) |
| OpenPorchSF | Open porch area in square feet | 0 - 547 (min - max, sq. ft.) |
| OverallCond | Rates the overall condition of the house | 10: Very Excellent; 9: Excellent; 8: Very Good; 7: Good; 6: Above Average; 5: Average; 4: Below Average; 3: Fair; 2: Poor; 1: Very Poor |
| OverallQual | Rates the overall material and finish of the house | 10: Very Excellent; 9: Excellent; 8: Very Good; 7: Good; 6: Above Average; 5: Average; 4: Below Average; 3: Fair; 2: Poor; 1: Very Poor |
| WoodDeckSF | Wood deck area in square feet | 0 - 736 (min - max, sq. ft.) |
| YearBuilt | Original construction date | 1872 - 2010 (min - max, year built) |
| YearRemodAdd | Remodel date (same as construction date if no remodeling or additions) | 1950 - 2010 (min - max, remodel year) |
| SalePrice | Sale price | 34,900 - 755,000 (min - max, sale price in $) |
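
As a rough illustration of how the table can be used during data cleaning, the sketch below checks one house record against a few of the documented ranges and category codes. The dictionaries cover only a subset of the variables, and the names `NUMERIC_RANGES`, `CATEGORIES` and `find_problems` are hypothetical helpers introduced for this example, not part of the project code.

```python
# Subset of the variable table: documented min/max ranges and category codes.
NUMERIC_RANGES = {
    "1stFlrSF": (334, 4692),
    "GrLivArea": (334, 5642),
    "GarageArea": (0, 1418),
    "YearBuilt": (1872, 2010),
    "OverallQual": (1, 10),
}
CATEGORIES = {
    "KitchenQual": {"Ex", "Gd", "TA", "Fa", "Po"},
    "GarageFinish": {"Fin", "RFn", "Unf", "None"},
}


def find_problems(record):
    """Return a message for every value that is missing or outside the table."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} outside {low}-{high}")
    for name, allowed in CATEGORIES.items():
        value = record.get(name)
        if value not in allowed:
            problems.append(f"{name}: unexpected value {value!r}")
    return problems


# Illustrative record typed by hand for this example (not a dataset row).
house = {
    "1stFlrSF": 900, "GrLivArea": 1500, "GarageArea": 400,
    "YearBuilt": 1995, "OverallQual": 6,
    "KitchenQual": "TA", "GarageFinish": "Unf",
}
print(find_problems(house))
```

A record that passes produces an empty list; anything else is worth inspecting before it reaches the model.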

Agile methodology - Development

table


CRISP-DM: what is it and how is it used?

Crisp-dm

  1. Business Understanding - What does the business need?
  2. Data Understanding - What data do we have / need? Is it clean?
    • Don't forget, garbage in, garbage out, so make sure your data is cleaned.
  3. Data Preparation - How do we organize the data for modeling?
  4. Modeling - What modeling techniques should we apply?
  5. Evaluation - Which model best meets the business's objectives?
  6. Deployment - How do stakeholders access the results?

If you would like to research in more depth what each of the above sequential phases means and how to use it, please refer to CRISP-DM.


Business Requirements.

I am studying the Full Stack Developer course with Code Institute. As part of my last project I have just learnt how to use machine learning to predict future trends, which I will be using in my career as a Data Scientist.

My niece, who lives in America, started a small real estate business, and part of her vision was to buy 6 houses in Ames, a small town in Iowa. Ames is known for its robust, stable economy and flourishing cultural environment, and has a population of over 89,540 people, so my niece believes that this will be a very good investment. Her plan is to buy some old houses, refurbish them, and then sell them at a higher price.

Overall, my niece has a good understanding of the average prices for houses in this region. However, because this investment will be very important for her business, she reached out to me, her uncle, so that I can use the power of machine learning to predict the prices of these homes without risking an inaccurate appraisal.

My niece has conducted research and found a public dataset of the houses that have been sold over the years. She will share this data with me so that I can create an accurate prediction for each of the houses that she plans to sell after refurbishment.


Hypothesis and how to validate?

  1. Hypothesis One.
  2. Hypothesis Two.
  3. Hypothesis Three.
  4. Hypothesis Four.

Rationale to map the business requirements to the Data Visualizations and ML tasks

List the business requirements and rationale to map them to the data visualizations and ML tasks.


Business requirement 1 – Data Visualization and Correlation study.

Outcome after predicting pricing.

FeaturesImactingPrice

GrLivArea

GarageArea

YearBuilt

1stFlrSF

OverallQual

TotalBsmtSF
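
To make the idea of a correlation study concrete, here is a minimal sketch that computes the Pearson correlation between each feature and SalePrice and ranks the features by strength. The values are invented toy numbers, not rows from the Ames dataset, and the `pearson` helper is written out by hand for illustration; the project's own notebooks may use a different method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy values for illustration only.
toy = {
    "GrLivArea": [1000, 1500, 2000, 2500, 3000],
    "YearBuilt": [1950, 2005, 1970, 1990, 2000],
    "SalePrice": [120000, 180000, 210000, 290000, 330000],
}

target = toy["SalePrice"]
scores = {
    name: pearson(values, target)
    for name, values in toy.items()
    if name != "SalePrice"
}

# Strongest relationship (by absolute value) first.
for name, r in sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {r:.2f}")
```

A coefficient close to 1 or -1 indicates a strong linear relationship with the sale price, while a value near 0 indicates a weak one.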


Business requirement 2

Refer to the Scikit-learn lesson, Unit notebook 6: Cross-Validation Search Part 2.

At the end of the notebook is a list of hyperparameter options and values to start with for the family of algorithms covered in the course.
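
For readers who have not seen a cross-validation search before, the sketch below shows the idea without any dependencies: each candidate hyperparameter value is scored on held-out folds, and the value with the lowest average error wins. It assumes a one-feature ridge regression purely for illustration (the README does not state which algorithm the project uses), the data is invented, and every function name here is hypothetical. In a scikit-learn workflow, `GridSearchCV` plays this role.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Fit y = slope * x + intercept with an L2 penalty `alpha` on the slope."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_val_mse(xs, ys, alpha, folds=3):
    """Average validation MSE over `folds` interleaved folds."""
    scores = []
    for fold in range(folds):
        train = [i for i in range(len(xs)) if i % folds != fold]
        valid = [i for i in range(len(xs)) if i % folds == fold]
        model = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
        scores.append(mse(model, [xs[i] for i in valid], [ys[i] for i in valid]))
    return sum(scores) / folds


def grid_search(xs, ys, alphas, folds=3):
    """Score every candidate alpha and return the best one plus all scores."""
    results = {alpha: cross_val_mse(xs, ys, alpha, folds) for alpha in alphas}
    best = min(results, key=results.get)
    return best, results


# Invented data: living area in thousands of sq. ft., price in thousands of $.
area = [0.9, 1.1, 1.3, 1.5, 1.7, 1.9, 2.1, 2.3, 2.5]
price = [110, 135, 150, 178, 190, 215, 240, 250, 281]

best_alpha, results = grid_search(area, price, alphas=[0.0, 0.1, 1.0, 10.0])
print("best alpha:", best_alpha)
print("validation MSE per alpha:", results)
```

The same pattern scales to several hyperparameters by scoring every combination in the grid, which is why it helps to start from a short list of values.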

Outcome after price prediction.

6predictions

ownimputs


ML Business Case

Business Case Assessment

  1. What are the business requirements?

Business requirement 1:

  2. Can the above business requirements be answered with conventional data analysis?
  3. Does my niece require a dashboard or an API endpoint?
  4. What would make my niece consider this project a successful outcome?
  5. As a data analyst I would like to be able to break down the project into epics and user stories.
  6. Ethical or Privacy concerns?
  7. Does the data suggest a particular model?
  8. What are the model inputs and intended outputs?
  9. What are the criteria for the performance goal of the predictions?
  10. How would my niece benefit from this?

Outcome after price prediction:

r2score

testandtrain
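
As background for the R2 score reported on the train and test sets, the sketch below computes the coefficient of determination, R2 = 1 - SS_res / SS_tot, by hand. The prices are invented and the helper name `r2` is hypothetical; in scikit-learn the equivalent function is `sklearn.metrics.r2_score`.

```python
def r2(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Invented sale prices and predictions, for illustration only.
train_actual = [120000, 180000, 210000, 290000]
train_pred = [125000, 175000, 215000, 285000]
test_actual = [150000, 240000, 330000]
test_pred = [160000, 230000, 310000]

train_r2 = r2(train_actual, train_pred)
test_r2 = r2(test_actual, test_pred)
print(f"train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}")
```

A score of 1 means perfect predictions and 0 means the model does no better than always predicting the mean price. A test score far below the train score is usually a sign of overfitting.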


Dashboard Design

Dashboard Expectations

The dashboard should contain:

Page 1: Quick project summary

Quick project summary:

Page 2: House Sale Price Study

Page 3: Price Predictor

Page 4: Project Hypothesis and Validation

  1. The sale prices of other houses in this area are evaluated on attributes similar to those of the 6 houses that my niece would like to sell. Therefore, this project should provide an accurate prediction of the sale price for each house.

  2. The correlation analysis shows that the sizes of the above-grade living area, the first floor, the basement, and the garage play a key role in determining the house price. In addition, the year the house was built, the year of its last refurbishment, and the quality of the materials used also play a significant role in determining a house price.

Page 5: ML: House Sale Price Prediction


Unfixed Bugs

Test conducted on the Python Code:

How to use flake8?

test

Deployment

The master branch of this repository has been used for the deployed version of this application.

Using Github & Gitpod

To deploy my data application, I used the Code Institute Full Template.

Forking the GitHub Repository

By forking the GitHub repository you make a copy of the original repository on your own GitHub account, allowing you to view and/or make changes without affecting the original. To fork the repository, use the following steps:

  1. Log in to GitHub and locate the GitHub Repository
  2. At the top of the Repository (not top of page) just above the "Settings" button on the menu, locate the "Fork" button.
  3. You should now have a copy of the original repository in your GitHub account.

Making a Local Clone

  1. Log in to GitHub and locate the GitHub Repository
  2. Under the repository name, click "Clone or download".
  3. To clone the repository using HTTPS, under "Clone with HTTPS", copy the link.
  4. Open command line interface on your computer
  5. Change the current working directory to the location where you want the cloned directory to be made.
  6. Type git clone, and then paste the URL you copied in Step 3.

$ git clone ADD Project link

  7. Press Enter. Your local clone will be created.

Deployment To Heroku


Conclusion

Overall, the project is a success: all the requirements have been met and my niece is happy with the predicted prices. However, after some checking we realised that, because of inflation, we have to add 10.5% to the house prices. This figure might vary depending on whether inflation rises or falls.
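
The inflation adjustment amounts to multiplying each predicted price by 1.105. A minimal sketch, assuming the 10.5% figure above and using invented prices (the constant and function names are hypothetical):

```python
INFLATION_UPLIFT = 0.105  # the 10.5% mentioned above; update as inflation changes


def adjust_for_inflation(price, uplift=INFLATION_UPLIFT):
    """Increase a predicted price by the given uplift, rounded to cents."""
    return round(price * (1 + uplift), 2)


# Invented predicted prices, for illustration only.
predicted_prices = [150000, 200000]
adjusted_prices = [adjust_for_inflation(p) for p in predicted_prices]
print(adjusted_prices)
```

Keeping the uplift as a parameter makes it easy to rerun the adjustment if the inflation figure changes.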


Credits & Content used.

Acknowledgements

Thank you, Code Institute, for this awesome course. I have really enjoyed this experience and the way my mindset changed from believing that coding is just a gibberish language to actually understanding it and working with it.