Vasi012 / PP5-Predictive-Analysis

Milestone Project for Predictive Analytics Specialisation at Code Institute: Predicting House Pricing
Feature Engineering #21

Closed Vasi012 closed 1 year ago

Vasi012 commented 1 year ago

Conclusion

Feature Engineering Transformers:

Ordinal categorical encoding is applied to BsmtExposure, BsmtFinType1, GarageFinish and KitchenQual. The numerical transformations considered are log e, log 10, Box-Cox and Yeo-Johnson. These transformations are considered for 1stFlrSF, LotArea and SalePrice. As previously mentioned, SalePrice has been dropped because it is the target we want to predict.
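The ordinal encoding of the four categorical variables named above can be sketched in plain Python. This is only an illustration, not the encoder used in the project: the category levels and their worst-to-best ordering are assumptions based on the usual Ames housing data dictionary, and `ORDINAL_LEVELS`, `encode_ordinal` and the example row are hypothetical names and values.

```python
# Assumed category levels, ordered from worst to best.
# These orderings are NOT stated in this issue; adjust them to the real data.
ORDINAL_LEVELS = {
    "BsmtExposure": ["None", "No", "Mn", "Av", "Gd"],
    "BsmtFinType1": ["None", "Unf", "LwQ", "Rec", "BLQ", "ALQ", "GLQ"],
    "GarageFinish": ["None", "Unf", "RFn", "Fin"],
    "KitchenQual": ["Po", "Fa", "TA", "Gd", "Ex"],
}


def encode_ordinal(row, levels=ORDINAL_LEVELS):
    """Return a copy of `row` with each ordinal variable replaced by its rank."""
    encoded = dict(row)
    for variable, ordered_levels in levels.items():
        if variable in encoded:
            # list.index raises ValueError on an unknown category,
            # which makes unexpected values visible early.
            encoded[variable] = ordered_levels.index(encoded[variable])
    return encoded


# Made-up example row, for illustration only.
example_row = {
    "BsmtExposure": "Av",
    "BsmtFinType1": "GLQ",
    "GarageFinish": "RFn",
    "KitchenQual": "Gd",
    "GrLivArea": 1710,
}
encoded_row = encode_ordinal(example_row)
```

Numeric columns such as GrLivArea pass through unchanged; only the variables listed in the mapping are converted to integers.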

The power transformation will be considered for the numerical variables GarageArea and MasVnrArea.

For the numerical variable GrLivArea we will consider log e, log 10, power, Box-Cox and Yeo-Johnson.

For the numerical variable OpenPorchSF we will consider Yeo-Johnson.
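To make the candidate transformations above concrete, here is a minimal sketch of each one using only the standard library. It is not the project's code: the function names, the square-root default for the power transformation, the fixed `lam` values and the `lot_area` sample are all assumptions for illustration. In practice a library would normally estimate the Box-Cox and Yeo-Johnson lambda from the data rather than take it as an argument.

```python
import math


def log_e(x):
    return math.log(x)  # natural log, needs x > 0


def log_10(x):
    return math.log10(x)  # base-10 log, needs x > 0


def power(x, exp=0.5):
    return x ** exp  # square root by default (assumed exponent)


def box_cox(x, lam):
    """Box-Cox transform for strictly positive x and a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def yeo_johnson(x, lam):
    """Yeo-Johnson transform; unlike Box-Cox it accepts zero and negatives."""
    if x >= 0:
        return math.log1p(x) if lam == 0 else ((x + 1) ** lam - 1) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -(((-x + 1) ** (2 - lam)) - 1) / (2 - lam)


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up, right-skewed sample standing in for a variable such as LotArea.
lot_area = [5000, 7200, 8450, 9600, 11250, 14260, 21000, 53000]
raw_skew = skewness(lot_area)
log_skew = skewness([log_e(v) for v in lot_area])
```

Comparing `raw_skew` with `log_skew` is one simple way to judge whether a transformation brings a variable closer to a symmetric distribution. Yeo-Johnson is a natural candidate for a variable like OpenPorchSF if it contains zeros, since the log and Box-Cox transformations are undefined there.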

The following transformers will be used:

("NumericLogTransform", vt.LogTransformer(variables=['1stFlrSF', 'GrLivArea', 'LotArea'])),
("NumericPowerTransform", vt.PowerTransformer(variables=['GarageArea', 'MasVnrArea'])),
("NumericYeoJohnsonTransform", vt.YeoJohnsonTransformer(variables=['OpenPorchSF'])),

As seen in the sale_price_study, the variables most strongly correlated with SalePrice are:

1stFlrSF, GarageArea, GrLivArea, OverallQual, TotalBsmtSF, YearBuilt.

Smart Correlation

We have dropped the following features:

2ndFlrSF, GarageYrBlt, OverallQual, TotalBsmtSF.

After trying combinations of correlation methods and selection methods, we came to the following conclusions.

Spearman:

Cardinality: 1stFlrSF, GrLivArea, GarageArea and YearBuilt. We drop: 2ndFlrSF, GarageYrBlt, OverallQual and TotalBsmtSF.

Variance: TotalBsmtSF, 2ndFlrSF, GarageYrBlt and YearBuilt. We drop: 1stFlrSF, GarageArea, GrLivArea and OverallQual.

Pearson:

Cardinality: 1stFlrSF, GrLivArea and GarageArea. We drop: 2ndFlrSF, GarageYrBlt and TotalBsmtSF.

Variance: TotalBsmtSF, 2ndFlrSF and GarageYrBlt. We drop: 1stFlrSF, GarageArea and GrLivArea.
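The comparison above (Spearman or Pearson correlation, combined with a cardinality or variance selection rule) can be illustrated with a simplified, standard-library sketch. It is not the selector used in the project: `smart_select`, the 0.6 threshold, the greedy grouping and the toy data are all assumptions, and a library implementation may group features differently.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def cardinality(values):
    return len(set(values))


def smart_select(data, method="spearman", threshold=0.6, selection="variance"):
    """Greedily group correlated features; keep the best-scoring one per group."""
    corr = spearman if method == "spearman" else pearson
    score = variance if selection == "variance" else cardinality
    remaining = list(data)
    keep, drop = [], []
    while remaining:
        first = remaining.pop(0)
        group = [first] + [
            f for f in remaining if abs(corr(data[first], data[f])) > threshold
        ]
        remaining = [f for f in remaining if f not in group]
        best = max(group, key=lambda f: score(data[f]))
        keep.append(best)
        drop.extend(f for f in group if f != best)
    return keep, drop


# Made-up values, for illustration only.
toy = {
    "GrLivArea": [900, 1200, 1500, 1800, 2400, 3000],
    "1stFlrSF": [850, 1000, 1100, 1300, 1500, 1900],
    "YearBuilt": [1995, 1950, 2005, 1960, 1930, 1985],
}
kept, dropped = smart_select(toy, method="spearman", selection="variance")
```

Because the selection rule decides which member of each correlated group survives, switching between variance and cardinality can keep a different set of features, which is consistent with the differing keep and drop lists reported above.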