dataset / miscelanious


Building Regression Model #8

Open aditbala99 opened 1 year ago

aditbala99 commented 1 year ago

Building Regression model

library(caTools)
set.seed(123)
split = sample.split(datan$median_house_value, SplitRatio = 0.9)
training_set = subset(datan, split == TRUE)
test_set = subset(datan, split == FALSE)
m <- lm(formula = median_house_value ~ ., data = training_set)
summary(m)

Call:
lm(formula = median_house_value ~ ., data = training_set)

Residuals:
    Min      1Q  Median      3Q     Max
-365334  -44799   -8876   33157  516625

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                -17713.419   3007.209  -5.890 3.92e-09 ***
housing_median_age           1403.913     46.780  30.011  < 2e-16 ***
total_rooms                   -16.570      1.225 -13.529  < 2e-16 ***
total_bedrooms                125.034      8.350  14.973  < 2e-16 ***
population                    -64.708      1.588 -40.739  < 2e-16 ***
households                    157.760      9.630  16.382  < 2e-16 ***
median_income               51145.454    470.785 108.639  < 2e-16 ***
ocean_proximity.INLAND     -63853.921   1348.467 -47.353  < 2e-16 ***
ocean_proximity.ISLAND     171468.785  31044.200   5.523 3.37e-08 ***
ocean_proximity.NEAR BAY    -1717.301   1767.588  -0.972    0.331
ocean_proximity.NEAR OCEAN  11860.808   1620.804   7.318 2.62e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 69340 on 18673 degrees of freedom
Multiple R-squared:  0.6453, Adjusted R-squared:  0.6451
F-statistic:  3397 on 10 and 18673 DF,  p-value: < 2.2e-16

Predicting the Test set results

y_pred = predict(m, newdata = test_set)
MSE <- mean((y_pred - test_set$median_house_value)^2)
MSE
[1] 4247739708
totalss <- sum((test_set$median_house_value - mean(test_set$median_house_value))^2)
totalss
[1] 2.08637e+13
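The MSE above is in squared units of `median_house_value`, which makes it hard to compare with the residual standard error from `summary(m)`. A minimal sketch, using only the MSE value printed above (the names `mse_test` and `rmse_test` are illustrative and not from the original post), converts it to a root mean squared error:

```r
# Test-set MSE, copied from the output printed above
mse_test <- 4247739708

# RMSE is the square root of the MSE, so it is in the same units
# as the response (median_house_value)
rmse_test <- sqrt(mse_test)
rmse_test
```

This comes to roughly 65,175, the same order of magnitude as the residual standard error of 69340 reported on the training set.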

Regression and residual sums of squares.

regss <- sum((y_pred - mean(test_set$median_house_value))^2)
regss
[1] 1.476199e+13
resiss <- sum((test_set$median_house_value - y_pred)^2)
resiss
[1] 8.308579e+12
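For reference, the three sums of squares are linked by total = regression + residual when a least-squares model with an intercept is evaluated on the data it was fitted to. The sketch below checks this on R's built-in `mtcars` data, chosen purely for illustration; the model `mpg ~ wt + hp` and all variable names are assumptions and have nothing to do with the housing data in this issue.

```r
# Illustration on the built-in mtcars data, not the housing data from this issue
fit <- lm(mpg ~ wt + hp, data = mtcars)

y     <- mtcars$mpg   # observed response
y_hat <- fitted(fit)  # in-sample predictions

tss <- sum((y - mean(y))^2)      # total sum of squares
ess <- sum((y_hat - mean(y))^2)  # regression (explained) sum of squares
rss <- sum((y - y_hat)^2)        # residual sum of squares

# In-sample, with an intercept, the decomposition holds up to rounding
all.equal(tss, ess + rss)
```

On predictions for a separate test set the same identity is not guaranteed, so it is worth checking before combining the quantities.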

Calculate R squared.

R2 <- regss/totalss
R2
[1] 0.7075441
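One caveat on the ratio above: with the values printed in this issue, regss + resiss is about 2.307e+13 while totalss is 2.08637e+13, so the two components do not add up to the total on the test set and regss/totalss may overstate the fit. A sketch of the more common out-of-sample definition, 1 - resiss/totalss, using only the printed numbers (the name `R2_test` is an assumption for illustration):

```r
# Sums of squares copied from the output printed above
totalss <- 2.08637e+13   # total sum of squares on the test set
regss   <- 1.476199e+13  # regression sum of squares on the test set
resiss  <- 8.308579e+12  # residual sum of squares on the test set

# Gap between the two components and the total; zero would mean
# the decomposition holds on this data
regss + resiss - totalss

# R squared based on the residual sum of squares
R2_test <- 1 - resiss / totalss
R2_test
```

With these rounded inputs the result is about 0.60, lower than the 0.7075 obtained from regss/totalss and closer to the training-set R-squared of 0.6453.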