Question 9 - Githubissues

drrben commented 3 years ago

[x] Look at the scatterplot of the data (GGally:ggpairs).
[x] Look for correlations between predictors.
[x] Select some additional variables to add to the simple linear model of Part II in order to better predict number of rings. Justify your choices (keep in mind that we want a practical method to predict number of rings).
[x] Perform a multiple linear regression.
[x] Check the validity of the model. If validity conditions are not met, transform some variables, add/delete some variables and recheck until you find an acceptable model.
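
For anyone following along, the checklist above can be sketched roughly as below. This is only a minimal sketch: the data frame is simulated (the numbers are made up, only the column names come from this thread), and base R's `pairs()` stands in for `GGally::ggpairs` so that the snippet runs without extra packages.

```r
# Simulated stand-in for the abalone data. Column names follow the thread;
# the values are invented for illustration and are NOT the project data.
set.seed(1)
n <- 300
size <- rnorm(n)  # hypothetical latent "overall size" driving all measurements
abalone <- data.frame(
  Length   = size + rnorm(n, sd = 0.15),
  Diameter = size + rnorm(n, sd = 0.15),
  Height   = size + rnorm(n, sd = 0.40),
  Shell_wt = size + rnorm(n, sd = 0.30)
)
abalone$Rings <- 10 + 1.5 * abalone$Shell_wt + 0.8 * abalone$Height +
  rnorm(n, sd = 1)

# 1. Scatterplot matrix. pairs() keeps the sketch dependency-free;
#    GGally::ggpairs(abalone) is the call named in the task list.
pairs(abalone)

# 2. Correlations between predictors.
predictors <- setdiff(names(abalone), "Rings")
cor_mat <- cor(abalone[, predictors])
print(round(cor_mat, 2))

# 3. Multiple linear regression.
fit <- lm(Rings ~ Length + Diameter + Height + Shell_wt, data = abalone)
print(summary(fit))

# 4. Validity checks: residuals vs fitted, normal Q-Q, scale-location,
#    residuals vs leverage.
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))
```

If the diagnostic plots show curvature or a funnel shape, the last item of the checklist applies: transform a variable (for example the response), refit, and look at the same four plots again.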

ajaypsrivatsa commented 3 years ago

I'm trying to explain the process. How do you get rid of Length or Visc_wt?

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.135e-15  1.205e-02   0.000   1.0000    
Length       1.638e-01  7.624e-02   2.148   0.0318 *  
Diameter     4.019e-01  7.790e-02   5.160 2.64e-07 ***
Height       3.573e-01  3.094e-02  11.550  < 2e-16 ***
Shuck_wt    -7.058e-01  3.613e-02 -19.538  < 2e-16 ***
Visc_wt     -7.955e-02  3.965e-02  -2.006   0.0449 *  
Shell_wt     5.293e-01  3.476e-02  15.229  < 2e-16 ***

Diameter and Length are basically the same thing (they are highly correlated), so their individual p-values are not reliable in this case: collinearity inflates the standard errors. Also, I am a bit concerned that even though the adjusted R^2 is increasing, some of the higher-order terms are not significant. Maybe we should evaluate all the models on a held-out test set? We may be overfitting.
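
One way to put a number on the collinearity concern is the variance inflation factor, VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the other predictors. The sketch below computes it by hand in base R on simulated data (again invented numbers, only the column names are from the thread). `vif_by_hand` is a hypothetical helper written for this example; with the car package installed, `car::vif(fit)` should give the same quantity for a model like this.

```r
# Simulated data in which Length and Diameter are nearly collinear.
# Values are made up for illustration only.
set.seed(2)
n <- 500
size <- rnorm(n)
d <- data.frame(
  Length   = size + rnorm(n, sd = 0.1),
  Diameter = size + rnorm(n, sd = 0.1),
  Height   = size + rnorm(n, sd = 0.5)
)
d$Rings <- 10 + d$Diameter + 0.5 * d$Height + rnorm(n)

# Hypothetical helper: regress each predictor on the others and
# return 1 / (1 - R^2) for each one.
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

preds <- c("Length", "Diameter", "Height")
vifs <- vif_by_hand(d, preds)
print(round(vifs, 1))

# Standard error of the Diameter coefficient with and without Length.
full    <- lm(Rings ~ Length + Diameter + Height, data = d)
reduced <- lm(Rings ~ Diameter + Height, data = d)
se_full    <- coef(summary(full))["Diameter", "Std. Error"]
se_reduced <- coef(summary(reduced))["Diameter", "Std. Error"]
print(c(se_full = se_full, se_reduced = se_reduced))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign that a coefficient's standard error is being inflated by the other predictors. That does not say which of the two variables to drop, only that their separate t-tests should be read with care.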

lorenzopepe999 commented 3 years ago

I think they are not exactly the same thing. We don't really know how they differ in qualitative terms; we just know that they are correlated. Even if the correlation decreases the significance of the regressors when you put them together, they are still significant at the 0.05 level and the R^2 increases, so I think we should keep them. I don't think we are overfitting, and anyway I don't think we have to assess it now, since Part 3 is about model selection. For now I think we just have to think about the postulates, the adjusted R^2, and the representativeness of the model.
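
Both positions in this exchange can be checked on the same pair of models: an in-sample comparison (adjusted R^2 and a partial F-test for Length) and an out-of-sample one (RMSE on held-out rows). The sketch below uses simulated data with invented numbers, a 75/25 split chosen arbitrarily, and a hypothetical `rmse` helper. Note that the models are fitted on the training rows only; the test rows are used just for scoring.

```r
# Simulated data; values are made up for illustration only.
set.seed(3)
n <- 400
size <- rnorm(n)
d <- data.frame(
  Length   = size + rnorm(n, sd = 0.1),
  Diameter = size + rnorm(n, sd = 0.1),
  Shell_wt = size + rnorm(n, sd = 0.3)
)
d$Rings <- 10 + 0.5 * d$Length + 0.5 * d$Diameter + d$Shell_wt + rnorm(n)

# Hold out 25% of the rows; both models are fitted on the remaining 75%.
test_idx <- sample(n, size = n / 4)
train <- d[-test_idx, ]
test  <- d[test_idx, ]

full    <- lm(Rings ~ Length + Diameter + Shell_wt, data = train)
reduced <- lm(Rings ~ Diameter + Shell_wt, data = train)

# In-sample: adjusted R^2 and a partial F-test for adding Length.
adj_r2 <- c(full    = summary(full)$adj.r.squared,
            reduced = summary(reduced)$adj.r.squared)
print(round(adj_r2, 4))
print(anova(reduced, full))

# Out-of-sample: root mean squared prediction error on the held-out rows.
rmse <- function(model, newdata) {
  sqrt(mean((newdata$Rings - predict(model, newdata = newdata))^2))
}
test_rmse <- c(full = rmse(full, test), reduced = rmse(reduced, test))
print(round(test_rmse, 3))
```

For a single added term the partial F-test is equivalent to the t-test in the coefficient table, so it mainly becomes useful when several terms are added or dropped at once. If the full model wins on adjusted R^2 but not on test RMSE, that would support the overfitting worry; if it wins on both, keeping the extra variable looks more defensible.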

dorukyasa commented 3 years ago

I agree

drrben / project_regression

Question 9 #1