drrben / project_regression

Question 9 #1

Open drrben opened 3 years ago

drrben commented 3 years ago
ajaypsrivatsa commented 3 years ago

I'm trying to explain the process: how do you decide to get rid of Length or Visc_wt?

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.135e-15  1.205e-02   0.000   1.0000    
Length       1.638e-01  7.624e-02   2.148   0.0318 *  
Diameter     4.019e-01  7.790e-02   5.160 2.64e-07 ***
Height       3.573e-01  3.094e-02  11.550  < 2e-16 ***
Shuck_wt    -7.058e-01  3.613e-02 -19.538  < 2e-16 ***
Visc_wt     -7.955e-02  3.965e-02  -2.006   0.0449 *  
Shell_wt     5.293e-01  3.476e-02  15.229  < 2e-16 ***
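A common way to decide whether to drop a single regressor such as Length or Visc_wt is a partial F-test. You compare the full model with the same model refitted without that regressor. In R this is roughly what `anova(reduced_fit, full_fit)` does for two nested `lm` fits. The sketch below runs the comparison in plain Python on hypothetical simulated data. The names `length`, `diameter` and `visc` and their coefficients are made up and are not the project's dataset. It also checks a useful fact: when only one coefficient is dropped, the F statistic equals the square of that coefficient's t value. The F-test therefore gives the same p-value as the t-test already shown in the table.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy; `a` is untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(columns, y):
    """OLS of y on an intercept plus `columns`; returns (beta, rss, X'X)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(X[i][a] * beta[a] for a in range(k))) ** 2 for i in range(n))
    return beta, rss, xtx


def partial_f(columns, j, y):
    """Partial F statistic for H0: the coefficient of columns[j] is zero (q = 1)."""
    n = len(y)
    _, rss_full, _ = fit(columns, y)
    _, rss_reduced, _ = fit([c for i, c in enumerate(columns) if i != j], y)
    df_full = n - (len(columns) + 1)  # residual df of the full model (+1 for the intercept)
    return (rss_reduced - rss_full) / (rss_full / df_full)


def t_value(columns, j, y):
    """t value of columns[j] in the full model: beta_j / SE(beta_j)."""
    n = len(y)
    beta, rss, xtx = fit(columns, y)
    k = len(columns) + 1
    sigma2 = rss / (n - k)
    e = [0.0] * k
    e[j + 1] = 1.0  # index j + 1 skips the intercept
    inv_col = solve(xtx, e)  # column j + 1 of (X'X)^-1
    return beta[j + 1] / (sigma2 * inv_col[j + 1]) ** 0.5


# Hypothetical simulated data standing in for Length, Diameter and Visc_wt.
random.seed(1)
n = 200
length = [random.gauss(0, 1) for _ in range(n)]
diameter = [0.95 * v + random.gauss(0, 0.3) for v in length]  # strongly tied to length
visc = [random.gauss(0, 1) for _ in range(n)]
y = [0.2 * a + 0.4 * b - 0.1 * c + random.gauss(0, 1)
     for a, b, c in zip(length, diameter, visc)]
cols = [length, diameter, visc]

for name, j in [("length", 0), ("visc", 2)]:
    f_stat, t = partial_f(cols, j, y), t_value(cols, j, y)
    print(f"drop {name}: F = {f_stat:.3f}, t^2 = {t * t:.3f}")
```

Whether to drop a regressor at, say, p = 0.0318 or 0.0449 is still a judgment call about the significance threshold. The test only makes the comparison explicit.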

Diameter and Length are basically the same thing (they are strongly correlated), so their individual p-values are hard to interpret in this case. Also, I am a bit concerned that even though the adjusted R^2 keeps increasing, some of the higher-order terms are not significant. Maybe we should evaluate all the models on the test set? We may be overfitting.
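One way to put a number on "basically the same thing" is the variance inflation factor, VIF_j = 1 / (1 - R_j^2). Here R_j^2 is the R^2 from regressing predictor j on all the other predictors. A large VIF means that predictor's standard error, and so its p-value, is inflated by collinearity. Rules of thumb often flag values above 5 or 10. In R, `car::vif()` reports this for an `lm` fit. The sketch below computes it in plain Python on hypothetical simulated data. The columns `length`, `diameter` and `visc` are made up, not the project's data.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(X[i][a] * beta[a] for a in range(k))) ** 2 for i in range(n))
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - rss / tss


def vif(columns):
    """Variance inflation factor of each column: 1 / (1 - R_j^2)."""
    out = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        out.append(1.0 / (1.0 - r_squared(others, target)))
    return out


# Hypothetical simulated predictors: diameter is built to track length closely.
random.seed(2)
n = 200
length = [random.gauss(0, 1) for _ in range(n)]
diameter = [0.95 * v + random.gauss(0, 0.3) for v in length]
visc = [random.gauss(0, 1) for _ in range(n)]

for name, v in zip(["length", "diameter", "visc"], vif([length, diameter, visc])):
    print(f"{name}: VIF = {v:.2f}")
```

With this setup, length and diameter should get large VIFs while the independent `visc` stays close to 1.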

lorenzopepe999 commented 3 years ago

I think they are not exactly the same thing. We don't really know how they differ in qualitative terms; we only know that they are correlated. However, even though the correlation lowers the significance of each regressor when both are included, both are still significant at the 0.05 level and R^2 increases, so I think we should keep both. I don't think we are overfitting, and in any case I don't think we have to assess that now, since part 3 is about model selection. For now I think we just need to look at the postulates, the adjusted R^2, and how representative the model is.
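The two checks mentioned in this discussion can be sketched together. The first is adjusted R^2, which penalizes extra predictors: adj R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with p predictors excluding the intercept. In R it is `summary(fit)$adj.r.squared`. The second is the error on held-out data, which is the direct way to look for overfitting. The sketch below uses hypothetical simulated data: a real predictor `x` plus a made-up pure-noise column `noise`. It uses a simple first-150 / last-50 split, which is an assumption rather than the project's actual train/test split.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(columns, y):
    """OLS coefficients (intercept first) for y on the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    return solve(xtx, xty)


def predict(beta, columns, i):
    """Fitted value for row i."""
    return beta[0] + sum(b * col[i] for b, col in zip(beta[1:], columns))


def r2_and_adjusted(columns, y):
    """In-sample R^2 and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    n, p = len(y), len(columns)
    beta = fit(columns, y)
    rss = sum((y[i] - predict(beta, columns, i)) ** 2 for i in range(n))
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    r2 = 1.0 - rss / tss
    return r2, 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def holdout_mse(train_cols, train_y, test_cols, test_y):
    """Fit on the training rows, return mean squared error on the held-out rows."""
    beta = fit(train_cols, train_y)
    m = len(test_y)
    return sum((test_y[i] - predict(beta, test_cols, i)) ** 2 for i in range(m)) / m


random.seed(3)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]  # hypothetical predictor unrelated to y
y = [1.0 + 0.5 * v + random.gauss(0, 1) for v in x]

split = 150  # first 150 rows for fitting, last 50 held out
small, big = [x], [x, noise]
for name, cols in [("x only", small), ("x + noise", big)]:
    train = [c[:split] for c in cols]
    test = [c[split:] for c in cols]
    r2, adj = r2_and_adjusted(train, y[:split])
    mse = holdout_mse(train, y[:split], test, y[split:])
    print(f"{name}: R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}, test MSE = {mse:.4f}")
```

Adding a predictor can never lower the in-sample R^2, which is why the adjusted version and the held-out error are the more informative comparisons.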

dorukyasa commented 3 years ago

I agree