Open emilio-berti opened 4 years ago
Hello!
Twice as many people as usual joined this time - thanks Erik :) I am using an older version of glmnet (2.0-18) because of dependency issues. Here's the result:
Forgot to load the dataset? Error in terms.formula(object, data = data) : object 'd' not found
Results for lasso:
12 x 1 sparse Matrix of class "
(Intercept) 15.502342425
fixed.acidity .
volatile.acidity -1.786555255
citric.acid .
residual.sugar 0.024798381
chlorides -0.189039780
free.sulfur.dioxide 0.003043232
total.sulfur.dioxide .
density -13.448154283
pH 0.061046933
sulphates 0.292670997
alcohol 0.345966111
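For anyone who wants to reproduce a coefficient table of this shape, here is a minimal sketch of a cross-validated LASSO fit with glmnet. It is not necessarily what was run for the numbers above: the object names (`wine_url`, `d`, `x`, `y`, `cv_fit`), the seed, and the choice of `lambda.1se` are all my assumptions, and the values will vary with the cross-validation folds.

```r
library(glmnet)

# URL from the challenge text; the file is semicolon-separated.
wine_url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
d <- read.csv(wine_url, sep = ";")

# glmnet wants a numeric predictor matrix and a response vector,
# not a formula plus data frame.
x <- as.matrix(d[, names(d) != "quality"])
y <- d$quality

# alpha = 1 selects the LASSO penalty; cv.glmnet picks lambda by
# 10-fold cross-validation, so fix the seed for reproducibility
# (the seed value itself is arbitrary).
set.seed(1)
cv_fit <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the largest lambda within one standard error of the
# minimum CV error (an assumed choice; "lambda.min" is the alternative).
# Predictors printed as "." have been shrunk exactly to zero.
coef(cv_fit, s = "lambda.1se")
```

Using `s = "lambda.min"` instead usually keeps more predictors, so which variables show up as `.` depends on that choice.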
No results for step-AIC (?)
Step-AIC and LASSO regression give similar results in this case. However, the best LASSO model also includes a very small but significant effect of citric acid and chlorides. In general, we like our vinho verde sweet, strong, and quite smooth (low acidity and density). I agree ;)
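To make the step-AIC side of this comparison concrete, here is a hedged sketch using `stepAIC` from the MASS package, starting from the full linear model. The names `wine_url`, `d`, `full`, and `step_fit` are mine, and a plain Gaussian `lm` is an assumption about how quality is modelled. Note that `d` has to exist in the workspace before fitting, otherwise you get the "object 'd' not found" error.

```r
library(MASS)

# URL from the challenge text; the file is semicolon-separated.
wine_url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
d <- read.csv(wine_url, sep = ";")

# Full model: quality against all 11 physicochemical predictors.
full <- lm(quality ~ ., data = d)

# Step-wise selection by AIC, starting from the full model.
# trace = FALSE suppresses the per-step printout.
step_fit <- stepAIC(full, direction = "both", trace = FALSE)

# Terms kept by the step-wise search, to set beside the LASSO coefficients.
coef(step_fit)
```

The two sets of coefficients are not directly comparable in size, because LASSO shrinks the estimates it keeps. The useful comparison is which predictors each method retains.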
What I learnt:
What I am still missing:
Background
I was in Belfast at BES, talking with some people about variable selection. When I said I was selecting variables using a step-wise AIC(c) approach, a guy (A) looked at me in shock and horror. Apparently, I was doing it all wrong. In short, step-wise selection introduces biases that, in the end, give unreliable results. A then told me that the newer method that avoids such biases is LASSO regression.
Challenge
We want to understand which factors determine the quality (variable quality) of vinho verde from white grapes. The data for this are archived at https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv. A description of the dataset is available at https://archive.ics.uci.edu/ml/datasets/Wine+Quality.
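As a starting point, the data can be read straight from that URL. This is a minimal sketch; the names `wine_url` and `d` are just my choice. One thing to watch: the file is semicolon-separated, so `read.csv` needs `sep = ";"`.

```r
# URL from the challenge text above.
wine_url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"

# Semicolon-separated file with a header row. read.csv turns the spaces
# in column names into dots (e.g. "fixed acidity" becomes fixed.acidity).
d <- read.csv(wine_url, sep = ";")

# Quick check: 11 physicochemical predictors plus the quality response.
str(d)
table(d$quality)
```

If `str(d)` shows a single column, the separator was not picked up and the file was read as one long string per row.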
Instructions
What we want to model