abjer / isds2020

Introduction to Social Data Science 2020 - a summer school course abjer.github.io/isds2020

EX 12.1.4 Identical MSE's for each fold #42

Closed legraudal closed 3 years ago

legraudal commented 3 years ago

Hi Jonas,

We're unsure how to code this exercise. We end up with the same MSEs for each fold, and we suspect it is because we fail to loop over all of the folds. Can you give us a hint or point us in the right direction?

P.S. We are aware that the calculations are not the ones the exercise text asks for, but since we can't figure out the loop, we've disregarded this for now. :)

Best regards, group 56

    from sklearn.model_selection import KFold
    lambdas = np.logspace(-4, 4, 12)

    # outer loop: lambdas
    mseCV = []
    for lambda_ in lambdas:
        kfolds = KFold(n_splits=5, shuffle=False)
        folds = list(kfolds.split(X_train, y_train))

        # inner loop: folds
        mseCV_ = []
        for i in folds:
            # train model and compute MSE on test fold
            pipe_lassoCV = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                                         StandardScaler(),
                                         Lasso(alpha=lambda_, random_state=1))
            pipe_lassoCV.fit(X_train, y_train)
            y_hat = pipe_lassoCV.predict(X_val)
            mseCV_.append(mse(y_hat, y_val))
            avg_mse = sum(mseCV_)/len(mseCV_)
        # store result
        mseCV.append(mseCV_)

    # convert to DataFrame
    lambdaCV = pd.DataFrame(mseCV, index=lambdas)

jsr-p commented 3 years ago

Hi @legraudal, inside your inner loop you want to unpack the training and validation indices from each element of the list folds defined in your assignment. Then subset your data with iloc, using the unpacked train and validation indices. After that, fit your Lasso on the training subset and predict on the validation subset :)
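The hint above can be sketched roughly as follows. In the posted code the loop variable i is never used, so every iteration fits on the same X_train and predicts on the same X_val, which would explain the identical MSEs. This is only a minimal sketch under assumptions: X_dev and y_dev are hypothetical names for the data being split, filled here with synthetic numbers rather than the exercise data, mse is assumed to be sklearn's mean_squared_error, and max_iter is raised only to keep the solver quiet.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error as mse  # assumed meaning of `mse`
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption: not the dataset from the exercise)
rng = np.random.default_rng(0)
X_dev = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
y_dev = X_dev["x1"] - 2 * X_dev["x2"] ** 2 + rng.normal(scale=0.5, size=100)

lambdas = np.logspace(-4, 4, 12)

# The folds do not depend on lambda, so they are built once outside the loops
kfolds = KFold(n_splits=5, shuffle=False)
folds = list(kfolds.split(X_dev, y_dev))

# outer loop: lambdas
mseCV = []
for lambda_ in lambdas:
    # inner loop: folds
    mseCV_ = []
    for train_idx, val_idx in folds:
        # unpack the index arrays and subset the data with iloc
        X_train, y_train = X_dev.iloc[train_idx], y_dev.iloc[train_idx]
        X_val, y_val = X_dev.iloc[val_idx], y_dev.iloc[val_idx]

        # fit on this fold's training part, evaluate on its validation part
        pipe_lassoCV = make_pipeline(
            PolynomialFeatures(degree=3, include_bias=False),
            StandardScaler(),
            Lasso(alpha=lambda_, random_state=1, max_iter=10_000),
        )
        pipe_lassoCV.fit(X_train, y_train)
        y_hat = pipe_lassoCV.predict(X_val)
        mseCV_.append(mse(y_val, y_hat))

    # store one row of fold-wise MSEs per lambda
    mseCV.append(mseCV_)

# convert to DataFrame: rows are lambdas, columns are folds
lambdaCV = pd.DataFrame(mseCV, index=lambdas)
```

With this layout, lambdaCV.mean(axis=1) gives the average validation MSE per lambda, and lambdaCV.mean(axis=1).idxmin() returns the lambda with the lowest average.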