Open aabk-bkaa opened 4 years ago
hi @aabk-bkaa, assuming that you did not plot the data and label the curves incorrectly, there could be other reasons for the RMSE being lower on the validation data than on the training data. See: https://stats.stackexchange.com/questions/187335/validation-error-less-than-training-error
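One quick sanity check is to compute the training and validation RMSE from the *same* folds with scikit-learn's `cross_validate(..., return_train_score=True)`, so neither curve can be mislabeled. This is a minimal sketch on synthetic data (`make_regression` and the `alpha=1.0` setting are stand-ins, not your actual data or grid):

```python
# Sketch: train vs. validation RMSE from the same CV folds, so the two
# numbers cannot be accidentally swapped. Synthetic data stands in for
# the real dataset.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

pipe = make_pipeline(
    PolynomialFeatures(2),
    StandardScaler(),
    Lasso(alpha=1.0, max_iter=10_000),
)

cv = cross_validate(
    pipe, X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
    scoring="neg_root_mean_squared_error",
    return_train_score=True,  # also score each fold's training split
)

train_rmse = -cv["train_score"].mean()
val_rmse = -cv["test_score"].mean()
print(f"train RMSE: {train_rmse:.2f}  validation RMSE: {val_rmse:.2f}")
```

If this still shows validation RMSE systematically below training RMSE, the explanations in the linked thread apply (small folds, noise, or the two errors being computed on differently scaled targets).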
After fitting our model, it appears that our validation curve is inverted: the validation RMSE is systematically lower than the training RMSE, which does not make intuitive sense to us.
The model was fit with the following code:
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from tqdm import tqdm

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=1)

lambdas = np.logspace(0, 8, 12)
folds = KFold(n_splits=5)
MSE_list = []

for _lambda in tqdm(lambdas):
    pipe_preproc = make_pipeline(PolynomialFeatures(2), StandardScaler(),
                                 Lasso(alpha=_lambda, max_iter=1000))
    MSE_train = []
    MSE_list_intermediate = []
    # ... (the cross-validation loop body was cut off in the original post)

MSE = pd.DataFrame(MSE_list)
MSE.columns = ["Lambda", "Fold 1", "Fold 2", "Fold 3", "Fold 4", "Fold 5",
               "Mean_RMSE", "Mean_RMSE_Evaluation"]
MSE.to_excel("LASSO_output.xlsx")
```
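Since the inner fold loop was cut off in the post, here is a self-contained sketch of what that loop *presumably* computes, judging from the output columns (`Fold 1`..`Fold 5`, `Mean_RMSE`, `Mean_RMSE_Evaluation`). The per-fold evaluation below is an assumption, and `make_regression` stands in for the real data:

```python
# Hypothetical reconstruction of the truncated fold loop: per-fold
# validation RMSE plus mean train/validation RMSE for each lambda.
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=1)

lambdas = np.logspace(0, 8, 12)
folds = KFold(n_splits=5)
rows = []

for _lambda in lambdas:
    pipe = make_pipeline(PolynomialFeatures(2), StandardScaler(),
                         Lasso(alpha=_lambda, max_iter=1000))
    train_rmse, val_rmse = [], []
    for train_idx, val_idx in folds.split(X_train):
        pipe.fit(X_train[train_idx], y_train[train_idx])
        # RMSE on the split the fold was trained on...
        pred_tr = pipe.predict(X_train[train_idx])
        train_rmse.append(np.sqrt(mean_squared_error(y_train[train_idx], pred_tr)))
        # ...and on the held-out validation split
        pred_val = pipe.predict(X_train[val_idx])
        val_rmse.append(np.sqrt(mean_squared_error(y_train[val_idx], pred_val)))
    rows.append([_lambda, *val_rmse, np.mean(val_rmse), np.mean(train_rmse)])

mse = pd.DataFrame(rows, columns=["Lambda", "Fold 1", "Fold 2", "Fold 3",
                                  "Fold 4", "Fold 5", "Mean_RMSE",
                                  "Mean_RMSE_Evaluation"])
print(mse[["Lambda", "Mean_RMSE", "Mean_RMSE_Evaluation"]])
```

One thing worth checking in the real code: with `np.logspace(0, 8, 12)` the alphas run from 1 to 1e8, which is extremely strong regularization for Lasso on standardized features, so most fits will shrink every coefficient to zero.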
Can anybody help us?
Kind regards, Anton and Søren