Open JiaqiLiu opened 2 years ago
https://github.com/Gurobi/modeling-examples/blob/1abb8700611e45bb34a760eebe2f6dcd1ff85875/linear_regression/l0_regression.html#L13183
RSS vs MSE
This paragraph says that it is not advisable to use RSS as the performance metric, and recommends MSE estimated via cross-validation instead.
I think the emphasis on MSE over RSS is misleading. Note that, given an estimate $\hat\beta$,
$$ \mathrm{RSS} = (y-X\hat\beta)^T(y-X\hat\beta) = \sum (y_i - \hat{y}_i)^2 = n \cdot \mathrm{MSE} $$
So, because they differ only by the constant factor $n$, the training MSE and the RSS decrease monotonically together as more features are considered, not only the RSS.
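For concreteness, here is a minimal numeric check of that identity (my own sketch, not taken from the notebook; variable names are illustrative):

```python
# Minimal check that training RSS equals n * training MSE for any fitted beta_hat.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.1, size=n)

# Ordinary least-squares fit (any other estimate of beta would work the same way)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

rss = residuals @ residuals        # (y - X beta)^T (y - X beta)
mse = np.mean(residuals ** 2)      # training MSE on the same data

print(rss, n * mse)                # identical up to floating-point error
assert np.isclose(rss, n * mse)
```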
Cross-validation
The cross-validation part should be correct, though. That is, we use a grid search to find the best $s$.
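To make the grid-search idea concrete, here is a minimal sketch of choosing $s$ by cross-validated MSE. The call `fit_best_subset(X, y, s)` is a hypothetical placeholder for the notebook's MIQP-based best-subset solver, which I assume returns the fitted coefficient vector:

```python
# Sketch: pick the sparsity budget s by K-fold cross-validated MSE.
# fit_best_subset(X, y, s) is a hypothetical stand-in for the notebook's solver.
import numpy as np
from sklearn.model_selection import KFold

def choose_s_by_cv(X, y, s_grid, fit_best_subset, n_splits=5, seed=0):
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    cv_mse = []
    for s in s_grid:
        fold_mse = []
        for train_idx, val_idx in kf.split(X):
            beta = fit_best_subset(X[train_idx], y[train_idx], s)
            resid = y[val_idx] - X[val_idx] @ beta
            fold_mse.append(np.mean(resid ** 2))  # held-out MSE, not training RSS
        cv_mse.append(np.mean(fold_mse))
    best_s = s_grid[int(np.argmin(cv_mse))]
    return best_s, cv_mse
```

One would then refit on the full training set with the selected $s$. The key point is that the metric is evaluated on held-out folds, which is what breaks the monotone decrease you get with either RSS or MSE on the training data.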