asadoughi / stat-learning

Notes and exercise attempts for "An Introduction to Statistical Learning"
http://asadoughi.github.io/stat-learning

7.5b) #94

Open nickblink opened 6 years ago

nickblink commented 6 years ago

The solutions say g1 is expected to have smaller test RSS because it has one fewer degree of freedom, but this is not necessarily true. For example, if the true data-generating process is cubic, then g2 will fit better.

Jeffalltogether commented 4 years ago

I agree with @nickblink.

As lambda goes to infinity, the penalty term dominates, forcing the final form of g to have the penalized derivative equal to zero (otherwise the objective diverges).

For g1 that means g''' = 0, so g1 is shrunk toward a polynomial of degree at most 2; for g2, g'''' = 0 means a polynomial of degree at most 3.

The best fit to the test data occurs when the fitted polynomial degree matches the true regression function, which could favor either the degree-2 or the degree-3 limit depending on the underlying distribution of the data.
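The point above is easy to check with a quick simulation. The sketch below (not from the thread) compares the two lambda-to-infinity limits, modeled here as plain degree-2 and degree-3 polynomial fits, on data drawn from an assumed cubic truth; the DGP, sample sizes, and noise level are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rss(true_coeffs, n_train=200, n_test=200, noise=1.0):
    """Fit degree-2 and degree-3 polynomials to noisy draws from a
    polynomial DGP (coefficients low-to-high) and return test RSS."""
    f = np.polynomial.polynomial.polyval
    x_train = rng.uniform(-2, 2, n_train)
    x_test = rng.uniform(-2, 2, n_test)
    y_train = f(x_train, true_coeffs) + rng.normal(0, noise, n_train)
    y_test = f(x_test, true_coeffs) + rng.normal(0, noise, n_test)
    rss = {}
    for deg in (2, 3):  # the lambda -> infinity limits of g1 and g2
        coefs = np.polynomial.polynomial.polyfit(x_train, y_train, deg)
        resid = y_test - f(x_test, coefs)
        rss[deg] = float(resid @ resid)
    return rss

# Cubic truth y = x^3 + noise: the degree-3 fit (g2's limit) should win.
rss_cubic = simulate_rss([0.0, 0.0, 0.0, 1.0])
print(rss_cubic)
```

With a quadratic truth instead, the degree-2 fit has no bias and strictly less variance, so the comparison flips; which of g1 or g2 wins really does depend on the true function.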

leowang396 commented 4 years ago

Agreed.

The difference between g1 and g2 is the degree of the polynomial they are shrunk toward, but overfitting does not always occur. Whichever function wins the bias-variance trade-off on the given data set will give the better test RSS.