i want an app that illustrates overfitting as we increase flexilibility of the fitting function.
The app should look like this here, but instaed of blue and green functions there should be the user-selected degree of freedom function. a slider selection df will make the function more or less wiggly.
keep the brown lm curve
keep black truth curve
could precompute all values for all df from 2:60, say
would allow to draw a curve in the right plot that is traced out and we display the current degree of flexbility with the colored, square, marker.
a key feature of this setup is that the right plot shows a U -shape for the test error : first it decreases but then it increases again.
it's ugly and way too long so i started refactoring this into smaller pieces:
getmodels <- function(x,y,newx,dfs = 2:20){
r = data.frame(x=x,y=y)
o = data.frame(x=newx)
s = list()
# browser()
for (i in 1:length(dfs)){
if (dfs[i] == 2){
s[[i]] <- lm(y~x,r)
o = cbind(o, predict(s[[i]], newdata = o))
} else {
s[[i]] <- smooth.spline(x,y,df = dfs[i])
o = cbind(o, predict(s[[i]], o$x)$y)
}
}
names(o)[-c(1)] <- paste0("df",dfs)
names(s) <- paste0("df",dfs)
list(models = s, pred = o)
}
datafig2.12 <- function(fun = function(x) {x*sin(x-2) + 0.2*x},n=90,eps = 1,df1=5, df2=40, ub = 5,nnew = 200){
set.seed(1234)
r = data.frame(x = seq(0,ub,length.out = n))
r$truth = fun(r$x)
r$epsi = rnorm(n,mean = 0, sd = eps)
r$y = r$truth + r$epsi
# browser()
mods = getmodels(r$x,r$y,seq(0,ub, length.out = nnew))
# add test data to predictions
mods$pred$truth = fun(mods$pred$x)
mods$pred$testdata = mods$pred$truth + rnorm(nnew,mean = 0, sd = eps)
# mses and bias
mses = list(
train = colMeans(sapply(mods$models,residuals)^2)
) # test mses
mses$test <- colMeans((mods$pred[,names(mods$models)] - mods$pred[,"testdata"])^2)
# bias
mses$bias <- colMeans((mods$pred[,names(mods$models)] - mods$pred[,"truth"])^2)
mses$var <- diag(var(mods$pred[,names(mods$models)]))
list(mods,mses)
}
x = datafig2.12()
plotfig2.12 <- function(d) {
stopifnot(is.list(d))
m = data.frame(d)
m$x = 2:(nrow(m)+1)
m = reshape2::melt(m,id.vars = "x")
m %>%
rename(model = variable) %>%
ggplot(aes(x=x,y = value, color = model)) + geom_point()
}
plot(plotfig2.12(x[[2]]))
task
refactor the smse1 function to use getmodels from above.
change smse1 so that it returns the curve for the currently selected degree of freedom only (i.e. the green curve, but only for what you choose in terms of df)
i want an app that illustrates overfitting as we increase flexilibility of the fitting function.
The app should look like this here, but instaed of blue and green functions there should be the user-selected degree of freedom function. a slider selection df will make the function more or less wiggly.
code
here is the function that makes the plot
it's ugly and way too long so i started refactoring this into smaller pieces:
task
smse1
function to usegetmodels
from above.smse1
so that it returns the curve for the currently selected degree of freedom only (i.e. the green curve, but only for what you choose in terms ofdf
)