Efficient Bayesian ridge regression

flaxter commented 8 years ago

Using 100 KDE features and all the categorical variables, I end up with a dataset that's 840x6578 so I'm inclined to do ridge regression. I tried to implement it in Stan but it's taking forever to sample the 6578 parameters, I think because there's so much correlation among the covariate. One trick that speeds things up greatly is to do a QR decomposition in R (see https://github.com/stan-dev/rstanarm/issues/30) and then learn 840 parameters based on an orthogonal design matrix. This works great, but I'm not sure how to recover the original 6578 parameters. But in any case, this might all be moot, as even with 6578 parameters I don't really know how to read off something like effect sizes / percent of variance explained. So now I'm thinking I should just do good old fashioned forward stepwise regression. But even that is slow with Stan--i.e. at the moment I've got a matrix that 840x100 because I want to use the 100 KDE features for the age variable.

So should I forget Stan? Or do something else? Interestingly, the GP regression perspective on this is efficient in Stan: considering a linear kernel, the covariance K becomes 840x840 and you just sample the observations:

y ~ N(0, K + \sigma^2 I)

But again this doesn't give us a clear interpretation of which are the important variables. So maybe I should really just do group lasso? I guess there are some group lasso packages in R to try...

flaxter commented 8 years ago

update: found a bug with the KDE features, maybe this will work better after I fix that

flaxter commented 8 years ago

OK, it's better but still pretty damn slow. @yxwang1988 what do you think about implementing ORF?

djsutherland / pummeler

Efficient Bayesian ridge regression #14