Closed: Pascal-Schmidt closed this issue 2 years ago.
Hey Pascal, no problem. It's interesting to see somebody other than me actually testing it out! 😁
You are correct that it doesn't use the same regularization parameter across all bootstrap replicates; as you noted, it tunes it using k-fold CV on every bootstrap replicate. While this might not exactly line up with the paper's algorithm description (honestly, I can't quite remember), my understanding is that in practice this leads to better convergence/performance.
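To make the distinction concrete, here's a minimal sketch in plain glmnet. This is not the package's actual code, and lambda_fixed is just a hypothetical pre-chosen value:

library(glmnet)
data(PimaIndiansDiabetes, package = "mlbench")

# One bootstrap replicate of the data
idx <- sample(nrow(PimaIndiansDiabetes), replace = TRUE)
X_boot <- model.matrix(diabetes ~ . - 1, data = PimaIndiansDiabetes[idx, ])
y_boot <- PimaIndiansDiabetes$diabetes[idx]

# Paper's description: the regularization parameter is fixed once, before
# the bootstrap loop, and reused on every replicate
lambda_fixed <- 0.01  # hypothetical value chosen up front
coef(glmnet(X_boot, y_boot, family = "binomial"), s = lambda_fixed)

# bolasso's behavior: lambda is re-tuned on each replicate via k-fold CV
coef(cv.glmnet(X_boot, y_boot, family = "binomial"), s = "lambda.min")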
Based on the code snippet you provided, I'm pretty sure you're not doing anything wrong per se. If I had to guess why you're seeing different performance from your own implementation, I'd say it's most likely because you're creating/tuning your own lambda regularization parameter grid. At least I'm pretty sure that's what's happening here (tbh I don't use tidymodels, so I'm doing a bit of guessing):
lasso_grid <- tune::tune_grid(
  wf_lasso,
  metrics = metrics,
  resamples = folds,
  grid = lambda_grid
)

lowest_rmse <- lasso_grid %>%
  tune::select_best("rmse", maximize = FALSE)
The glmnet documentation says the following regarding manually supplying the lambda regularization parameter:

"Optional user-supplied lambda sequence; default is NULL, and glmnet chooses its own sequence. Note that this is done for the full model (master sequence), and separately for each fold. The fits are then aligned using the master sequence (see the alignment argument for additional details). Adapting lambda for each fold leads to better convergence. When lambda is supplied, the same sequence is used everywhere, but in some GLMs can lead to convergence issues."
In short, glmnet suggests using its automatically generated lambda sequences, and I just follow that convention within bolasso. I'm guessing this is why we're seeing somewhat different results!
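Here's a hedged side-by-side sketch of what I mean; my_grid is a made-up example of a user-supplied sequence:

library(glmnet)
data(PimaIndiansDiabetes, package = "mlbench")
X <- model.matrix(diabetes ~ . - 1, data = PimaIndiansDiabetes)
y <- PimaIndiansDiabetes$diabetes

# Recommended: let glmnet pick its own lambda path (and adapt it per fold)
fit_auto <- cv.glmnet(x = X, y = y, family = "binomial")

# User-supplied grid, as a tuning framework typically builds; the same
# sequence is then used everywhere, which the docs warn can hurt
# convergence in some GLMs
my_grid <- 10^seq(0, -4, length.out = 50)  # made-up decreasing grid
fit_manual <- cv.glmnet(x = X, y = y, family = "binomial", lambda = my_grid)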
What bolasso is doing under the hood

The following code shows pretty much exactly what the package is actually doing. The package just adds a bunch more junk so the different Lasso implementations play nice together 😆
data(PimaIndiansDiabetes, package = "mlbench")

### Results from the Bolasso package #######################################
library(bolasso)
#> Loading required package: Matrix

set.seed(123)
model <- bolasso(
  diabetes ~ .,
  data = PimaIndiansDiabetes,
  n.boot = 100,
  implement = "glmnet",
  family = "binomial"
)
#> Loaded glmnet 4.1-4

selected_vars(model, threshold = 0.98)
#> # A tibble: 5 × 2
#>   variable  mean_coef
#>   <chr>         <dbl>
#> 1 Intercept   -8.15
#> 2 pregnant     0.119
#> 3 glucose      0.0348
#> 4 mass         0.0821
#> 5 pedigree     0.849
### What it's doing under the hood #########################################
library(dplyr)
library(tidyr)
library(glmnet)  # attached by bolasso above, but needed if running this standalone

set.seed(123)

# Draw 100 bootstrap replicates of the data
bootstraps <- lapply(
  1:100,
  function(i) {
    idx <- sort(sample(nrow(PimaIndiansDiabetes), replace = TRUE))
    PimaIndiansDiabetes[idx, ]
  }
)

# Fit a cross-validated Lasso on each replicate and extract the coefficients
# at the CV-selected lambda.min
coefs <- lapply(
  bootstraps,
  function(d) {
    X <- model.matrix(diabetes ~ . - 1, data = d)
    y <- d$diabetes
    drop(coef(cv.glmnet(x = X, y = y, family = "binomial"), s = "lambda.min"))
  }
)
# The reason the coefficients aren't 100% identical to bolasso's output is
# the randomness in how cv.glmnet assigns observations to the k folds.
coefs |>
  bind_rows() |>
  # For each variable: the share of replicates where its coefficient is zero,
  # and its mean coefficient across replicates
  summarise(
    across(
      everything(),
      .fns = list(
        "prop" = ~ sum(.x == 0) / 100,
        "mean" = ~ mean(.x)
      )
    )
  ) |>
  pivot_longer(
    cols = everything(),
    names_to = c("var", "meas"),
    names_sep = "_"
  ) |>
  group_by(var) |>
  # Keep variables that are zero in fewer than 2% of replicates,
  # i.e. selected in at least 98% of them (the 0.98 threshold above)
  mutate(keep = value[meas == "prop"] < .02) |>
  ungroup() |>
  filter(keep, meas == "mean") |>
  select(var, value)
#> # A tibble: 5 × 2
#>   var          value
#>   <chr>        <dbl>
#> 1 (Intercept) -8.17
#> 2 pregnant     0.119
#> 3 glucose      0.0350
#> 4 mass         0.0823
#> 5 pedigree     0.854
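One aside on that fold randomness: if you need two runs to line up exactly, cv.glmnet accepts a foldid argument that pins the fold assignments. A minimal sketch (I don't recall offhand whether bolasso exposes this directly):

# Pin the CV fold assignments so repeated runs tune lambda on identical folds
set.seed(123)
foldid <- sample(rep(1:10, length.out = nrow(PimaIndiansDiabetes)))
X <- model.matrix(diabetes ~ . - 1, data = PimaIndiansDiabetes)
y <- PimaIndiansDiabetes$diabetes
fit <- cv.glmnet(x = X, y = y, family = "binomial", foldid = foldid)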
Hopefully all of this is helpful!
Thanks for the response :)
I guess the difference is the lambda sequence then. I was using grid_latin_hypercube but will try to implement the algorithm with the glmnet package and their lambda sequence.
Thanks again for the package and putting it on CRAN!
Thanks for the package! I have a question about the algorithm.
In the paper, the algorithm sets the regularization parameter mu before the for loop, so every bootstrap sample uses the same one, I think. In your implementation, do you tune the penalty term for every bootstrap sample with k-fold cross-validation?
I also tried the bootLassoOLS function from the HDCI package but get slightly different results. Your algorithm gives the best variable selection when fitting an OLS regression afterwards, though. I also tried to implement the algorithm myself but get different results as well, so I am going wrong somewhere. Maybe I am getting the decay wrong. I am creating 128 bootstrap samples, finding the best penalty term for each sample with 10-fold cross-validation, then refitting the model with the best penalty term on the entire bootstrap sample and recording all the coefficients that are non-zero.
Here is a code snippet of how it looks:
So my question would just be: if you could explain in 4-5 bullet points how you implemented the algorithm, that would be great, so I can better understand what was wrong with my code. Thank you.