fabsig / GPBoost

Combining tree-boosting with Gaussian process and mixed effects models
Other
530 stars 42 forks source link

Does GPBoost support more than two levels? #108

Closed dirkpelt closed 10 months ago

dirkpelt commented 10 months ago

Dear Fabio,

not sure whether this is the right place to ask and sorry if this is reported elsewhere, but does GPBoost support modeling data with more than two levels (e.g., students nested within schools nested within regions)?

Kind regards, Dirk

fabsig commented 10 months ago

Thanks for your interest in GPBoost!

Yes, it does. You can have as many random effects as you like (crossed and/or nested). If you have more than two levels, you need to do the enumeration of the low level random effects with the helper function get_nested_categories twice. Below is demo code for how this can be done in R (see also the demos in R or Python for the use of get_nested_categories with only two levels):

region <- c(1,1,1,1,2,2,2,2)
school <- c(1,1,2,2,1,1,2,2)
student <- c(1,2,1,2,1,2,1,2)
school_nested <- get_nested_categories(region, school)
student_nested <- get_nested_categories(school_nested, student)
group_data <- cbind(region, school_nested, student_nested)
gp_model <- GPModel(group_data = group_data, y = y, X = X, likelihood = "gaussian")
## continue with GPBoost algorithm...

## or for a GLMM
gp_model <- fitGPModel(group_data = group_data, y = y, X = X, 
                       likelihood = likelihood, params = list(std_dev = TRUE))
> group_data 
     region school_nested student_nested
[1,]      1             1              1
[2,]      1             1              2
[3,]      1             2              3
[4,]      1             2              4
[5,]      2             3              5
[6,]      2             3              6
[7,]      2             4              7

Note that for multiple random effects, nelder_mead can sometimes be faster. If speed is an issue, I suggest you also try

gp_model <- GPModel(group_data = group_data, y = y, X = X, likelihood = "gaussian")
gp_model$set_optim_params(params = list(optimizer_cov = "nelder_mead"))
## continue with GPBoost algorithm...

## or for a GLMM
gp_model <- fitGPModel(group_data = group_data, y = y, X = X, 
                       likelihood = "gaussian", params = list(std_dev = TRUE,  optimizer_cov = "nelder_mead"))