boost-R / mboost

Boosting algorithms for fitting generalized linear, additive and interaction models to potentially high-dimensional data. The current release version can be found on CRAN (http://cran.r-project.org/package=mboost).

Feature request: allow new factor levels in brandom #115

Open kmorndahl opened 2 years ago

kmorndahl commented 2 years ago

I am building a gamboost model with the GammaReg() family. The model fits fine, but when it comes time to predict on a new dataset I am getting an error: Error in X %*% rowSums(cf) : Cholmod error 'X and/or Y have wrong dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 90.

I attempted to create a smaller reproducible example of this, but could not replicate that exact error. Instead, the smaller dataset gives a different error: Error in f(init, x[[i]]) : non-conformable arrays. The example uses a subset of the training data from my original data and the same modeling approach, and I attempt to predict on the full test set from my original data.

Any help troubleshooting these error(s) would be much appreciated. I can of course share the full dataset if that would be helpful.

x = c(-0.420619854880168, -0.769823976992038, -0.709316986674812, 
      -1.25099225335503, -0.618892161183838, -0.555349783432928, 
      -0.914234689796377, -1.22903701739405, -0.4833834921797, 
      -0.320848947810941, 0.135931013665819, 2.39042286987258, 
      1.73643729459268, -0.506909477648839, 1.62136146009556, 
      2.15263600603266, 1.4014282748866, 2.03401367337059, 0.877646599658447, 
      1.02535151508941, -0.837245279816666, 0.58292669901717, 
      0.602153227358826, 1.83594483207367, 1.02820280062304, 
      -0.765221508789011, -0.74152886321564, -0.354830989878368, 
      -0.282803791828277, -0.407939851800533)

y = c(3.37808396795311, 4.31703013336414, 3.62201047152382, 
      3.47647337833432, 3.57383927065914, 4.0274754006413, 5.11993857962149, 
      4.10603649459834, 3.44626699808267, 6.50187496364316, 47.364073741465, 
      49.4539651723017, 18.5584474694755, 167.384017225471, 43.854077667435, 
      20.5948980572258, 60.0090328389651, 32.2859775573889, 23.028143147698, 
      27.5759143301009, 36.8938384302345, 132.866721315487, 187.170964086464, 
      14.9766986238594, 4.54648985258215, 25.5890448582519, 48.696962593379, 
      36.9890174750545, 53.4100395759561, 49.565020848753)

group = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", 
          "2", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", 
          "4", "4", "4", "4", "4")

df = data.frame(y, x, group)
df$group = factor(df$group)

mod = mboost::gamboost(y ~ bols(x) + brandom(group), data = df, family = GammaReg())

x_test = c(1.2562301, -0.4628746, -0.2848149, -0.9655805, -1.0166867, 
           1.8343589, -0.6302188, 1.1909887, -0.8064399, 0.3444268, 
           -0.4593891)

y_test = c(3.004605, 5.595847, 7.62922, 6.687553, 7.435949, 
           11.453977, 13.381522, 13.393321, 6.855579, 16.023104, 
           17.48234)

group_test = c("5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "1")

df_test = data.frame(y = y_test, x = x_test, group = group_test)
df_test$group = factor(df_test$group)

preds = predict.mboost(mod, newdata = df_test)

# Error in f(init, x[[i]]) : non-conformable arrays

# Error when running on full dataset:
# Error in X %*% rowSums(cf) : 
# Cholmod error 'X and/or Y have wrong dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 90

sbrockhaus commented 2 years ago

Hi, you have to make sure that your factor, i.e. the group variable, in df_test has exactly the same levels as in df. You can achieve that with something like:

x_test = c(1.2562301, -0.4628746, -0.2848149, -0.9655805, -1.0166867, 1.8343589, -0.6302188, 1.1909887, -0.8064399, 0.3444268, -0.4593891)

y_test = c(3.004605, 5.595847, 7.62922, 6.687553, 7.435949, 11.453977, 13.381522, 13.393321, 6.855579, 16.023104, 17.48234)

group_test = c(rep("4",10), "1") ## only use levels that were part of the original data object

df_test = data.frame(y = y_test, x = x_test, group = group_test)
df_test$group = factor(df_test$group, levels = 1:4) ## set levels explicitly!

preds = predict.mboost(mod, newdata = df_test)
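
The hard-coded `levels = 1:4` above can also be derived from the training data. The following base-R sketch is only an illustration of that idea: `train_levels` stands in for `levels(df$group)`, and the names `unseen` and `group_aligned` are made up for this example. It flags observations whose group was never seen during training and rebuilds the factor with exactly the training levels.

```r
# Levels seen during training; in the example above this would be levels(df$group)
train_levels <- c("1", "2", "3", "4")

# Raw group labels in the new data; "5" never occurred in the training data
group_test <- c(rep("5", 10), "1")

# Flag observations whose level is unknown to the fitted model
unseen <- !(group_test %in% train_levels)

# Re-create the factor with exactly the training levels;
# values outside train_levels become NA
group_aligned <- factor(group_test, levels = train_levels)

print(table(unseen))
print(levels(group_aligned))
```

Rows flagged in `unseen` end up as `NA` in `group_aligned`, so they still need to be dropped or handled separately before calling predict.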

kmorndahl commented 2 years ago

Ah easy fix, thanks very much @sbrockhaus!

The error (Error in X %*% rowSums(cf) : Cholmod error 'X and/or Y have wrong dimensions') still persisted in the full dataset, but it seems to be resolved in the patched version available through GitHub.

Are there plans or a timeline for incorporating the patched version into the CRAN release? (Pardon my ignorance here; I don't know much about package development.)

Best

kmorndahl commented 2 years ago

Upon further reflection, I'm wondering whether mboost can handle new levels of a brandom() grouping variable in the test set/newdata object at prediction time. For random effects, I believe this is not uncommon (?). For example, predict() for glmer models in lme4 provides the allow.new.levels = TRUE argument for this purpose.
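
To make the request concrete, here is a minimal base-R sketch of what population-level prediction for unseen levels would mean. It does not use mboost, and all numbers and names (`ranef_hat`, `eta_fixed`, `group_new`) are hypothetical: a row with an unseen level gets a random-effect contribution of zero, and a log link is assumed for the back-transformation.

```r
# Hypothetical per-level random intercepts on the link scale (made-up numbers)
ranef_hat <- c("1" = -1.2, "2" = 0.6, "3" = 0.4, "4" = 0.2)

# Hypothetical fixed-effect part of the linear predictor for three new rows
eta_fixed <- c(2.0, 2.5, 3.0)
group_new <- c("1", "5", "4")  # "5" was not seen during training

# Look up each row's random intercept; unseen levels yield NA ...
re <- unname(ranef_hat[group_new])
# ... which is replaced by 0, i.e. the population-level prediction
re[is.na(re)] <- 0

eta <- eta_fixed + re  # linear predictor per row
mu <- exp(eta)         # response scale, assuming a log link
print(eta)
```

The second row falls back to its fixed-effect value alone, which is the behaviour allow.new.levels = TRUE gives for previously unobserved levels.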

Thanks!

hofnerb commented 2 years ago

I am surprised that the code breaks with the CRAN version but not the GitHub version, as they should be practically identical and we did not modify anything with respect to the above-mentioned functionality.

A feature such as allow.new.levels would indeed be nice to have, but it would require quite some time to implement, and I do not think anyone currently has that time. You are more than welcome to contribute code, though.