anthonydevaux / DynForest

Random forest with multivariate longitudinal predictors

cannot take a sample larger than the population when #7

Closed bernard-liew closed 1 year ago

bernard-liew commented 1 year ago

Hi,

Thank you for creating this very interesting package. I have a dataset with fixed and repeated-measures variables for prognostic modelling. The dataset has n > 2000. I created a synthetic dataset with similar statistical properties to the original using the synthpop package. The original dataset resulted in the error `3 nodes produced errors; first error: cannot take a sample larger than the population when 'replace = FALSE'`, which is reproduced using a subset of the synthetic data below.

I hope you can tell me whether I am doing anything wrong.

Also, as an aside, can DynForest handle missing values natively, or must they be imputed beforehand?

```r
library(tidyverse)
library(DynForest)

sample_dat = tibble::tribble(
  ~id, ~abds_ms_0m, ~ext_ms_0m, ~odi_12m, ~time, ~lbp, ~legp, ~fabq, ~ases,
  1L, 17, 21, 40, 0, 4, 5, 16, 7.30000019073486,
  1L, 17, 21, 40, 3, 3, 3, 12, 7.84000015258789,
  1L, 17, 21, 40, 6, 8, 7, 16, 8.19999980926514,
  1L, 17, 21, 40, 12, 7, 3, 5, 6.40000009536743,
  2L, 120, 180, 8, 0, 0, 0, 0, 8.73999977111816,
  2L, 120, 180, 8, 3, 0, 0, 4, 9.81999969482422,
  2L, 120, 180, 8, 6, 0, 0, 4, 9.60000038146973,
  2L, 120, 180, 8, 12, 1, 0, 12, 9.10000038146973,
  3L, 111, 180, 14, 0, 7, 0, 1, 4.40000009536743,
  3L, 111, 180, 14, 3, 7, 0, 3, 4.23999977111816,
  3L, 111, 180, 14, 6, 9, 0, 6, 2.79999995231628,
  3L, 111, 180, 14, 12, 1, 1, 1, 4.96000003814697,
  4L, 18, 21, 8.88888931274414, 0, 2, 0, 0, 9,
  4L, 18, 21, 8.88888931274414, 3, 10, 8, 0, 8.92000007629395,
  4L, 18, 21, 8.88888931274414, 6, 7, 6, 1, 8.38000011444092,
  4L, 18, 21, 8.88888931274414, 12, 2, 1, 0, 7.11999988555908,
  5L, 26, 138, 16, 0, 5, 0, 0, 10,
  5L, 26, 138, 16, 3, 0, 0, 0, 9.27999973297119,
  5L, 26, 138, 16, 6, 1, 0, 0, 10,
  5L, 26, 138, 16, 12, 2, 0, 0, 10,
  6L, 50, 30, 18, 0, 6, 0, 14, 4.96000003814697,
  6L, 50, 30, 18, 3, 6, 7, 10, 5.5,
  6L, 50, 30, 18, 6, 3, 6, 2, 8.02000045776367,
  6L, 50, 30, 18, 12, 5, 6, 8, 6.94000005722046,
  7L, 24, 90, 10, 0, 7, 4, 10, 6.57999992370605,
  7L, 24, 90, 10, 3, 9, 9, 3, 6,
  7L, 24, 90, 10, 6, 6, 10, 14, 4.23999977111816,
  7L, 24, 90, 10, 12, 9, 4, 5, 8.5600004196167,
  8L, 47, 64, 11.1111106872559, 0, 7, 7, 8, 3.16000008583069,
  8L, 47, 64, 11.1111106872559, 3, 3, 3, 5, 5.5,
  8L, 47, 64, 11.1111106872559, 6, 0, 3, 1, 8.38000011444092,
  8L, 47, 64, 11.1111106872559, 12, 3, 1, 1, 6.94000005722046,
  9L, 91, 99, 8, 0, 5, 1, 0, 8.5600004196167,
  9L, 91, 99, 8, 3, 4, 1, 0, 8.80000019073486,
  9L, 91, 99, 8, 6, 3, 0, 0, 8.73999977111816,
  9L, 91, 99, 8, 12, 0, 0, 1, 9.46000003814697,
  10L, 60, 40, 4.44444465637207, 0, 7, 5, 12, 7.84000015258789,
  10L, 60, 40, 4.44444465637207, 3, 2, 0, 8, 8.38000011444092,
  10L, 60, 40, 4.44444465637207, 6, 5, 4, 4, 7.30000019073486,
  10L, 60, 40, 4.44444465637207, 12, 3, 2, 5, 8.875,
  11L, 120, 73, 12.5, 0, 8, 5, 8, 6.21999979019165,
  11L, 120, 73, 12.5, 3, 7, 6, 13, 2.79999995231628,
  11L, 120, 73, 12.5, 6, 5, 2, 5, 2.44000005722046,
  11L, 120, 73, 12.5, 12, 3, 1, 6, 4.96000003814697,
  12L, 36, 36, 8, 0, 6, 1, 14, 6.03999996185303,
  12L, 36, 36, 8, 3, 4, 1, 0, 5.5,
  12L, 36, 36, 8, 6, 5, 2, 9, 6.03999996185303,
  12L, 36, 36, 8, 12, 2, 1, 13, 6.21999979019165,
  13L, 30, 37, 35.5555572509766, 0, 9, 9, 14, 3.88000011444092,
  13L, 30, 37, 35.5555572509766, 3, 4, 7, 10, 5.32000017166138,
  13L, 30, 37, 35.5555572509766, 6, 9, 9, 12, 5.5,
  13L, 30, 37, 35.5555572509766, 12, 7, 7, 15, 5.5,
  14L, 10, 15, 18, 0, 2, 0, 14, 10,
  14L, 10, 15, 18, 3, 10, 0, 19, 3.79999995231628,
  14L, 10, 15, 18, 6, 2, 0, 20, 6.76000022888184,
  14L, 10, 15, 18, 12, 2, 0, 12, 10,
  15L, 17, 8, 40, 0, 7, 8, 7, 8.02000045776367,
  15L, 17, 8, 40, 3, 3, 6, 2, 5.67999982833862,
  15L, 17, 8, 40, 6, 8, 7, 5, 5.5,
  15L, 17, 8, 40, 12, 8, 9, 0, 4.78000020980835,
  16L, 15, 80, 6, 0, 7, 8, 10, 6.19999980926514,
  16L, 15, 80, 6, 3, 1, 1, 2, 8.5600004196167,
  16L, 15, 80, 6, 6, 2, 4, 6, 6.76000022888184,
  16L, 15, 80, 6, 12, 3, 2, 3, 8.02000045776367,
  17L, 18, 57, 31.1111106872559, 0, 7, 8, 19, 3.88000011444092,
  17L, 18, 57, 31.1111106872559, 3, 4, 6, 10, 7.11999988555908,
  17L, 18, 57, 31.1111106872559, 6, 6, 8, 19, 5.32000017166138,
  17L, 18, 57, 31.1111106872559, 12, 2, 3, 11, 4.59999990463257,
  18L, 43, 180, 30, 0, 7, 0, 4, 8.19999980926514,
  18L, 43, 180, 30, 3, 5, 2, 9, 8.19999980926514,
  18L, 43, 180, 30, 6, 4, 5, 7, 7.48000001907349,
  18L, 43, 180, 30, 12, 7, 7, 4, 7.84000015258789,
  19L, 19, 16, 32, 0, 6, 6, 12, 1,
  19L, 19, 16, 32, 3, 0, 2, 10, 3.16000008583069,
  19L, 19, 16, 32, 6, 3, 1, 15, 4.23999977111816,
  19L, 19, 16, 32, 12, 9, 7, 15, 1.72000002861023,
  20L, 60, 45, 22, 0, 7, 5, 11, 5.5,
  20L, 60, 45, 22, 3, 7, 8, 17, 4.78000020980835,
  20L, 60, 45, 22, 6, 0, 5, 24, 7.97499990463257,
  20L, 60, 45, 22, 12, 6, 6, 20, 5.67999982833862,
  21L, 120, 180, 8.88888931274414, 0, 4, 0, 12, 8.02000045776367,
  21L, 120, 180, 8.88888931274414, 3, 2, 0, 3, 10,
  21L, 120, 180, 8.88888931274414, 6, 1, 0, 12, 10,
  21L, 120, 180, 8.88888931274414, 12, 2, 0, 5, 10,
  22L, 105, 180, 16, 0, 4, 5, 12, 6,
  22L, 105, 180, 16, 3, 2, 1, 0, 6.57999992370605,
  22L, 105, 180, 16, 6, 6, 5, 0, 8.38000011444092,
  22L, 105, 180, 16, 12, 4, 1, 0, 7.84000015258789,
  23L, 120, 180, 0, 0, 2, 5, 4, 9.80000019073486,
  23L, 120, 180, 0, 3, 1, 8, 0, 9.39999961853027,
  23L, 120, 180, 0, 6, 3, 3, 0, 9.27999973297119,
  23L, 120, 180, 0, 12, 1, 6, 0, 9.81999969482422,
  24L, 46, 107, 6.66666650772095, 0, 6, 1, 12, 7.40000009536743,
  24L, 46, 107, 6.66666650772095, 3, 4, 0, 11, 7.65999984741211,
  24L, 46, 107, 6.66666650772095, 6, 5, 2, 9, 7.84000015258789,
  24L, 46, 107, 6.66666650772095, 12, 2, 0, 4, 7.84000015258789,
  25L, 51, 48, 6, 0, 9, 7, 15, 3.88000011444092,
  25L, 51, 48, 6, 3, 8, 5, 7, 7.11999988555908,
  25L, 51, 48, 6, 6, 8, 7, 7, 6.57999992370605,
  25L, 51, 48, 6, 12, 8, 10, 7, 8.42500019073486
)

timeData <- sample_dat %>%
  select(matches("id|time|fabq|ases|lbp|legp")) %>%
  as.data.frame()

# Create object with longitudinal association for each predictor
timeVarModel <- list(lbp = list(fixed = lbp ~ time, random = ~ time),
                     legp = list(fixed = legp ~ time, random = ~ time),
                     fabq = list(fixed = fabq ~ time, random = ~ time),
                     ases = list(fixed = ases ~ time, random = ~ time))

# Build fixed data
fix_vars <- c("id", grep("0m", names(sample_dat), value = TRUE))
fixedData <- unique(sample_dat[, fix_vars]) %>% as.data.frame()

# Build outcome data
Y <- list(type = "numeric",
          Y = unique(sample_dat[, c("id", "odi_12m")]) %>% as.data.frame())

# Run DynForest function
res_dyn <- DynForest(timeData = timeData, fixedData = fixedData,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel,
                     mtry = 10, Y = Y, cause = 2,
                     ncores = 3, seed = 1234)

summary(res_dyn)
```

Regards. Bernard

anthonydevaux commented 1 year ago

Dear Bernard,

Thank you for using DynForest!

This error occurred because you chose a value of `mtry` larger than the number of predictors, which is its maximum possible value. In your example, `mtry` was set to 10 but you have only 6 predictors (4 time-dependent and 2 time-fixed). I will implement an additional check to avoid this issue.
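The message itself is the standard base R error raised by `sample()` when more items are requested than exist. That DynForest draws its `mtry` candidate predictors this way is an assumption here, not something confirmed in this thread. A minimal sketch of the failure and of a simple guard, where `predictors`, `mtry_requested` and `mtry_safe` are hypothetical names used only for illustration:

```r
# Predictor names from the example: 4 time-dependent + 2 time-fixed.
time_dependent <- c("lbp", "legp", "fabq", "ases")
time_fixed     <- c("abds_ms_0m", "ext_ms_0m")
predictors     <- c(time_dependent, time_fixed)
n_predictors   <- length(predictors)

# Drawing 10 of 6 names without replacement triggers the base R error;
# tryCatch captures the message instead of stopping the script.
err <- tryCatch(
  sample(predictors, size = 10, replace = FALSE),
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical guard: never request more predictors than are available.
mtry_requested <- 10
mtry_safe <- min(mtry_requested, n_predictors)
print(mtry_safe)
```

In the call above, passing a value no larger than 6 for `mtry` stays within the range Anthony describes.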

Regarding missing values, DynForest does not support them, except for time-dependent predictors. In that case, missing values are allowed as long as at least one observation is available per subject, so that the random effects can be computed. As you suggested, you can indeed use an imputation method before running DynForest.
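Before fitting, this requirement can be checked with base R alone. The sketch below uses small made-up tables (`toy` and `toy_fixed` are hypothetical, not part of the data in this issue) and does not call any DynForest function:

```r
# Toy long-format data: subject 3 has no observed lbp value at all.
toy <- data.frame(
  id   = c(1, 1, 2, 2, 3, 3),
  time = c(0, 3, 0, 3, 0, 3),
  lbp  = c(4, NA, 0, 1, NA, NA)
)

# Count the non-missing lbp measurements for each subject.
n_obs <- tapply(!is.na(toy$lbp), toy$id, sum)

# Subjects with zero observations violate the "at least one per subject" rule.
ids_without_obs <- names(n_obs)[n_obs == 0]
print(ids_without_obs)

# Time-fixed predictors must be complete, so flag any missing value there.
toy_fixed <- data.frame(id = 1:3, abds_ms_0m = c(17, NA, 111))
print(anyNA(toy_fixed$abds_ms_0m))
```

Subjects flagged by the first check, and any missing time-fixed values, would need to be imputed or dropped beforehand.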

Let me know if you have any further questions.

Best regards, Anthony

bernard-liew commented 1 year ago

Thank you, Anthony, for the quick response. All working now. Keep up the great work!