Fix implementation of multiple unique fold columns for repeated cross-validation

LudvigOlsen / groupdata2

R-package: Methods for dividing data into groups. Create balanced partitions and cross-validation folds. Perform time series windowing and general grouping and splitting of data. Balance existing groups with up- and downsampling or collapse them to fewer groups.



Fixed both of these.

I then found that `unique(as.matrix(data), MARGIN = 2)` can do something similar, so I benchmarked it against the current approach with the following code:

```r
library(groupdata2)  # fold()
library(dplyr)       # %>% and arrange()

set.seed(1)
# "age" has 9 values and is recycled to the 18 rows by data.frame()
df <- data.frame("participant" = factor(rep(c('1', '2', '3', '4', '5', '6'), 3)),
                 "age" = rep(c(25, 65, 34), 3),
                 "diagnosis" = rep(c('a', 'b', 'a', 'a', 'b', 'b'), 3),
                 "score" = c(34, 23, 54, 23, 56, 76, 43, 56, 76, 42, 54, 1, 5, 76, 34, 76, 23, 65))
df <- df %>% dplyr::arrange(participant, score)

system.time({
  df_folded_100reps <- fold(df, 3, num_col = 'score', num_fold_cols = 100, max_iters = 100)
})
```

Current approach:

```
   user  system elapsed 
 16.939   0.266  17.310 
```

Using `unique()`:

```
   user  system elapsed 
247.794   4.186 253.402 
```

So I'm sticking with my own approach. One possible reason for the difference is that my approach compares each pair of columns only once, while `unique()` may end up comparing the same two columns up to 100 times in this example.
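To make the comparison concrete, here is a minimal sketch of a pairwise duplicate check that compares each pair of columns at most once, shown next to the `unique(as.matrix(...), MARGIN = 2)` alternative. The helper `find_duplicate_cols()` and the toy `folds` data frame are hypothetical illustrations, not groupdata2's actual implementation, and the sketch assumes columns count as duplicates only when their values match exactly.

```r
# Hypothetical helper (not part of groupdata2): returns the names of columns
# in `cols` that exactly duplicate an earlier column. Each pair (i, j) with
# i < j is compared at most once.
find_duplicate_cols <- function(data, cols) {
  duplicates <- character(0)
  n <- length(cols)
  if (n < 2) return(duplicates)
  for (i in seq_len(n - 1)) {
    # A known duplicate equals an earlier column, so it needs no comparisons
    if (cols[i] %in% duplicates) next
    for (j in (i + 1):n) {
      if (cols[j] %in% duplicates) next
      if (identical(data[[cols[i]]], data[[cols[j]]])) {
        duplicates <- c(duplicates, cols[j])
      }
    }
  }
  duplicates
}

# Toy fold columns: .folds_3 is an exact copy of .folds_1
folds <- data.frame(
  .folds_1 = factor(c(1, 1, 2, 2, 3, 3)),
  .folds_2 = factor(c(1, 2, 1, 2, 3, 3)),
  .folds_3 = factor(c(1, 1, 2, 2, 3, 3))
)
fold_cols <- names(folds)

dups <- find_duplicate_cols(folds, fold_cols)
print(dups)  # ".folds_3"

# The unique() alternative: keep only the distinct columns of the matrix
kept <- colnames(unique(as.matrix(folds), MARGIN = 2))
print(kept)  # ".folds_1" ".folds_2"
```

Both versions match values exactly, so neither would flag two fold columns that group the rows the same way but use different fold labels.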


Fix implementation of multiple unique fold columns for repeated cross-validation #6