Open jreps opened 1 month ago
Here is code that seems to work for me:
createExplicitSplitSetting <- function(testRowIds, trainRowIds, trainFolds) {
  splitSettings <- list(
    testRowIds = testRowIds,
    trainRowIds = trainRowIds,
    trainFolds = trainFolds
  )
  attr(splitSettings, "fun") <- "explicitSplitter"
  class(splitSettings) <- "splitSettings"
  return(splitSettings)
}

explicitSplitter <- function(population, splitSettings) {
  testRowIds <- splitSettings$testRowIds
  trainRowIds <- splitSettings$trainRowIds
  trainFolds <- splitSettings$trainFolds

  # index -1 marks test rows; training rows carry their fold number
  split <- data.frame(
    rowId = c(testRowIds, trainRowIds),
    index = c(rep(-1, length(testRowIds)), trainFolds)
  )
  return(split)
}
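For anyone who wants to try this, here is a minimal sketch of how the two functions might be called on toy data. The `stopifnot()` checks are my own addition and not part of the proposal; they assume one fold label per training row, no overlap between test and train, and positive fold numbers (since -1 marks the test set). The toy `population` and its row ids are made up for illustration.

```r
# Sketch: the two functions from the comment above, plus input checks.
# The stopifnot() checks are an added assumption, not part of the proposal.
createExplicitSplitSetting <- function(testRowIds, trainRowIds, trainFolds) {
  stopifnot(
    length(trainRowIds) == length(trainFolds),       # one fold label per training row
    length(intersect(testRowIds, trainRowIds)) == 0, # no row in both sets
    all(trainFolds > 0)                              # -1 is reserved for the test set
  )
  splitSettings <- list(
    testRowIds = testRowIds,
    trainRowIds = trainRowIds,
    trainFolds = trainFolds
  )
  attr(splitSettings, "fun") <- "explicitSplitter"
  class(splitSettings) <- "splitSettings"
  return(splitSettings)
}

explicitSplitter <- function(population, splitSettings) {
  data.frame(
    rowId = c(splitSettings$testRowIds, splitSettings$trainRowIds),
    index = c(rep(-1, length(splitSettings$testRowIds)), splitSettings$trainFolds)
  )
}

# Toy data (hypothetical): rows 1-2 are test, rows 3-6 are train in two folds.
population <- data.frame(rowId = 1:6)
settings <- createExplicitSplitSetting(
  testRowIds = c(1, 2),
  trainRowIds = c(3, 4, 5, 6),
  trainFolds = c(1, 1, 2, 2)
)
split <- explicitSplitter(population, settings)
print(split)
```

Note that `population` is not used inside `explicitSplitter`; the split is fully determined by the settings, which is the point of an explicit splitter.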
Does the proposed code also allow for controlling the training folds? For example, when you need exactly the same split not only into train/test but also for each fold within train.
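As far as I can tell from the code above, the fold assignment is passed in directly through `trainFolds`, so the folds should be reproducible as long as the same vectors are supplied. One way to reuse a split from an earlier run might look like the sketch below. `splitToExplicitArgs` is a hypothetical helper, not an existing function, and it assumes the saved split is a data frame with `rowId` and `index` columns where negative values mark test rows and positive values are fold numbers.

```r
# Hypothetical helper: recover the three explicit vectors from a saved split.
# Assumes index < 0 means test and index > 0 is the training fold number.
splitToExplicitArgs <- function(split) {
  isTest <- split$index < 0
  isTrain <- split$index > 0
  list(
    testRowIds = split$rowId[isTest],
    trainRowIds = split$rowId[isTrain],
    trainFolds = split$index[isTrain]
  )
}

# Toy saved split (made-up row ids).
savedSplit <- data.frame(
  rowId = c(10, 11, 12, 13, 14),
  index = c(-1, 2, 1, -1, 2)
)
explicitArgs <- splitToExplicitArgs(savedSplit)

# Rebuild the split the same way the explicit splitter does.
rebuilt <- data.frame(
  rowId = c(explicitArgs$testRowIds, explicitArgs$trainRowIds),
  index = c(rep(-1, length(explicitArgs$testRowIds)), explicitArgs$trainFolds)
)

# Row order may differ, so compare the assignment per rowId.
rebuilt <- rebuilt[order(rebuilt$rowId), ]
original <- savedSplit[order(savedSplit$rowId), ]
stopifnot(all(rebuilt$index == original$index))
```

The recovered vectors could then be passed to `createExplicitSplitSetting`, so every row lands in the same test set or training fold as before.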
I added a suggested feature for this in #504
At the moment we can split the data into train/test and folds by patientId, rowId or time.
It would be nice to have an explicit splitter where you can provide the rowIds for the test/train/folds. That way you can ensure the same split even with different features etc.