yijunwang0805 opened this issue 3 years ago
Here are some illustrations of using `cv_apply` from cv_apply.R: first as a fancy apply function, second to do leave-one-out CV, and third to do time-series CV. You can imagine the function passed to `cv_apply` fitting a model to the training data, or fitting a model on the training data and evaluating it on the test data; here I only have examples that extract parts of the training and test sets.
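As an illustration of the kind of function you might pass in, here is a minimal sketch of a fold function that "fits" a model on `train` and scores it on `test`. The name `mean_forecaster_fold`, the mean-of-training-values model, the squared-error score, and the numeric array are all hypothetical stand-ins. The sketch calls the function on one hand-made split rather than through `cv_apply`, so it runs without cv_apply.R.

```r
# Hypothetical fold function: "fit" by taking the mean of the training values,
# then score that constant prediction on the test values by mean squared error.
mean_forecaster_fold <- function(train, test) {
  prediction <- mean(train)
  mean((test - prediction)^2)
}

# Made-up numeric 3 x 3 x 3 array (a x b x c), standing in for real data.
x <- array(as.numeric(1:27), dim = c(3, 3, 3),
           dimnames = list(a = c("1", "2", "3"),
                           b = c("4", "5", "6"),
                           c = c("7", "8", "9")))

# One hand-made split: train on c = 7, 8 and test on c = 9.
# drop = FALSE keeps all three dimensions, like the slices in the examples.
train <- x[, , c("7", "8"), drop = FALSE]
test  <- x[, , "9", drop = FALSE]
score <- mean_forecaster_fold(train, test)
print(score)
```

Passing a function like this as the third argument to `cv_apply` would presumably give one score per fold in the returned array, instead of the printed slices used in the examples below.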
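```r
library(magrittr)
# Assumes cv_apply() is already defined (e.g. loaded from cv_apply.R).
# Build a 3 x 3 x 3 character array whose entry [a, b, c] is paste0(a, b, c):
data.array = tidyr::crossing(a=1:3,b=4:6,c=7:9) %>%
  dplyr::mutate(abc = paste0(a,b,c)) %>%
  reshape2::acast(a ~ b ~ c, value.var="abc")
names(dimnames(data.array)) <- c("a","b","c")
print(data.array)
```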
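If tidyr, dplyr, or reshape2 aren't installed, the same kind of labelled array can be built in base R. This sketch assumes only that `data.array` is meant to be a 3 x 3 x 3 character array with entry `[a, b, c]` equal to `paste0(a, b, c)` and named dimnames `a`, `b`, `c`; it is an alternative construction, not what the examples below were run on.

```r
# outer() of a 3 x 3 matrix with a length-3 vector gives a 3 x 3 x 3 array
# whose entry [i, j, k] is paste0(a[i], b[j], cc[k]).
a  <- 1:3
b  <- 4:6
cc <- 7:9  # named `cc` to avoid masking base::c
base.array <- outer(outer(a, b, paste0), cc, paste0)
dimnames(base.array) <- list(a = as.character(a),
                             b = as.character(b),
                             c = as.character(cc))
print(base.array["2", "5", "8"])  # "258"
```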
```r
## A simple example that doesn't look like CV:
cv_apply(data.array, list(each=NULL,each=NULL,all=NULL), function(train, test) {
  print("TRAIN")
  print("dim:")
  print(dim(train)) # 1 1 3
  print("dimnames:")
  print(dimnames(train)) # one value of `a`, one value of `b`, all three values of `c`
  print("object:")
  print(train)
  print("RESHAPED")
  # Drop the two length-1 dimensions, keeping only the `c` dimension and its dimnames:
  reshaped.train = train
  dim(reshaped.train) <- dim(train)[3]
  dimnames(reshaped.train) <- dimnames(train)[3]
  print("dim:")
  print(dim(reshaped.train))
  print("dimnames:")
  print(dimnames(reshaped.train))
  print("object:")
  print(reshaped.train)
  print(identical(train, test)) # train and test are sliced identically when using only `each` and `all`
  stop('STOPPING AFTER THE FIRST "FOLD"') # deliberately halt after the first (a, b) slice
})
```
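Roughly, `list(each=NULL, each=NULL, all=NULL)` asks for one slice per combination of `a` and `b`, each keeping every value of `c`. The base-R loop below is a hypothetical sketch of that slicing, not cv_apply's implementation; the fold order and the fold names (`"1 4"` and so on) are assumptions of the sketch. It can be handy for checking what each `train`/`test` should contain.

```r
# Rebuild the example array: entry [a, b, c] is paste0(a, b, c).
x <- outer(outer(1:3, 4:6, paste0), 7:9, paste0)
dimnames(x) <- list(a = c("1", "2", "3"),
                    b = c("4", "5", "6"),
                    c = c("7", "8", "9"))

# One fold per (a, b) pair; drop = FALSE keeps the 1 x 1 x 3 shape.
folds <- list()
for (i in dimnames(x)$a) {
  for (j in dimnames(x)$b) {
    folds[[paste(i, j)]] <- x[i, j, , drop = FALSE]
  }
}
length(folds)    # 9 folds
dim(folds[[1]])  # 1 1 3
```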
```r
## Leave-one-value-of-`c`-out CV:
cv_apply(data.array, list(all=NULL,all=NULL,loo=NULL), function(train, test) {
  print("TRAIN")
  print(train) # has c=8 and c=9 data in the first fold
  print("TEST")
  print(test) # has c=7 data in the first fold
  stop('STOPPING AFTER THE FIRST "FOLD"')
})
```
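A hypothetical base-R sketch of the same leave-one-value-of-`c`-out split, assuming fold `k` tests on the `k`-th value of `c` and trains on the rest while keeping all of `a` and `b` (as the printed first fold above suggests). This illustrates the split only; it is not how cv_apply.R does it internally.

```r
# Rebuild the example array: entry [a, b, c] is paste0(a, b, c).
x <- outer(outer(1:3, 4:6, paste0), 7:9, paste0)
dimnames(x) <- list(a = c("1", "2", "3"),
                    b = c("4", "5", "6"),
                    c = c("7", "8", "9"))

# Fold k: test on the k-th value of c, train on all other values of c.
loo_folds <- lapply(seq_len(dim(x)[3]), function(k) {
  list(train = x[, , -k, drop = FALSE],
       test  = x[, ,  k, drop = FALSE])
})
dimnames(loo_folds[[1]]$train)$c  # "8" "9"
dimnames(loo_folds[[1]]$test)$c   # "7"
```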
```r
## "Time series CV" treating `c` as the time dimension, starting with the second
## value of `c` so that there is at least one value of `c` in the training set
## (so that the training set won't be empty):
results = cv_apply(data.array, list(all=NULL,all=NULL,oneahead=2), function(train, test) {
  print("TRAIN")
  print(dim(train)) # varies: 3 3 1 in the first fold (c=7), 3 3 2 in the second (c=7,8)
  print(dimnames(train))
  print("TEST")
  print(dim(test)) # 3 3 1 for both folds
  print(dimnames(test))
  result = list(trainfirst=train[[1]], testfirst=test[[1]])
  return(result)
})
```
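A hypothetical base-R sketch of the one-ahead split, assuming (as the comments above describe) that the fold for the `k`-th value of `c` trains on all earlier values of `c`, and that `oneahead=2` means the first test value is the second value of `c`. Naming the folds by their test value of `c` is a choice of the sketch, chosen to echo the `"8"`, `"9"` dimnames of `results`.

```r
# Rebuild the example array: entry [a, b, c] is paste0(a, b, c).
x <- outer(outer(1:3, 4:6, paste0), 7:9, paste0)
dimnames(x) <- list(a = c("1", "2", "3"),
                    b = c("4", "5", "6"),
                    c = c("7", "8", "9"))

start <- 2  # first value of c used as a test set
test_idx <- start:dim(x)[3]

# Fold k: train on c values 1..(k - 1), test on the k-th value of c.
ts_folds <- lapply(test_idx, function(k) {
  list(train = x[, , seq_len(k - 1), drop = FALSE],
       test  = x[, , k, drop = FALSE])
})
names(ts_folds) <- dimnames(x)$c[test_idx]
lapply(ts_folds, function(f) dim(f$train))  # train dims: 3 3 1 for "8", 3 3 2 for "9"
```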
```r
dim(results) # 2 1 1 2
dimnames(results)
## [[1]]
## [1] "trainfirst" "testfirst"
## $a
## [1] "all"
## $b
## [1] "all"
## $c
## [1] "8" "9"
names(dimnames(results)) # "" "a" "b" "c"
# The value for `trainfirst` using all values of `a`, all values of `b`, and
# training data for values of `c` before "8":
results[["trainfirst","all","all","8"]]
results
```
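`results` can be indexed with `[[` and one name per dimension because it is an array with dimnames. The sketch below hand-builds a list array with the same shape and dimnames to show that indexing. Its contents are placeholders, and whether `cv_apply` returns a list array or an atomic array here is an assumption that hasn't been checked against cv_apply.R.

```r
# Hand-built 2 x 1 x 1 x 2 list array with the same dimnames as `results`;
# the element values are placeholders, not cv_apply output.
mock <- array(list("147", "148", "147", "149"),
              dim = c(2, 1, 1, 2),
              dimnames = list(c("trainfirst", "testfirst"),
                              a = "all", b = "all", c = c("8", "9")))
names(dimnames(mock))  # "" "a" "b" "c"
# `[[` takes one name per dimension and returns a single element:
mock[["trainfirst", "all", "all", "8"]]
mock[["testfirst", "all", "all", "9"]]
```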
Hi,

It is me again!
Could you please give an example of how to use the `cv_apply` function?

Thank you!