Closed by johannes-kk 4 years ago
I don't fully understand the sampling function, but the updated train_test_split looks good to me.
The updated `sample` function has two parts. If `replace == true`, it uses the original code, whereby it repeatedly samples the vector of row indices uniformly until the bootstrapped sample has `nrow` observations. If `replace == false`, it instead shuffles the vector of original row indices and pulls the first `nrow` from that, so the result has at most as many observations as there are in the original dataframe.
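For reference, the two branches described above can be sketched roughly as follows. This is a minimal Python illustration, not the code from this PR: it works on a plain list of rows instead of a dataframe, and the `rng` parameter is an assumption added so the behaviour is reproducible.

```python
import random


def sample(rows, nrow, replace=True, rng=None):
    """Hypothetical stand-in for the PR's sample function, on a list of rows."""
    rng = rng or random.Random()
    indices = list(range(len(rows)))
    if replace:
        # Bootstrap: draw indices uniformly, with replacement, until we have nrow.
        picked = []
        while len(picked) < nrow:
            picked.append(rng.choice(indices))
    else:
        # Without replacement: shuffle the indices and keep the first nrow.
        # Slicing caps the result at len(rows) when nrow is larger.
        shuffled = indices[:]
        rng.shuffle(shuffled)
        picked = shuffled[:nrow]
    return [rows[i] for i in picked]
```

With `replace=False` and `nrow` larger than the number of rows, the sketch returns every row exactly once, which matches the "at most the same number of observations" behaviour described above.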
`train_test_split` then just uses `sample` to draw a sample with the same number of rows without replacement, meaning it effectively just shuffles the passed dataframe.
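To illustrate why sampling every row without replacement amounts to a shuffle, here is a small Python sketch. It is not the PR's implementation: `shuffle_rows`, the `test_fraction` parameter, and the way the shuffled rows are cut into two parts are all assumptions made for the example.

```python
import random


def shuffle_rows(rows, rng=None):
    """Draw len(rows) rows without replacement, i.e. return a permutation."""
    rng = rng or random.Random()
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    return [rows[i] for i in indices]


def train_test_split(rows, test_fraction=0.25, rng=None):
    """Hypothetical split: shuffle, then cut off the first part as the test set."""
    shuffled = shuffle_rows(rows, rng)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Every row lands in exactly one of the two parts, so the split only reorders the data and never duplicates or drops a row.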