Elm PR #192 added tests of Pipeline and EaSearchCV for xarray and numpy data structures (see #202 for the goals there). Some of the tests on xarray-based data structures were failing when they involved cross validation. Cross validation iterators from sklearn typically depend on having a 2D X matrix that is sliced into training and test subsets.
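To illustrate the 2D assumption, here is a minimal sketch of how an sklearn cross validation iterator slices a plain numpy matrix by row index. The toy array `X` and the `folds` list are made up for illustration and are not taken from the Elm tests.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy 2D feature matrix: 10 rows (samples) by 3 columns (features).
X = np.arange(30).reshape(10, 3)

folds = []
for train_idx, test_idx in KFold(n_splits=5).split(X):
    # Each fold is a pair of integer row-index arrays used to slice X.
    X_train, X_test = X[train_idx], X[test_idx]
    folds.append((X_train.shape, X_test.shape))
```

The iterator only produces row indices, so it assumes the data can be indexed along a single leading sample axis.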
Implement cross validation for xarray data structures by creating functions that split an iterable of arguments to a sampler, where those functions use KFold or other cross validation iterators from sklearn.model_selection.
In the usage example from the wiki, SAMPLES could be a list of filenames or of datetime/spatial arguments that a function needs to make a sample xarray_filters.MLDataset for each element of that list. Inside EaSearchCV, or in the daskml code it refers to, the SAMPLES iterable would be divided by KFold (the default here, because the integer 5 is given as cv) or by an iterator from sklearn.model_selection passed as cv, e.g. cv=StratifiedKFold(7).
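A minimal sketch of the splitting idea, under stated assumptions: `split_sampler_args` is a hypothetical helper (not part of Elm, xarray_filters, or daskml), and the filenames in `SAMPLES` are placeholders. It applies a sklearn cross validation iterator to the positions of the sampler arguments rather than to a data matrix.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder sampler arguments; in practice these might be filenames
# or datetime/spatial arguments passed to a sampler function.
SAMPLES = ["sample_{:02d}.nc".format(i) for i in range(10)]

def split_sampler_args(samples, cv=5):
    """Yield (train_args, test_args) lists. Hypothetical helper, not Elm API."""
    if isinstance(cv, int):
        # Mirror the convention that an integer cv means KFold with that many splits.
        cv = KFold(n_splits=cv)
    # Cross validation iterators only need something with one row per sample,
    # so split a column of positions instead of the data itself.
    positions = np.arange(len(samples)).reshape(-1, 1)
    for train_idx, test_idx in cv.split(positions):
        yield [samples[i] for i in train_idx], [samples[i] for i in test_idx]

splits = list(split_sampler_args(SAMPLES, cv=5))
```

Each train or test list can then be handed to the sampler to build the corresponding datasets. Stratified iterators such as StratifiedKFold also require labels in `split`, which this sketch does not pass.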