This issue is a checklist tracking the degree of support in elm and related tools for different data structures, scikit-learn and custom estimators, and parallelism. This is an epic for the "Data Structure Flexibility" milestone of Phase II and is related to machine learning flexibility (at least in how we create PRs), but let's try to put most of the planning details in specific issues and keep this one as a long-term documentation reminder.
Data Structure Flexibility
Data structures to (ideally) support for most scikit-learn models and custom estimators (in approximate order of priority relative to most milestones' needs):
xarray_filters.MLDataset - From xarray_filters + Elm PR #192 refactor + ...
xarray.Dataset - Converted to an xarray_filters.MLDataset where needed
xarray.DataArray - When calling MLDataset.to_features()
dask.array - Elm PR #192 began bringing dask_searchcv base classes into elm, with support for dask data structures
dask.dataframe - My thought is that dask.array and dask.dataframe should be essentially interchangeable in elm (I'm not sure if that is the current status of dask_searchcv and related stacks)
numpy.array - This is the type supported by scikit-learn; included here as a reminder that elm's multi-model machine learning tools, e.g. EaSearchCV, need to support numpy except where specific methods in elm/xarray_filters/etc. require context metadata or spatial coordinates.
pandas.DataFrame - I'm not sure of the level of pandas support in scikit-learn; most people I have seen work with numpy and assemble inputs/outputs into pandas where needed for pre/postprocessing. (sklearn-pandas, for example, is a library I haven't tried personally.) Can we address pandas by just converting to dask.dataframe, so that we deal with dataframe support in one place?
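As a sketch of the single conversion layer the list above suggests, here is a minimal example assuming only numpy and pandas (the helper name `to_feature_matrix` is hypothetical, not part of elm's or xarray_filters' actual API):

```python
import numpy as np
import pandas as pd

def to_feature_matrix(X):
    """Hypothetical normalization helper: funnel supported inputs down to a
    2D numpy array that scikit-learn-style estimators accept.
    (Sketch only -- elm/xarray_filters would add MLDataset/dask branches here.)
    """
    if isinstance(X, pd.DataFrame):
        return X.to_numpy()  # one place to handle dataframe support
    if isinstance(X, np.ndarray):
        return X             # plain numpy passes through unchanged
    raise TypeError(f"unsupported input type: {type(X).__name__}")

df = pd.DataFrame({"band_1": [0.1, 0.2], "band_2": [0.3, 0.4]})
print(to_feature_matrix(df).shape)  # → (2, 2)
```

The design point is that converting pandas (or dask.dataframe) inputs in exactly one place keeps every estimator downstream working against a single array contract.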
Caveats:
Not all of the data structures above make sense for every transformer / estimator, e.g. the sklearn.cross_decomposition module has several estimators that take 2D X and 2D Y.
Estimator Flexibility
Support estimators/transformers:
As we start issues / PRs in elm / xarray_filters / etc. regarding data structure flexibility, let's relate them back to this issue so we can better track exactly which estimators/transformers have compatibility problems with each data structure.
Parallelism
What are the capabilities and limitations of the parallelism approach for each estimator/transformer and data structure combination? This needs to be better explained in documentation (now and ongoing). For example, most of elm's current parallelism favors the break-up-the-sample-data-into-separate-embarrassingly-parallel-fitting-jobs approach rather than the single-large-feature-matrix approach, but gradually we are also building single-large-feature-matrix methods (e.g. the work in dask-glm for large dask data structures; see also dask-ml).
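To make the distinction concrete, here is a toy sketch of the embarrassingly-parallel approach using only the standard library and numpy (`fit_one` and the least-squares "model" are stand-ins, not elm's API): fit one small model per sample chunk in parallel, rather than one model on a single large feature matrix.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def fit_one(chunk):
    """Stand-in 'estimator': ordinary least squares fit on one sample chunk."""
    X, y = chunk
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
true_coef = np.array([2.0, -1.0])

# Break the sample data into separate, independent fitting jobs...
chunks = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    y = X @ true_coef  # noiseless toy data so each chunk recovers true_coef
    chunks.append((X, y))

# ...and fit them in parallel (embarrassingly parallel: no communication
# between jobs, only an aggregation step at the end).
with ThreadPoolExecutor(max_workers=4) as pool:
    coefs = list(pool.map(fit_one, chunks))

print(np.allclose(np.mean(coefs, axis=0), true_coef))  # → True
```

The single-large-feature-matrix approach would instead hand all the samples to one (possibly distributed) solver, which is where dask-glm-style methods come in.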
cc @gbrener @hsparra