Open TomAugspurger opened 6 years ago
We should handle this the same way as scikit-learn: fit params that are the same length as X are split just like X.
For dask dataframes, we could maybe rely on the heuristic of splitting the fit param when its number of divisions matches that of X.
This also applies to incremental hyperparameter optimization in `_fit`.
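
As a rough illustration of the scikit-learn-style rule above (fit params with the same length as X are split like X), here is a minimal sketch in plain Python. The helper name `split_fit_params` and the toy data are hypothetical and not part of dask-ml or scikit-learn. It assumes that matching length is the only signal that a value is per-sample.

```python
def split_fit_params(fit_params, n_samples, indices):
    """Index fit params that look per-sample; pass everything else through.

    Hypothetical helper: "same length as X" is the only heuristic used.
    """
    out = {}
    for name, value in fit_params.items():
        is_per_sample = (
            hasattr(value, "__len__")
            and not isinstance(value, (str, bytes))
            and len(value) == n_samples
        )
        if is_per_sample:
            # Select the same rows that were selected from X.
            out[name] = [value[i] for i in indices]
        else:
            out[name] = value
    return out


X = [[0.0], [1.0], [2.0], [3.0]]
fit_params = {"sample_weight": [0.1, 0.2, 0.3, 0.4], "classes": [0, 1]}
train_idx = [0, 2, 3]

split = split_fit_params(fit_params, len(X), train_idx)
```

Note that the length heuristic can misfire: a non-sample param that happens to have `n_samples` entries would be split as well.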
Things like `classes=da.unique(y)` may be inefficient. This will have to be called on each block of data, which is expensive, especially if the data isn't persisted.

Things like `sample_weight` are tricky. It's an array of `n_samples` that should actually be chunked along with `X` and `y`. We don't do this correctly right now.

raises with
We don't want to persist that, as it may be too large.
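
One possible shape for the desired behavior, sketched in plain Python rather than dask: compute `classes` once up front, and slice `sample_weight` with the same block boundaries as `X` and `y`. `RecordingEstimator` and `fit_blockwise` are hypothetical stand-ins used only for illustration. Real dask arrays would need aligned chunks instead of list slices.

```python
class RecordingEstimator:
    """Hypothetical stand-in for an estimator exposing partial_fit."""

    def __init__(self):
        self.calls = []

    def partial_fit(self, X, y, classes=None, sample_weight=None):
        # Record the block sizes so the alignment can be checked.
        self.calls.append((len(X), len(y), len(sample_weight), tuple(classes)))
        return self


def fit_blockwise(est, X, y, sample_weight, chunk_size):
    # Compute the classes once, not once per block.
    classes = sorted(set(y))
    for start in range(0, len(X), chunk_size):
        stop = start + chunk_size
        est.partial_fit(
            X[start:stop],
            y[start:stop],
            classes=classes,
            # sample_weight is sliced with the same boundaries as X and y.
            sample_weight=sample_weight[start:stop],
        )
    return est


X = [[i] for i in range(5)]
y = [0, 1, 0, 1, 1]
w = [1.0, 2.0, 1.0, 2.0, 2.0]

est = fit_blockwise(RecordingEstimator(), X, y, w, chunk_size=2)
```

Each `partial_fit` call then sees a weight block of the same length as its `X` and `y` blocks, and nothing of length `n_samples` has to be held per call.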