Closed: TonyBagnall closed this issue 6 days ago.
Hey! I can take on this issue.
@aeon-actions-bot assign @notaryanramani
Hey @TonyBagnall, I just have one question. Do I need to create a BaseImputer class as well, or just use sklearn's _BaseImputer?
Hi, yes, fantastic. It should be a BaseCollectionTransformer structured like the resizer or rescaler transformers. I don't think we need our own BaseImputer. I'm not sure about sklearn's one: is there a benefit from also inheriting from it? I'll take a look.
@TonyBagnall
Can the input also be a list of 2D numpy arrays, or will it only ever be a 3D numpy array?
Sorry, I don't get notified; I must set that up. It's easy to support both:
_tags = {
    "X_inner_type": ["np-list", "numpy3D"],
    "fit_is_empty": False,
    "capability:multivariate": True,
    "capability:unequal_length": True,
    "removes_missing_values": True,
}
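A minimal sketch of the core logic such a transformer might apply to both inner types, assuming mean imputation within each channel of each series. The function name impute_series_mean is hypothetical and this is plain NumPy, not aeon's API; in an aeon BaseCollectionTransformer it would presumably sit inside the transformer's _transform method.

```python
import numpy as np


def impute_series_mean(X):
    """Hypothetical helper: fill NaNs in each channel of each series with that channel's mean.

    Accepts either a 3D array of shape (n_cases, n_channels, n_timepoints)
    ("numpy3D") or a list of 2D arrays of shape (n_channels, n_timepoints_i)
    ("np-list"), so unequal-length collections are handled the same way.
    """
    if isinstance(X, np.ndarray):
        if X.ndim != 3:
            raise ValueError("numpy input must be 3D: (n_cases, n_channels, n_timepoints)")
        # nanmean along the time axis gives one mean per case and channel.
        means = np.nanmean(X, axis=2, keepdims=True)
        # np.where returns a new array, so the input is not modified.
        return np.where(np.isnan(X), means, X)

    # np-list: impute each 2D series independently, keeping its own length.
    out = []
    for x in X:
        means = np.nanmean(x, axis=1, keepdims=True)
        out.append(np.where(np.isnan(x), means, x))
    return out


# Equal-length collection: 2 cases, 1 channel, 3 time points.
X3 = np.array([[[1.0, np.nan, 3.0]], [[np.nan, 4.0, 6.0]]])
print(impute_series_mean(X3))

# Unequal-length collection as a list of 2D arrays.
X_list = [np.array([[1.0, np.nan, 3.0]]), np.array([[np.nan, 10.0]])]
print(impute_series_mean(X_list))
```

A channel that is entirely NaN still comes back as NaN here (np.nanmean warns and returns NaN), so a full implementation would need a fallback, for example a constant.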
Describe the feature or idea you want to propose
It would be good to have imputers for time series. sklearn has https://scikit-learn.org/1.5/api/sklearn.impute.html
SimpleImputer, KNNImputer and IterativeImputer (https://scikit-learn.org/1.5/modules/generated/sklearn.impute.IterativeImputer.html#sklearn.impute.IterativeImputer).
But these are not appropriate for time series, since they work across features (columns). For a collection of series stored with one time point per column, sklearn imputers therefore work across time points, filling each gap using the other cases at that time point, rather than along each series.
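To make the distinction concrete, here is a small NumPy illustration (not sklearn code) contrasting column-wise mean imputation, which is what sklearn's SimpleImputer does with strategy='mean', against imputing from each series' own mean. The data is made up for illustration.

```python
import numpy as np

# Two univariate series stored sklearn-style as (n_cases, n_timepoints):
# each row is a series, so each column is a time point.
X = np.array([
    [1.0, 2.0, np.nan, 4.0],
    [10.0, 20.0, 30.0, 40.0],
])

# Column-wise (sklearn-style): the gap in series 0 at time point 2 is filled
# from the other series at that same time point.
col_means = np.nanmean(X, axis=0)
across_series = np.where(np.isnan(X), col_means, X)

# Series-wise (as proposed here): the gap is filled from series 0 itself.
row_means = np.nanmean(X, axis=1, keepdims=True)
within_series = np.where(np.isnan(X), row_means, X)

print(across_series[0, 2])  # 30.0, borrowed from series 1
print(within_series[0, 2])  # 7/3, roughly 2.33, the mean of series 0
```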
Describe your proposed solution
A file called _impute.py in transformations/collections, with separate classes for different types of imputation. I don't mind copying the sklearn structure; we should definitely have a version of SimpleImputer:
https://scikit-learn.org/1.5/modules/generated/sklearn.impute.SimpleImputer.htm
SimpleImputer: replace missing values with a constant, by default the series mean. sklearn has a strategy argument, quoted below. Note that we operate along each series, not along columns:
strategy: str or Callable, default='mean'
The imputation strategy.
If "mean", then replace missing values using the mean along each column. Can only be used with numeric data.
If "median", then replace missing values using the median along each column. Can only be used with numeric data.
If "most_frequent", then replace missing using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned.
If "constant", then replace missing values with fill_value. Can be used with strings or numeric data.
If an instance of Callable, then replace missing values using the scalar statistic returned by running the callable over a dense 1d array containing non-missing values of each column.
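A minimal sketch of how the quoted strategy options might translate to operating along each series instead of each column. It assumes numeric 3D input of shape (n_cases, n_channels, n_timepoints) and does not handle strings; the names simple_impute and _fill_value_for, and the default fill_value of 0.0, are hypothetical choices for illustration, not aeon or sklearn API.

```python
import numpy as np


def _fill_value_for(series, strategy, fill_value):
    """Compute the scalar used to fill the gaps in one 1D series."""
    observed = series[~np.isnan(series)]  # dense array of non-missing values
    if callable(strategy):
        return strategy(observed)
    if strategy == "constant":
        return fill_value
    if observed.size == 0:
        return np.nan  # nothing to summarise, so the gaps stay NaN
    if strategy == "mean":
        return observed.mean()
    if strategy == "median":
        return np.median(observed)
    if strategy == "most_frequent":
        # np.unique returns sorted values and argmax picks the first maximum,
        # so ties resolve to the smallest value, matching the sklearn docs.
        values, counts = np.unique(observed, return_counts=True)
        return values[np.argmax(counts)]
    raise ValueError(f"Unknown strategy: {strategy!r}")


def simple_impute(X, strategy="mean", fill_value=0.0):
    """Hypothetical SimpleImputer-style function applied along each series."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for i in range(X.shape[0]):
        for c in range(X.shape[1]):
            series = X[i, c]
            mask = np.isnan(series)
            if mask.any():
                out[i, c, mask] = _fill_value_for(series, strategy, fill_value)
    return out


X = np.array([[[1.0, 2.0, 2.0, np.nan, 9.0]]])
for strategy in ["mean", "median", "most_frequent", "constant", np.max]:
    print(strategy, simple_impute(X, strategy, fill_value=-1.0)[0, 0, 3])
```

Computing one statistic per (case, channel) keeps each series self-contained, which is what distinguishes this from sklearn's per-column behaviour.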
Describe alternatives you've considered, if relevant
No response
Additional context
No response