aeon-toolkit / aeon

A toolkit for machine learning from time series
https://aeon-toolkit.org/
BSD 3-Clause "New" or "Revised" License
1.02k stars 128 forks

[ENH] Implement SimpleImputer class #2293

Closed TonyBagnall closed 6 days ago

TonyBagnall commented 2 weeks ago

Describe the feature or idea you want to propose

It would be good to have imputers for time series. sklearn has https://scikit-learn.org/1.5/api/sklearn.impute.html

SimpleImputer, KNNImputer, IterativeImputer

But these are not appropriate for time series: sklearn imputers operate per feature column (per time point, if each row is a series), so they use statistics computed across cases rather than along each series.

Describe your proposed solution

Add a file in transformations/collections called _impute.py, with separate classes for different types of imputation. I don't mind copying the sklearn structure; we should definitely have a version of SimpleImputer:

https://scikit-learn.org/1.5/modules/generated/sklearn.impute.SimpleImputer.htm

SimpleImputer: replace missing values with a constant, by default the series mean. sklearn has a strategy argument. Note that we operate along series, not columns:

strategy: str or Callable, default='mean'. The imputation strategy.

- If "mean", then replace missing values using the mean along each column. Can only be used with numeric data.
- If "median", then replace missing values using the median along each column. Can only be used with numeric data.
- If "most_frequent", then replace missing using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned.
- If "constant", then replace missing values with fill_value. Can be used with strings or numeric data.
- If an instance of Callable, then replace missing values using the scalar statistic returned by running the callable over a dense 1d array containing non-missing values of each column.
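
To make the "along series, not columns" point concrete, here is a minimal sketch of how the sklearn strategies could be applied to a single series. The function name `impute_series` and its signature are hypothetical (not part of aeon or sklearn), missing values are assumed to be encoded as NaN, and falling back to `fill_value` for an all-missing series is just one possible choice.

```python
import numpy as np


def impute_series(x, strategy="mean", fill_value=0.0):
    """Fill NaNs in one 1D series using a statistic of that same series.

    Hypothetical helper for illustration only; not the aeon implementation.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    observed = x[~missing]  # dense 1D array of the non-missing values

    if observed.size == 0 or strategy == "constant":
        # Assumption: a series with no observed values falls back to fill_value.
        value = fill_value
    elif callable(strategy):
        value = strategy(observed)
    elif strategy == "mean":
        value = observed.mean()
    elif strategy == "median":
        value = np.median(observed)
    elif strategy == "most_frequent":
        # np.unique returns sorted values and argmax returns the first maximum,
        # so ties resolve to the smallest value, as in the sklearn description.
        values, counts = np.unique(observed, return_counts=True)
        value = values[np.argmax(counts)]
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")

    out = x.copy()
    out[missing] = value
    return out
```

For example, `impute_series([1.0, np.nan, 3.0])` fills the gap with the mean of that series only, regardless of what other series in the collection contain.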

Describe alternatives you've considered, if relevant

No response

Additional context

No response

notaryanramani commented 2 weeks ago

Hey! I can take on this issue.

notaryanramani commented 2 weeks ago

@aeon-actions-bot assign @notaryanramani

notaryanramani commented 2 weeks ago

Hey @TonyBagnall, I just have one question. Do I need to create a BaseImputer class as well, or just use sklearn's _BaseImputer?

TonyBagnall commented 2 weeks ago

Hi, yes, fantastic. It should be a BaseCollectionTransformer structured like the resizer or rescaler transformers. I don't think we need our own BaseImputer. Not sure about sklearn's one: is there a benefit from also inheriting from there? I'll take a look.

notaryanramani commented 1 week ago

@TonyBagnall

Can the input also be a list of 2D numpy arrays, or will it just be a 3D numpy array?

TonyBagnall commented 1 week ago

> @TonyBagnall
>
> Can the input also be a list of 2D numpy arrays, or will it just be a 3D numpy array?

Sorry, I don't get notified; I must set that up. It's easy to have both:

    _tags = {
        # accept either a list of 2D numpy arrays or a single 3D numpy array
        "X_inner_type": ["np-list", "numpy3D"],
        "fit_is_empty": False,
        "capability:multivariate": True,
        "capability:unequal_length": True,
        "removes_missing_values": True,
    }
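
As a rough illustration of what supporting both input types could look like, the sketch below fills missing values with the per-channel mean of each case, for either a 3D array or a list of 2D arrays. It is plain numpy rather than an aeon transformer; the function name `fill_with_series_mean` is hypothetical, and the layouts `(n_cases, n_channels, n_timepoints)` and `(n_channels, n_timepoints)` are assumed, with missing values encoded as NaN.

```python
import numpy as np


def _fill(a):
    """Replace NaNs with the mean taken along the last (time) axis."""
    a = np.asarray(a, dtype=float)
    # keepdims=True lets the per-series means broadcast back over time points.
    means = np.nanmean(a, axis=-1, keepdims=True)
    return np.where(np.isnan(a), means, a)


def fill_with_series_mean(X):
    """Impute a collection given as a 3D array or a list of 2D arrays.

    Hypothetical helper for illustration only; not the aeon implementation.
    """
    if isinstance(X, np.ndarray):
        # Equal-length collection: one vectorised pass over the whole array.
        return _fill(X)
    # Unequal-length collection: handle each case separately.
    return [_fill(case) for case in X]
```

Because the mean is taken along the time axis of each channel, unequal-length cases need no padding: each case is imputed from its own observed values.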