aeon-toolkit / aeon

A toolkit for machine learning from time series
https://aeon-toolkit.org/
BSD 3-Clause "New" or "Revised" License

[ENH] Transformers that need conversion from nested pd.DataFrame to numpy arrays #197

TonyBagnall closed this issue 1 year ago.

TonyBagnall commented 1 year ago

nested_univ is no longer going to be supported in aeon; we are switching fully to numpy. Some transformers still store data internally in this format. Nearly all of them just iterate and convert to numpy, so they can simply be switched to "X_inner_mtype": "numpy3D" and then adjusted once we support unequal-length series. I'll list them here as I find them (I would search the tags, but that doesn't work, see #182).
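To make the nested_univ to numpy3D change concrete, here is a minimal sketch of the conversion, assuming the usual nested_univ layout (rows are cases, columns are channels, each cell holds a pd.Series) and the numpy3D shape (n_cases, n_channels, n_timepoints). The function name nested_univ_to_numpy3d is hypothetical and not an aeon API; aeon's own converters may be implemented differently.

```python
import numpy as np
import pandas as pd


def nested_univ_to_numpy3d(X: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: nested DataFrame -> (n_cases, n_channels, n_timepoints).

    Only equal-length series can be stored in a single 3D array.
    """
    n_cases, n_channels = X.shape
    # collect the length of every stored series
    lengths = {len(X.iloc[i, j]) for i in range(n_cases) for j in range(n_channels)}
    if len(lengths) != 1:
        raise ValueError("numpy3D needs at least one case and equal-length series")
    n_timepoints = lengths.pop()
    out = np.empty((n_cases, n_channels, n_timepoints))
    for i in range(n_cases):
        for j in range(n_channels):
            # each cell holds one series; copy its values into the 3D array
            out[i, j, :] = np.asarray(X.iloc[i, j], dtype=float)
    return out


# two cases, one channel ("dim_0"), three time points each
cells = np.empty(2, dtype=object)
cells[0] = pd.Series([1.0, 2.0, 3.0])
cells[1] = pd.Series([4.0, 5.0, 6.0])
nested = pd.DataFrame({"dim_0": cells})

arr = nested_univ_to_numpy3d(nested)
print(arr.shape)  # (2, 1, 3)
```

The explicit length check is where unequal-length data stops fitting this layout, which is why those transformers are left for later.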

Leaving these for now:
- Tabularizer (reduce.py): it's embedded in the reduction code in forecasting, so I don't want to mess with it.
- ColumnTransformer: can sort it out when we address #171.

Basically, we need to switch from

    "X_inner_mtype": "nested_univ",  # which mtypes do _fit/_predict support for X?

to

    "X_inner_mtype": "numpy3D",

and obviously make it all work and update any tests. This is also a good time to add examples. We also need to change the constructor to stop nonsense like super(PAA, self).__init__(_output_convert=False).

Then treat X as a numpy array in fit and transform. Some will be much easier than others. Leave any that work with unequal-length series (e.g. truncate and pad) for now, because they need to work with lists of numpy arrays.
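Once X_inner_mtype is "numpy3D", the transform logic can operate directly on the array. Below is a hedged sketch of a PAA-style step (piecewise aggregate approximation: average each series over equal-width segments) written as a plain function. paa_numpy3d is a hypothetical name, not aeon's PAA implementation, and it assumes equal-length series with n_timepoints divisible by n_intervals, which sidesteps the unequal-length cases that are being left for now.

```python
import numpy as np


def paa_numpy3d(X: np.ndarray, n_intervals: int) -> np.ndarray:
    """Sketch of PAA on a (n_cases, n_channels, n_timepoints) array.

    Assumes n_timepoints is divisible by n_intervals, so all segments are equal.
    """
    n_cases, n_channels, n_timepoints = X.shape
    if n_timepoints % n_intervals != 0:
        raise ValueError("this sketch requires n_timepoints divisible by n_intervals")
    seg_len = n_timepoints // n_intervals
    # split the time axis into n_intervals consecutive segments, then average each
    return X.reshape(n_cases, n_channels, n_intervals, seg_len).mean(axis=3)


X = np.arange(12, dtype=float).reshape(2, 1, 6)
Xt = paa_numpy3d(X, n_intervals=3)
# case 0 -> [0.5, 2.5, 4.5], case 1 -> [6.5, 8.5, 10.5]
```

Working on the 3D array lets the whole batch be processed with one vectorised reshape and mean, instead of iterating over nested cells.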

aiwalter commented 1 year ago

Actually, it seems nested pandas DataFrames still work with pandas 2.0.0?

TonyBagnall commented 1 year ago

still want to get rid of them :)

lmmentel commented 1 year ago

Thanks for the summary @TonyBagnall! I think it would be a good move to drop the nested format, but could we discuss the actual consequences and alternatives?

TonyBagnall commented 1 year ago

Sure @lmmentel, happy to discuss. Shall we have a developers' meeting on the issue?

TonyBagnall commented 1 year ago

But to clarify, this is all internal; it will not affect anything external. The base class will just do the conversions. We are not dropping nested_univ support, just storing things internally as numpy arrays to avoid unnecessary conversions in the standard use cases.
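The idea that the base class does the conversions can be sketched as follows. SketchBaseTransformer and ChannelMean are hypothetical classes, not aeon's BaseTransformer; they only illustrate the pattern of reading the X_inner_mtype tag in the public transform method and converting a nested DataFrame before the subclass's _transform ever sees it.

```python
import numpy as np
import pandas as pd


class SketchBaseTransformer:
    """Hypothetical base class: converts user input to the inner mtype."""

    _tags = {"X_inner_mtype": "numpy3D"}

    def transform(self, X):
        if self._tags["X_inner_mtype"] == "numpy3D" and isinstance(X, pd.DataFrame):
            # nested_univ input: rows are cases, columns are channels, cells are
            # pd.Series; stack into (n_cases, n_channels, n_timepoints)
            X = np.array(
                [
                    [np.asarray(cell, dtype=float) for cell in row]
                    for row in X.itertuples(index=False)
                ]
            )
        return self._transform(X)

    def _transform(self, X):
        raise NotImplementedError


class ChannelMean(SketchBaseTransformer):
    """Hypothetical subclass: only ever receives a numpy3D array."""

    def _transform(self, X):
        # mean over the time axis -> shape (n_cases, n_channels)
        return X.mean(axis=2)


cells = np.empty(2, dtype=object)
cells[0] = pd.Series([1.0, 2.0, 3.0])
cells[1] = pd.Series([4.0, 5.0, 6.0])
nested = pd.DataFrame({"dim_0": cells})

result_nested = ChannelMean().transform(nested)
result_numpy = ChannelMean().transform(np.array([[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]]))
# both are arrays of shape (2, 1) holding the means 2.0 and 5.0
```

Because the subclass never touches pandas, callers can keep passing nested data while the estimator's internals stay numpy-only.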

TonyBagnall commented 1 year ago

It's done, enjoying closing this one :) I'll raise a separate issue for Tabularizer (reduce.py) and update #171 for ColumnTransformer.