[Closed] MatthewMiddlehurst closed 11 months ago
Good summary, I will put together lists of affected transformers. We could take the opportunity to change the name `panel` too ...
There is this constraint when using the SAX algorithm as well. If the user wants to apply the algorithm to a 2D numpy array, they have to add an axis themselves beforehand to make the shape (n_cases, n_channels, time_series_length). For the deep learners I remember there was an automated way to do that internally. Also, the use of pandas Series can probably be removed; I think most people will use numpy here, especially since there is no option to return a numpy array.
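The manual reshape described above can be sketched as follows (a minimal example; the `(n_cases, n_channels, n_timepoints)` panel shape is the one stated in this thread, while the array sizes are illustrative):

```python
import numpy as np

# 2D array of shape (n_cases, n_timepoints): 10 univariate series of length 50.
X_2d = np.random.default_rng(0).random((10, 50))

# Insert a channel axis so the shape becomes (n_cases, n_channels, n_timepoints),
# the 3D panel format a transformer like SAX expects for collections of series.
X_3d = X_2d[:, np.newaxis, :]

assert X_3d.shape == (10, 1, 50)
```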
this has been resolved by #709 any residual issues related to this should be raised as specific separate issues I think
The transformations module does not currently interact with 2D arrays in an intuitive way for the classification/regression/clustering tasks. As an example, a 2D numpy array is currently treated as a single multivariate series (n_timepoints, n_series), but someone coming from `sklearn` who is familiar with that framework will assume it is multiple univariate series (n_cases, n_timepoints). If this mistake is made, there may be no indication of any problem, as the base class will convert the input to a usable format regardless of intention. For example, this can result in multiple `TSFreshRelevantFeatureExtractor` objects being fitted on many single series, which makes no sense at all. Even in cases where the output is not affected, e.g. `ROCKET`, the mistake still makes the transformation grossly inefficient.

In my opinion, the growth and usability of the module is currently constrained by trying to force two distinct learning tasks into a single framework. It is not sensible to have the class infer the task from the input when the tasks share valid input datatypes but use them in different ways.
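To make the ambiguity concrete, here is a small illustration of the two readings of the same 2D array (array sizes are made up for the example):

```python
import numpy as np

X = np.zeros((20, 100))  # a single ambiguous 2D input

# Interpretation A (sklearn-style): 20 univariate cases, each 100 timepoints long.
# Made explicit by adding a channel axis: shape (n_cases, n_channels, n_timepoints).
X_as_panel = X[:, np.newaxis, :]

# Interpretation B (current series-style): one multivariate series of
# 20 timepoints with 100 channels, i.e. shape (n_timepoints, n_series).
X_as_series = X

assert X_as_panel.shape == (20, 1, 100)  # 20 separate series
assert X_as_series.shape == (20, 100)    # one series, 100 channels
```

Both interpretations are valid shapes, which is why the base class can silently pick the wrong one.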
This still needs further discussion on actions to take (if any). In the last developer meeting, there was general agreement that the current implicit conversion of 2D data is not the design we want. A few options:
- Separate `panel` and `series` transformers (names can be changed), each with their own acceptable input types and task-specific actions. While extra effort would be required to use these transformers for the opposite task, it should be possible to implement converters between them so that they remain usable.
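The converters mentioned in that option could look something like this (a rough sketch with hypothetical function names; the shape conventions are the ones used earlier in this issue):

```python
import numpy as np

def series_to_panel(X):
    """Wrap a single (n_timepoints, n_channels) series as a panel of one case,
    returning shape (1, n_channels, n_timepoints)."""
    return np.transpose(X)[np.newaxis, :, :]

def panel_to_series_list(X):
    """Split a (n_cases, n_channels, n_timepoints) panel into a list of
    single series, each of shape (n_timepoints, n_channels)."""
    return [np.transpose(x) for x in X]

# A series transformer could then be applied per case, and a panel transformer
# could be applied to a wrapped single series, at some conversion cost.
series = np.zeros((100, 3))          # one multivariate series
panel = series_to_panel(series)      # -> (1, 3, 100)
assert panel.shape == (1, 3, 100)
```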