aeon-toolkit / aeon

A toolkit for machine learning from time series
https://aeon-toolkit.org/
BSD 3-Clause "New" or "Revised" License

[BUG] _get_shp_importance ranking features for linear & trees #2016

Open IRKnyazev opened 2 weeks ago

IRKnyazev commented 2 weeks ago

Describe the bug

There are a few levels to this issue. The first and most straightforward concerns STC with linear classifiers: the line that extracts the linear classifier's coefficients does the inverse of what is intended. A positive coef means that as a distance value increases, the sample is more likely to belong to the latter class, so the features important to that latter class are those with a small distance and hence a negative coef. Line 695 should be changed to coefs = np.append(coefs, -coefs, axis=0).
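To make the sign argument concrete, here is a minimal sketch of how the proposed line turns binary coefficients into per-class rankings. The coefficient values and the helper rank_shapelets are made up for illustration and are not part of aeon; the sketch only assumes that the binary coefficients arrive as a 2D array with a single row.

```python
import numpy as np

# Hypothetical coefficients of a binary linear classifier fitted on
# shapelet distances, stored as one row: shape (1, n_shapelets).
coefs = np.array([[0.8, -0.5, 0.1]])

# Proposed fix: row 0 scores shapelets for class 0, row 1 for class 1.
# A positive coef pushes large distances towards class 1, so a small
# distance (a good match) points to class 0.
per_class = np.append(coefs, -coefs, axis=0)


def rank_shapelets(per_class, class_id):
    """Return shapelet indices sorted from most to least important."""
    return np.argsort(-per_class[class_id])


print(rank_shapelets(per_class, 0))  # [0 2 1]
print(rank_shapelets(per_class, 1))  # [1 2 0]
```

With the rows in the opposite order, shapelet 0 would be reported as the top shapelet for class 1, which is the behaviour described in this issue.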

The next challenge is that for RDST only 1/3 of the features are distance metrics, so the above step won't be a simple fix there. For example, @baraline mentioned that the number of occurrences can be good between 3 and 4 but bad above (and below) that range, so the coefficient in a linear model doesn't capture the real importance here.

As a long-term goal, it might be worth adding a method (independent of the model used) to compute feature importance given a fitted model.
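One existing model-independent option is permutation importance from scikit-learn. The sketch below is only an illustration of that technique, not a proposal for the aeon API: the data is synthetic, RidgeClassifier stands in for whatever estimator sits on top of the transform, and only feature 0 is informative by construction.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for transformed features (assumption: 3 features).
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)  # only feature 0 carries class information

clf = RidgeClassifier().fit(X, y)

# Shuffle each feature in turn and measure the drop in accuracy.
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)

# Features sorted from most to least important.
ranking = np.argsort(-result.importances_mean)
print(ranking)
```

This measures how much the fitted model relies on each feature, so it works the same way for linear and tree-based estimators. It does not attribute importance to a particular class, which would still need handling separately.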

Steps/Code to reproduce the bug

In the case of the GunPoint problem, _get_shp_importance(0)[0] returns shapelets from the no-gun class (encoded as 1); it should return the gun shapelets (encoded as 0).

Expected results

NA

Actual results

NA

Versions

No response

baraline commented 2 weeks ago

Something that came to mind after review: did we handle the multiclass case in #2017? Following the same logic as the binary case, we would need to do coefs = -coefs, as it is one-vs-all in the multiclass case, if I'm not mistaken.

IRKnyazev commented 2 weeks ago

No, you are absolutely right, I did not consider the multiclass scenario. So negating the coefs makes the distances inversely correlated with the class in question. Is that what you mean by one-vs-all? @baraline

baraline commented 2 weeks ago

I think so? In the multiclass case the linear methods learn one set of coefficients for each class: (n_classes, n_channels) instead of n_channels in the binary case.
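A quick way to check the shapes is to fit a scikit-learn linear classifier on toy data. This is a rough sketch under assumptions: the data is random, RidgeClassifier is used only as an example of a linear method, the second axis is the number of input features, and per_class_importance is a hypothetical helper that combines the binary fix with the coefs = -coefs suggestion from this thread.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))  # synthetic stand-in for shapelet distances
y_binary = np.repeat([0, 1], 75)
y_multi = np.repeat([0, 1, 2], 50)

binary_coefs = RidgeClassifier().fit(X, y_binary).coef_
multi_coefs = RidgeClassifier().fit(X, y_multi).coef_

print(binary_coefs.shape)  # (1, 4): a single row for the binary case
print(multi_coefs.shape)   # (3, 4): one row per class (one-vs-all)


def per_class_importance(coefs):
    """Hypothetical helper: one row of importance scores per class."""
    if coefs.shape[0] == 1:
        # Binary: row 0 for class 0, row 1 for class 1.
        return np.append(coefs, -coefs, axis=0)
    # Multiclass: each row already belongs to one class, and a small
    # distance (negative coef) marks a shapelet typical of that class.
    return -coefs


print(per_class_importance(binary_coefs).shape)  # (2, 4)
print(per_class_importance(multi_coefs).shape)   # (3, 4)
```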