Closed TonyBagnall closed 10 months ago
@TonyBagnall hmmm, strange. I'll have a look!
It's only this dataset; it ran fine on the other 111, so it's some weird edge case. Looks like an empty resample or perhaps a single-class resample. Not sure how it works internally tbh, there is something parallel going on with SFA :) Reproduced the above on Windows and Linux though.
@TonyBagnall
Found the problem! Happy to say it's not my code 😛 (I think...)
Line 117 of the code for SAX calls:
X = scipy.stats.zscore(X, axis=-1)
Every single value in the 931st sample of the test split for UWaveGestureLibraryZ is identical (-0.99841144), which makes scipy.stats.zscore(X) return an array of NaNs for that sample, which then breaks things during SAX. This is presumably due to dividing by the standard deviation, which is zero.
This can be minimally verified like so:
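import numpy as np
import scipy.stats  # import the submodule explicitly; plain `import scipy` may not expose it
from aeon.datasets import load_classification

# Load the test split and pick the 931st case (index 930)
testX, _ = load_classification("UWaveGestureLibraryZ", split="TEST")
testX_931 = testX[930]
print(f"All values the same: {np.all(testX_931 == -0.99841144)}")
# A constant series has zero standard deviation, so z-scoring divides 0 by 0
z_scores = scipy.stats.zscore(testX_931, axis=-1)
print(f"All values NaN: {np.all(np.isnan(z_scores))}")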
Other z-norm implementations, e.g. sklearn.preprocessing.scale(), avoid this problem by returning an all-zero array in this case, so something like that is possibly the most obvious solution. Sadly scale() isn't a drop-in replacement, as it can't handle multivariate input (it only accepts 2D arrays, so it won't play nice with 3D numpy arrays).
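For illustration, a minimal sketch of what such a guarded z-norm could look like for 3D input of shape (n_cases, n_channels, n_timepoints). The name safe_zscore and the eps threshold are assumptions made up for this example; this is not the function used in aeon, scipy or sklearn, just plain numpy showing the idea of replacing a zero standard deviation before dividing.

```python
import numpy as np


def safe_zscore(X, axis=-1, eps=1e-8):
    """Z-normalise along `axis`, mapping (near-)constant series to zeros.

    Hypothetical helper for illustration only; `eps` is an assumed threshold.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=axis, keepdims=True)
    std = X.std(axis=axis, keepdims=True)
    # Replace (near-)zero standard deviations with 1 so the division is defined.
    # For a constant series the numerator is already (numerically) zero.
    std = np.where(std < eps, 1.0, std)
    return (X - mean) / std


# Two univariate cases of length 10; the second one is constant,
# like the problematic sample described above.
X = np.stack([
    np.arange(10, dtype=float).reshape(1, 10),
    np.full((1, 10), -0.99841144),
])
Z = safe_zscore(X)
print(f"Any NaN: {np.isnan(Z).any()}")
print(f"Constant case is all (close to) zero: {np.allclose(Z[1], 0.0)}")
```

Using keepdims=True keeps the mean and standard deviation broadcastable against X, so the same function works for 2D and 3D arrays without reshaping.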
While looking into this I have also found a small issue in the REDCOMETS implementation: SAX is z-normed while SFA is not (both should be), so I'll make a quick fix for that and submit a PR.
Good find, we had a similar issue with shapelets. I think we have our own z-norm that protects against this and another weird numba feature @baraline? I'll look later.
I think it was a numba function. In any case, red comets runs now on uwave. thanks for the fix, can close this now
Describe the bug
The REDCOMETS classifier crashes on one UWave dataset, but not on the others. Some form of sampling issue maybe? @zy18811 any ideas?
Steps/Code to reproduce the bug
Expected results
prints accuracy
Actual results
Versions
No response