angus924 / minirocket

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification
GNU General Public License v3.0

minirocket_multivariate extremely slow #8

Closed by turmeric-blend 3 years ago

turmeric-blend commented 3 years ago

I am using a large dataset (10,000+) and pass the data to the model in batches. I do not cache the data, so transform() runs every time a batch is passed to the model, on every epoch. I run this same setup for both

minirocket.py with input shape (32768,99) and

minirocket_multivariate.py with input shape (32768,1,99) so the number of channel is 1.

I find that the minirocket_multivariate.py version runs significantly slower on every transform() call than minirocket.py.

Is there a potential bug in the code?

angus924 commented 3 years ago

Hi @turmeric-blend, good question. Yes, the multivariate implementation is quite a lot slower than the univariate implementation. (This is partly why, for now at least, they are separate. I have also separated MiniRocket and MiniRocketMultivariate in sktime.)

I am aware of the problem. Unfortunately, I haven't had the chance yet to work out whether there is a straightforward way to 'fix' it. There may be a way to avoid whatever is causing the problem by rearranging the operations a bit, or it might be that there is some issue with how numba handles the additional dimension.

Basically, if you have univariate data, just use the univariate implementation. If you have multivariate data, unfortunately at the moment the speed issue is unavoidable.
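For the single-channel case described in this issue, one way to follow that advice is to drop the channel axis so that a (32768,1,99) batch becomes (32768,99) before it goes to the univariate code. The snippet below is only a sketch: `to_univariate` is a hypothetical helper, not part of this repository, and the cast to contiguous float32 is an assumption about the input the transform expects.

```python
import numpy as np


def to_univariate(X):
    """Drop a singleton channel axis: (n, 1, length) -> (n, length).

    Hypothetical helper for illustration; not part of the minirocket repo.
    """
    X = np.asarray(X)
    if X.ndim == 3:
        if X.shape[1] != 1:
            raise ValueError(
                f"expected exactly 1 channel, got {X.shape[1]}; "
                "genuinely multivariate data cannot be squeezed"
            )
        X = X[:, 0, :]  # keep every series, drop the channel axis
    elif X.ndim != 2:
        raise ValueError(f"expected a 2-D or 3-D array, got {X.ndim}-D")
    # Assumption: the transform expects contiguous float32 input.
    return np.ascontiguousarray(X, dtype=np.float32)


# Small stand-in for the (32768, 1, 99) batch described in the issue.
X_multi = np.random.default_rng(0).normal(size=(8, 1, 99))
X_uni = to_univariate(X_multi)
print(X_uni.shape, X_uni.dtype)
```

The reshaped array can then be passed to the functions in minirocket.py instead of minirocket_multivariate.py. Data with more than one channel is rejected rather than silently flattened, since in that case the multivariate implementation is still needed.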

There is a GPU implementation coming (this is not my work), which will most likely be a lot faster for most multivariate datasets, at least until I can work out how to fix this issue with the CPU implementation.

turmeric-blend commented 3 years ago

ok, thanks for letting me know.