angus924 / minirocket

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification
GNU General Public License v3.0

Some questions about the multivariate version. #18

Closed Presburger closed 2 years ago

Presburger commented 2 years ago

Hello, I have been reading the code for multivariate MiniRocket, and the way the channels are combined does not make sense to me. Say x is channel 0 and y is channel 1, with Conv(x) and Conv(y) their convolutions. When the channels are combined, the result is just Conv(x + y). Why not change np.sum to np.prod, so that it becomes Conv(x * y)?

angus924 commented 2 years ago

Hi @Presburger.

The way the multivariate version works (it could potentially be improved quite a lot) is to randomly assign a subset of channels to each kernel/dilation combination. As you say, we add the channels together. This is equivalent to applying the same kernel to each of the selected channels and summing the results.

So let's say for kernel W_0 and dilation d=1, we assign channels [3, 14, 246]. For this kernel/dilation combination, we then sum channels [3, 14, 246], effectively producing a univariate (i.e., one-channel) time series. Because convolution is linear, this is equivalent to applying the same kernel, W_0, to each of the three channels and summing the outputs.
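
The equivalence described above can be checked numerically. The following is a minimal sketch, not code from the MiniRocket repository: the array shapes, the channel subset, the kernel weights, and the helper `convolve` are all illustrative assumptions. It also shows that no such equivalence holds for an element-wise product, since multiplication is not linear.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multivariate series: 4 channels, 50 time steps (illustrative shapes only).
X = rng.standard_normal((4, 50))

# A length-9 kernel with weights -1 and 2, in the style of MiniRocket kernels.
kernel = np.array([-1, -1, 2, -1, -1, 2, -1, 2, -1], dtype=float)

# Hypothetical channel subset assigned to this kernel.
channels = [0, 2, 3]


def convolve(series, weights):
    """Plain 'valid' cross-correlation with dilation 1 and no padding."""
    return np.correlate(series, weights, mode="valid")


# Option A: sum the selected channels first, then apply the kernel once.
sum_then_conv = convolve(X[channels].sum(axis=0), kernel)

# Option B: apply the same kernel to each channel, then sum the outputs.
conv_then_sum = sum(convolve(X[c], kernel) for c in channels)

# Linearity makes the two options agree up to floating-point error.
same = np.allclose(sum_then_conv, conv_then_sum)

# With a product instead of a sum, the two orders of operation differ.
prod_then_conv = convolve(X[channels].prod(axis=0), kernel)
conv_then_prod = np.prod([convolve(X[c], kernel) for c in channels], axis=0)
prod_matches = np.allclose(prod_then_conv, conv_then_prod)

print(same, prod_matches)
```

Summing first is the cheaper of the two equivalent options, because the kernel is applied once rather than once per channel.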

Does that help? Maybe I misunderstood what you were asking.

There may indeed be more effective ways to do it. It's really just intended to provide basic multivariate functionality.

Presburger commented 2 years ago

Thanks for your reply, @angus924. Sometimes the data in different channels have different distributions, and sometimes normalizing the data from different channels may destroy useful information. So applying a different kernel to each channel and then combining their PPV features would make more sense to me. Of course, I can apply the single-channel version of MiniRocket to each channel separately, but the one regret is that the bias values cannot be unified. All in all, MiniRocket is a brilliant tool and it has helped me a lot. Thank you.

angus924 commented 2 years ago

Hi @Presburger.

You are absolutely right, the data in each channel might be very different (different types of data, different scales / magnitudes, possibly even different lengths, etc.). You are also right that normalising the data per channel could remove important information. On the other hand, normalisation might help if one channel is really important but has a relatively small scale / magnitude in relation to other channels.

In the multivariate version of MiniRocket, we set the bias values based on the combined channels. So, for example, when we are generating bias values for a given kernel/dilation combination, and we assign, e.g., channels [57, 214, 551] to that combination, we will add those channels together, and sample the bias values based on the combined channels (i.e., bias = sample(X[:, 57] + X[:, 214] + X[:, 551])). In this way, the bias values and PPV features reflect the combination of channels. However, whether it makes sense to normalise the data before combining channels in this way is really something you need to determine experimentally.
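
A rough sketch of bias sampling from combined channels follows. It is not the repository's implementation. It assumes that `X` has shape (n_examples, n_channels, n_timepoints), that biases are taken as quantiles of the convolution output of one randomly chosen training example, and that dilation is 1; the quantile levels and the kernel weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layout: (n_examples, n_channels, n_timepoints). Random toy data.
X = rng.standard_normal((20, 600, 100))

# Illustrative length-9 kernel with weights -1 and 2.
kernel = np.array([-1, -1, 2, -1, -1, 2, -1, 2, -1], dtype=float)

# Channels assigned to this kernel/dilation combination (from the example above).
channels = [57, 214, 551]

# Pick one training example at random and add its selected channels together.
i = rng.integers(X.shape[0])
combined = X[i, channels].sum(axis=0)  # shape: (n_timepoints,)

# Convolve the combined (now univariate) series with the kernel.
conv_out = np.correlate(combined, kernel, mode="valid")

# Assumption: biases are quantiles of that output; these levels are made up.
quantile_levels = np.array([0.25, 0.5, 0.75])
biases = np.quantile(conv_out, quantile_levels)

# PPV: proportion of positions where the convolution output exceeds the bias.
ppv = np.array([(conv_out > b).mean() for b in biases])

print(biases, ppv)
```

Because both the biases and the PPV features are computed from the summed series, a channel with a much larger magnitude than the others will dominate them, which is why the question of normalising beforehand has to be settled experimentally.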

One alternative, as you say, is to apply different kernels to each channel. The current MiniRocket implementation is not really set up to do this, but it is very simple in principle. Rocket takes this approach (applying different kernels to each channel), and Hydra can be very easily modified to do something similar. Again, what works better will almost certainly depend on the particular dataset.
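
For illustration only, the per-channel alternative could look something like the sketch below. This is not how Rocket or Hydra implement it. The helper `make_kernel`, the kernel positions, the channel scales, and the use of the median as the bias are all hypothetical choices made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# One toy example with 3 channels on very different scales (illustrative).
scales = np.array([[1.0], [50.0], [0.01]])
x = rng.standard_normal((3, 100)) * scales


def make_kernel(positions):
    """Hypothetical helper: length-9 kernel, weight 2 at `positions`, -1 elsewhere."""
    weights = -np.ones(9)
    weights[list(positions)] = 2.0
    return weights


# A different kernel for each channel.
kernels = [make_kernel(p) for p in [(0, 1, 2), (2, 5, 7), (4, 6, 8)]]

features = []
for channel, weights in zip(x, kernels):
    out = np.correlate(channel, weights, mode="valid")
    bias = np.median(out)                  # bias taken from this channel's own output
    features.append((out > bias).mean())   # PPV for this channel/kernel pair

# Concatenate the per-channel PPV values into one feature vector.
features = np.array(features)
print(features)
```

Since each bias is derived from its own channel's convolution output, the PPV values are unaffected by the differing channel scales, at the cost of no longer capturing interactions between channels within a single feature.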