ahwillia / affinewarp

An implementation of piecewise linear time warping for multi-dimensional time series alignment
MIT License

Does it support time series with different length? #14

Closed ccffccffcc closed 4 years ago

ccffccffcc commented 4 years ago

Hi Alex,

I tried to use this package on time series with different lengths, but it seems the package doesn't support this. I would like to warp the time series to a common length. Is it possible to do this with your package?

Besides, I have one more comment. After warping, when a time series is shifted and becomes shorter than the template, the algorithm extends it by simply reusing the first or last point. For neural signals, we generally have additional information about past or future spikes, and it would be useful if that information were considered. I think there is a way to achieve this: we could use the spike-time arrays instead of the binned spike counts, transform the spike times, and then bin them to compute the loss within a predefined time window.

Thank you!

Best, Feng

ahwillia commented 4 years ago

Good questions / comments. Here are some brief answers that will hopefully help.

I tried to use this package on time series with different lengths, but it seems the package doesn't support this. I would like to warp the time series to a common length. Is it possible to do this with your package?

This isn't built in right now, but there may be a hack you could use to get around it. Specifically, you could linearly interpolate to upsample your shorter time series to the same length as your longer time series, and then initialize the warping functions appropriately (the slope should be the inverse of the upsampling factor). See the manual_fit method in the PiecewiseWarping class for how to specify warping functions manually. Then you could call model.fit(...) as usual to optimize the warpings further.
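To illustrate the upsampling half of that hack, here is a minimal NumPy sketch (not affinewarp's API; the trial shapes and variable names are made up for illustration). It interpolates a short trial up to the target length and computes the slope you would use when seeding the warp:

```python
import numpy as np

# Hypothetical example: upsample a shorter trial to the length of the
# longest trial via linear interpolation, channel by channel.
short_trial = np.random.rand(80, 5)   # 80 time bins, 5 neurons
target_len = 120                      # length of the longest trial

old_t = np.linspace(0.0, 1.0, short_trial.shape[0])
new_t = np.linspace(0.0, 1.0, target_len)
upsampled = np.stack(
    [np.interp(new_t, old_t, short_trial[:, n])
     for n in range(short_trial.shape[1])],
    axis=1,
)

# initial warp slope = inverse of the upsampling factor
init_slope = short_trial.shape[0] / target_len
assert upsampled.shape == (target_len, 5)
```

The upsampled array could then be stacked with the full-length trials before fitting.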

Longer-term, we could add support for missing data in the code (see this blog post for some ideas). You could treat the shorter time series as having a string of missing time bins, and the long trials as fully observed. I don't have plans to implement this myself anytime soon --- if you want to take a stab at it, I could try to help!
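The core of the missing-data idea can be sketched in a few lines of NumPy (a toy example, not the library's loss): pad short trials with NaNs and average the squared error over observed bins only, so the padding never pulls the template toward the boundary values.

```python
import numpy as np

# Toy masked loss: a 6-bin template, and a trial where only the
# first 4 bins were actually observed (the rest are NaN padding).
template = np.ones((6, 2))
trial = np.full((6, 2), np.nan)
trial[:4] = 1.1

mask = ~np.isnan(trial)                    # observed bins
masked_mse = np.nansum((trial - template) ** 2) / mask.sum()
# averages the 0.1^2 errors over the 8 observed entries only
```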

After warping, when a time series is shifted and becomes shorter than the template, the algorithm extends it by simply reusing the first or last point. For neural signals, we generally have additional information about past or future spikes, and it would be useful if that information were considered. I think there is a way to achieve this: we could use the spike-time arrays instead of the binned spike counts, transform the spike times, and then bin them to compute the loss within a predefined time window.

Yep, this is already supported for spike trains! If you call model.transform(...) on a SpikeData object, it will extrapolate the warping functions and effectively warp spikes into the time window of each trial. When you create the SpikeData object you set tmin and tmax to the start and end of each trial, but you can have spikes that occur before tmin (negative values are allowed) and after tmax.
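To make the extrapolation idea concrete, here is a hypothetical helper (illustration only, not the library's code) that applies a piecewise-linear warp to raw spike times and linearly extends the first/last segments, so spikes before tmin or after tmax still map to sensible warped times:

```python
import numpy as np

def warp_spike_times(spike_times, x_knots, y_knots):
    """Warp spike times through a piecewise-linear function defined by
    knots, linearly extrapolating beyond the first and last knot.

    Hypothetical helper for illustration -- not affinewarp's code."""
    t = np.asarray(spike_times, dtype=float)
    out = np.interp(t, x_knots, y_knots)
    # np.interp clamps outside the knot range, so extrapolate manually
    lo_slope = (y_knots[1] - y_knots[0]) / (x_knots[1] - x_knots[0])
    hi_slope = (y_knots[-1] - y_knots[-2]) / (x_knots[-1] - x_knots[-2])
    out = np.where(t < x_knots[0], y_knots[0] + lo_slope * (t - x_knots[0]), out)
    out = np.where(t > x_knots[-1], y_knots[-1] + hi_slope * (t - x_knots[-1]), out)
    return out

# identity warp on [0, 1]: spikes before tmin and after tmax pass through
warped = warp_spike_times([-0.2, 0.5, 1.3], [0.0, 1.0], [0.0, 1.0])
```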

I think the primate data example demonstrates this. Let me know if you want more explanation...

In the future it would be nice to extend this functionality to non-spike-train time series (e.g. for fMRI time series).

ccffccffcc commented 4 years ago

Hi Alex,

Thank you so much for your reply!

I think what you suggested for the second point is to transform the spike trains after optimization. What I actually meant was using the spike trains during optimization; I'm not sure whether that is supported, so sorry for the misunderstanding. In some instances the neural data are very noisy, and the algorithm produces large deviations from the identity transformation. This only happens on a few samples, so regularization doesn't work well. I suspect the problem is the treatment of the boundaries: for some trials, the algorithm tends to reuse the start and end points. If additional information about the spike trains before and after the selected time window were provided, it might prevent this behavior. Does that make sense?

I have another technical question and hope you can help. In some cases I would like to perform stretching only, without any shift. I looked through your implementation a bit, and it seems there isn't an easy way to do this. Is there a way to fix the shift to 0 and only optimize over the slopes?

Thanks, Feng

ahwillia commented 4 years ago

I think what you suggested for the second point is to transform the spike trains after optimization. What I actually meant was using the spike trains during optimization.

Ah, I see. We made a conscious design decision not to do it this way, because the optimization problem can become ill-posed without careful regularization: you could warp all spikes to a single time bin on every trial and fit them very well with a single (but nonsensical) template. It also makes things like cross-validation tricky. So this is outside the scope of the current package.

In some cases, I would like to perform stretching only without any shift.

There is no good way to do this currently in the code, but it would not be too hard to pin x_knots[0] = y_knots[0] = 0. This is the relevant part of the code where the knots are mutated -- https://github.com/ahwillia/affinewarp/blob/master/affinewarp/_optimizers.py#L203 -- you'd want to set next_x[0] = 0 and next_y[0] = 0 there. Sorry, this part of the code is quite dense and confusing.
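A toy sketch of the suggested change (the variable names mirror the next_x / next_y mentioned above, but this is not the library's actual optimizer): after proposing perturbed knots in a random-search step, pin the first knot at the origin so the warp applies no shift at the trial start and only the slopes vary.

```python
import numpy as np

rng = np.random.default_rng(0)

# current warp knots (identity warp on [0, 1])
x_knots = np.array([0.0, 0.5, 1.0])
y_knots = np.array([0.0, 0.5, 1.0])

# random-search proposal around the current knots
next_x = np.sort(x_knots + 0.05 * rng.standard_normal(x_knots.size))
next_y = y_knots + 0.05 * rng.standard_normal(y_knots.size)

# pin the first knot so the warp has no shift at t = 0
next_x[0] = 0.0
next_y[0] = 0.0
```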

ccffccffcc commented 4 years ago

Thank you so much for your reply and help!