baggepinnen / DynamicAxisWarping.jl

Dynamic Time Warping (DTW) and related algorithms in Julia, at Julia speeds
Other
98 stars 11 forks source link

using weighted distances seemingly not compatible with dtw_cost #58

Closed wbarhdad closed 11 months ago

wbarhdad commented 11 months ago

Using a weighted metric (e.g. WeightedSqEuclidean) for the dynamic time warping seems to not be possible as the weights passed to the distance are a vector, but in lines 190 and 199 in dtw.jl the cost is calculated using distances between scalar values (but in a for loop over the sequence). This results in a dimension mismatch error when trying to calculate the distance.

baggepinnen commented 11 months ago

The source lines you point to are probably not correct? https://github.com/baggepinnen/DynamicAxisWarping.jl/blob/master/src/dtw.jl#L190C5-L199C1

Could you provide an example that demonstrates the problem?

baggepinnen commented 11 months ago

It is probably possible to compute the weighted Euclidean distance by scaling the time series before calling dtw. This would also be algorithmically better, since this operation has $O(n)$ complexity whereas perfoming the weighting inside the loop would have $O(n^2)$ complexity.

wbarhdad commented 11 months ago

Sorry, you are right, I meant on lines 208 and 217 in the source file, where dist() is called.

A minimal example that doesn't work could be: ''' fs = 70 t = range(0,stop=1,step=1/fs) y0 = sin.(2pi . t) y1 = sin.(3pi . t) y = [y0;y1[2:end]] .+ 0.01 . randn.() q = [y0;y0[2:end]] .+ 0.01 . randn.() w = rand(length(y))

d = DTW(dist=WeightedSqEuclidean(w), radius=15) d(y,q) '''

This results for me in a dimension mismatch error.

Could you provide a small example explaining the scaling? Does this mean I "apply" the weights to both time series first and then compute the Euclidean distance on it?

baggepinnen commented 11 months ago

Could you provide a small example explaining the scaling? Does this mean I "apply" the weights to both time series first and then compute the Euclidean distance on it?

Exactly :)

Since the weighted distances is defined like this

WeightedSqEuclidean | wsqeuclidean(x, y, w) | sum((x - y).^2 .* w)

you have to multiply the time series by w.^2 to get the same result using Euclidean distance

julia> d = DynamicAxisWarping.DTW(dist=SqEuclidean(), radius=15)

julia> d(y.*w.^2,q.*w.^2)
3.636986538424772

This works because $(wx - wy)^2 = w^2(x-y)^2$

baggepinnen commented 11 months ago

By the way, are you sure weighting in time like you are attempting to do actually makes sense in the presence of time warping? If you are comparing q(1) with y(5), what weight are you expecting to use? If q and y are identical, you expect to use zero warping. However, if weights are not identical, you would still get warping such that the weights are minimized.

wbarhdad commented 11 months ago

Thanks for the example!

Without going into too much detail, I was playing around with an idea of a more informative distance measure for parameter inference of a simulation model.

In short, I have a simulation output y_star (time series) and observation y (also time series) for which I use DTW to compare both. I also have for each parameter a set of weights w that represent the influence this parameter has over time (also time series with same points of observation as y and y_star). My idea was to have a distance measure that takes into account the influential time points for this parameter. However, what you said here:

By the way, are you sure weighting in time like you are attempting to do actually makes sense in the presence of time warping? If you are comparing q(1) with y(5), what weight are you expecting to use? If q and y are identical, you expect to use zero warping. However, if weights are not identical, you would still get warping such that the weights are minimized.

made me realize that indeed my idea might not make sense in the presence of warping as observation points may get switched around.

baggepinnen commented 11 months ago

You could potentially incorporate the transportcost argument, which adds a penalty to perform any warping. You would end up with some form of a tuning problem between the different cost terms though, so I'm not sure if this is what you'd like. I'll close the issue now since I don't think this is something we want to support here, and there is a clear workaround if you'd like. The weighted distance is really ment to weight multivariate time series, rather than different points in time.

wbarhdad commented 11 months ago

Thanks for all the help!