cjekel / similarity_measures

Quantify the difference between two arbitrary curves in space
MIT License
245 stars 41 forks source link

What's the meaning of d in calculating DTW distance? #5

Open DanAlz opened 5 years ago

DanAlz commented 5 years ago

I have used DTW distance as the documentation shows. I want to calculate the distance between two curves that each one has three points and when I print (d) I get back a 3*3 matrix that I don't know what it shows? What does each element of the matrix show?

cjekel commented 5 years ago

d is the cumulative distance matrix between the two curves. I think the Matlab explanation is good. It's sometimes useful to visualize the distance matrix and optimal alignment path.

I think the [0, 0] value is the distance between the first point of each curve, the [n-1, n-1] value is the DTW distance, and all other values represent the distance from [0,0] to the [i, j] index.

Let me know if this helps.

DanAlz commented 5 years ago

d is the cumulative distance matrix between the two curves. I think the Matlab explanation is good. It's sometimes useful to visualize the distance matrix and optimal alignment path.

I think the [0, 0] value is the distance between the first point of each curve, the [n-1, n-1] value is the DTW distance, and all other values represent the distance from [0,0] to the [i, j] index.

Let me know if this helps.

At first, I thought the [0,0] value is the euclidean distance between the first point of each curve and the [0,1] value is the euclidean distance between the first point of the first curve and the second point of the second curve and generally [n,m] value is the euclidean distance between n_th point of the first curve and m_th point of the second curve but I calculated euclidean distance between points of the first curve and second curve and I saw the result is different from what d matrix represent.

cjekel commented 5 years ago

It's the cumulative distance matrix that results in the smallest d[n-1, n-1].

At first, I thought the [0,0] value is the euclidean distance between the first point of each curve

This is correct. If it's not -- something is wrong.

Let c be the pairwise distance between each point.

d[0, 1] is d[0, 0] + c[0, 1]

d[1, 1] is min(d[0,0], d[0,1], d[1,0]) + c[1,1]

etc...

Here is some code to help visualize what's going on.

import numpy as np
import matplotlib.pyplot as plt
import similaritymeasures as sm

x = np.array([[0, 0, 0, 0]]).T
y = np.array([[1, 2, 3, 4]]).T
r, d = sm.dtw(x, y)
assert(0 + 1 + 2 + 3 + 4 == r)

plt.imshow(d, origin='lower')
plt.yticks([0, 1, 2, 3])
plt.xticks([0, 1, 2, 3])
plt.title('Cumlative distance matrix')
plt.xlabel('Index of x')
plt.ylabel('Index of y')
plt.colorbar()
plt.show()

Check out algorithm 2.1 from http://seninp.github.io/assets/pubs/senin_dtw_litreview_2008.pdf