Open sylvaincom opened 1 year ago
Hi @sylvaincom, thanks for that. Yes, I think we decided not to apply the square root in DTW, and if I remember correctly it's because not doing so makes no difference: DTW is not a metric, and the distance is used for a relative comparison, so taking the square root is redundant really. I seem to remember Eamonn Keogh citing it as a possible optimisation in his "trillions" paper, but I would need to look it up (and can post the link if this reference is too obscure). Thanks for pointing this out though, I'll discuss with the other developers involved in distances and see if we might want to change it.
Hi Tony,
Thank you for your message.
In my experiments, the square root seemed to change the scores of agglomerative clustering, for example:
```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score


def get_cluster_labels(distance_matrix, K, y_true):
    clustering_model = AgglomerativeClustering(
        n_clusters=K,
        linkage='ward',
        connectivity=distance_matrix,
    )
    clustering_model.fit(distance_matrix)
    cluster_labels = clustering_model.labels_
    rand_score = adjusted_rand_score(
        labels_pred=cluster_labels, labels_true=y_true
    )
    return cluster_labels, rand_score


K = len(set(y))
cluster_labels_dtw_tslearn, rand_score_dtw_tslearn = get_cluster_labels(distance_matrix_dtw_tslearn, K, y)
cluster_labels_dtw_aeon, rand_score_dtw_aeon = get_cluster_labels(distance_matrix_dtw_aeon, K, y)
cluster_labels_dtw_aeon_sqrt, rand_score_dtw_aeon_sqrt = get_cluster_labels(distance_matrix_dtw_aeon_sqrt, K, y)

print(f"{rand_score_dtw_tslearn = :.5f}")
print(f"{rand_score_dtw_aeon = :.5f}")
print(f"{rand_score_dtw_aeon_sqrt = :.5f}")
```
which returns:
```
rand_score_dtw_tslearn = 0.01256
rand_score_dtw_aeon = -0.00640
rand_score_dtw_aeon_sqrt = 0.01256
```
Hmmm, a negative Rand score looks like a weird thing; @chrisholder, let's look into this. Thanks @sylvaincom
Hi @sylvaincom thanks for the bug report!
I've been looking into this and I'm not sure why the results are different. They should be identical, since the warping path remains the same regardless of whether you take the square root. I'm going to look into this further, but any ideas you may have would be greatly appreciated!
The reason we don't take the square root is just computational efficiency: as mentioned, it doesn't affect the warping path, so taking the square root would only really be done to convert the value back to the same scale as the data.
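To illustrate the invariance being discussed: the square root is monotonic, so it preserves the ordering of individual distances, and the nearest neighbour comes out the same either way. A minimal sketch using a toy 1-D DTW written for this example (not the aeon implementation):

```python
import math

def dtw_cost(a, b):
    # Plain DTW on 1-D sequences: cumulative sum of squared
    # pointwise differences along the optimal warping path.
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = step + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

query = [0.0, 1.0, 2.0, 1.0]
candidates = [[0.0, 1.1, 2.1, 0.9], [5.0, 4.0, 3.0, 2.0]]

raw = [dtw_cost(query, c) for c in candidates]
rooted = [math.sqrt(d) for d in raw]

# sqrt is monotonic, so the nearest neighbour is the same either way
assert raw.index(min(raw)) == rooted.index(min(rooted))
```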
hi @sylvaincom
It is all a bit odd; I don't think it's usual to take the square root of DTW in the first place, and I've not seen it done in other implementations. Romain might have some input, @rtavenar?
But really, I'm not sure why it would make any difference at all. As an aside, you can get the pairwise distance matrix directly through aeon:
```python
from aeon.distances import dtw_pairwise_distance
```
Could you try that and see if it makes a difference? Probably not this, since Chris has seen different results using sqrt and k-means, but it's worth removing one possible source of error.
Hi there,
Regarding tslearn, I decided to use the formulation with the square root as it seemed more natural to me. For example, when doing $k$-means, the objective we optimize is $\sum_k \sum_{i \in C_k} d(x_i, c_k)^2$, so if you plug DTW in as your "distance" here, you expect the DTW formulation to include the square root.
Yet, I agree with @TonyBagnall that for all algorithms based on nearest-neighbour searches, it should not change anything.
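One way to see why summed objectives like the $k$-means one above (or Ward linkage) can still be affected, while nearest-neighbour searches are not: the square root preserves comparisons between individual distances, but not comparisons between sums of distances. A small hypothetical numeric example:

```python
import math

# Two candidate groupings: one pays two small costs, the other one larger cost.
small_costs = [1.0, 1.0]
large_cost = [3.0]

# Without the square root, the pair of small costs is cheaper in total.
assert sum(small_costs) < sum(large_cost)           # 2.0 < 3.0

# After taking square roots, the comparison flips.
rooted_small = [math.sqrt(c) for c in small_costs]  # [1.0, 1.0]
rooted_large = [math.sqrt(c) for c in large_cost]   # [1.732...]
assert sum(rooted_small) > sum(rooted_large)        # 2.0 > 1.732...
```

So an algorithm that minimises a sum of (possibly squared) distances can make different choices depending on whether the square root was applied, even though every pairwise comparison of single distances is unchanged.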
So, looking at this again almost a year later: as Sylvain originally said, fundamentally the functions are the same. This is equivalent, for example:
```python
import math

import numpy as np
from aeon.datasets import load_arrow_head
from aeon.distances import dtw_distance as dtw_aeon
from tslearn.metrics import dtw as dtw_tslearn

# Load some data
X, y = load_arrow_head(return_type="numpy2d")
print("AEON =", dtw_aeon(X[0], X[1]), "TSLEARN =", dtw_tslearn(X[0], X[1]))

l1 = []
l2 = []
for i in range(5):
    for j in range(5):
        l1.append(math.sqrt(dtw_aeon(X[i], X[j])))
        l2.append(dtw_tslearn(X[i], X[j]))
print(l1)
print(l2)

err_msg = "The distance matrix from DTW differs between aeon and tslearn"
assert np.allclose(l1, l2), err_msg
```
I don't think this is a bug, but an open issue as to whether to take the square root or not. I'm not sure why it produces different results with AgglomerativeClustering.
Describe the bug

Hi,

First of all, thanks for the great work at aeon. Your implementation of DTW (`aeon.distances.dtw_distance`) differs from the one by tslearn (`tslearn.metrics.dtw`), as it does not seem to apply the square root.

Steps/Code to reproduce the bug

The following returns an `AssertionError`:

The following returns `True`:

Expected results

No error is thrown

Actual results

Versions