[ENH] Implementation of DTW: should it have the square root?

sylvaincom commented 1 year ago

Describe the bug

Hi,

First of all, thanks for the great work at aeon.

Your implementation of DTW (aeon.distances.dtw_distance) differs from the one by tslearn (tslearn.metrics.dtw) as it does not seem the apply the square root.

Steps/Code to reproduce the bug

The following returns an AssertionError:

# Imports
import numpy as np
from aeon.datasets import load_arrow_head
from aeon.distances import dtw_distance as dtw_aeon
from tslearn.metrics import dtw as dtw_tslearn
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Load some data
X, y = load_arrow_head(return_type="numpy2d")

# Compute the distance matrix for different implementations
distance_matrix_dtw_aeon = squareform(pdist(X, dtw_aeon))
distance_matrix_dtw_tslearn = squareform(pdist(X, dtw_tslearn))

# Check if the distance matrix are the same
err_msg = "The distance matrix from DTW differs between aeon and tslearn"
assert np.allclose(distance_matrix_dtw_aeon, distance_matrix_dtw_tslearn), err_msg

The following returns True:

distance_matrix_dtw_aeon_sqrt = np.sqrt(distance_matrix_dtw_aeon)
np.allclose(distance_matrix_dtw_aeon_sqrt, distance_matrix_dtw_tslearn)

Expected results

No error is thrown

Actual results

AssertionError: The distance matrix from DTW differs between aeon and tslearn

Versions

System: python: 3.10.6 (main, Oct 24 2022, 11:04:34) [Clang 12.0.0 ] executable: [/Users/my_username/opt/anaconda3/envs/my_env/bin/python](https://file+.vscode-resource.vscode-cdn.net/Users/my_username/opt/anaconda3/envs//bin/python) machine: macOS-10.16-x86_64-i386-64bit Python dependencies: pip: 22.1.2 setuptools: 59.8.0 sklearn: 0.0 aeon: 0.4.0 statsmodels: 0.14.0 numpy: 1.23.5 scipy: 1.9.3 pandas: 2.0.3 matplotlib: 3.5.3 joblib: 1.2.0 numba: 0.56.2 pmdarima: None tsfresh: None

TonyBagnall commented 1 year ago

hi sylvian, thanks for that. Yes I think we decided to not square root to the DTW, and if I remember correctly its because not doing so makes no difference: DTW is not a metric, and the distance is used for a relative comparison, so taking square root is redundant really. I seem to remember Eamonn Keogh citing it as a possible optimisation in his "trillions" paper, but would need to look it up (and can post the link if this reference is too obscure). Thanks for pointing this out though, I'll discuss with the other developers involved in distances and see if we might want to change

sylvaincom commented 1 year ago

Hi Tony,

Thank you for your message.

In my experiments, the square root seemed to change the scores agglomerative clustering, for example:

from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

def get_cluster_labels(distance_matrix, K, y_true):
    clustering_model = AgglomerativeClustering(
        n_clusters=K,
        linkage='ward',
        connectivity=distance_matrix
    )
    clustering_model.fit(distance_matrix)
    cluster_labels = clustering_model.labels_
    rand_score = adjusted_rand_score(
        labels_pred=cluster_labels, labels_true=y_true
    )
    return cluster_labels, rand_score

K = len(set(y))
cluster_labels_dtw_tslearn, rand_score_dtw_tslearn = get_cluster_labels(distance_matrix_dtw_tslearn, K, y)
cluster_labels_dtw_aeon, rand_score_dtw_aeon = get_cluster_labels(distance_matrix_dtw_aeon, K, y)
cluster_labels_dtw_aeon_sqrt, rand_score_dtw_aeon_sqrt = get_cluster_labels(distance_matrix_dtw_aeon_sqrt, K, y)

print(f"{rand_score_dtw_tslearn = :.5f}")
print(f"{rand_score_dtw_aeon = :.5f}")
print(f"{rand_score_dtw_aeon_sqrt = :.5f}")

which returns:

rand_score_dtw_tslearn = 0.01256
rand_score_dtw_aeon = -0.00640
rand_score_dtw_aeon_sqrt = 0.01256

TonyBagnall commented 1 year ago

hmmm, a negative rand score looks like a weird thing, @chrisholder lets look into this. Thanks sylvian

chrisholder commented 1 year ago

Hi @sylvaincom thanks for the bug report!

I've been looking into this and im not sure why the results are different. They should be identical since regardless of if you take the sqrt or not the warping path remains the same. Im going to look into this further but any ideas you may have would be greatly appreciated!

The reason we don't sqrt is just computational efficiency since as mentioned it doesn't impact the warping path therefore taking the sqrt would only really be done to convert the value back to the same scale of the data.

TonyBagnall commented 1 year ago

hi @sylvaincom

it is all a bit odd, I dont think its usual to sqrt DTW in the first place, I've not seen it done in other implementations, Romain might have some input @rtavenar?

But really, I'm not sure why it would make any difference at all. As an aside, you can get the pairwise distance matrix directly through aeon

from aeon.distances import dtw_pairwise_distance

could try that see if it makes a difference? Probably not this, since chris has seen diferent results using sqrt and kmeans, but worth removing one possible source of error

rtavenar commented 1 year ago

Hi here,

Regarding tslearn, I decided to use the formulation with the square root as it seemed more natural to me. For example, when doing $k$-means, the objective on which we optimize is $\sumk \sum{i \in C_k} d(x_i, c_k)^2$ so if you plug DTW as your "distance" here, you expect that the DTW formulation includes the square root.

Yet, I agree with @TonyBagnall that for all algorithms based on nearest-neighbour searches, it should not change anything.

TonyBagnall commented 3 months ago

so looking at this again almost a year later, and as sylvain originally said, fundamentally the functions are the same. This is equivalent for example

import numpy as np
import math
from aeon.datasets import load_arrow_head
from aeon.distances import dtw_distance as dtw_aeon
from tslearn.metrics import dtw as dtw_tslearn
# Load some data
X, y = load_arrow_head(return_type="numpy2d")
print("AEON =",dtw_aeon(X[0], X[1]),"TSLEARN =", dtw_tslearn(X[0], X[1]))
l1 = []
l2 = []
for i in range(0,5):
    for j in range(0,5):
        l1.append(math.sqrt(dtw_aeon(X[i], X[j])))
        l2.append(dtw_tslearn(X[i], X[j]))

print(l1)
print(l2)
err_msg = "The distance matrix from DTW differs between aeon and tslearn"
assert np.allclose(l1,l2), err_msg

I dont think this is a bug, but an open issue as to whether to take the square root or not. Not sure why it produces different results with AgglomerativeClustering

aeon-toolkit / aeon