TDAmeritrade / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis
https://stumpy.readthedocs.io/en/latest/

Update documentation regarding Euclidean distance instead of Pearson's correlation #822

Closed lucaspg96 closed 1 year ago

lucaspg96 commented 1 year ago

According to the docs, the stumpy.stump method:

Compute the z-normalized matrix profile

    This is a convenience wrapper around the Numba JIT-compiled parallelized
    `_stump` function which computes the matrix profile according to STOMPopt with
    Pearson correlations.

However, computing the matrix profile over a time series (from scipy.misc import electrocardiogram) returns only positive values, and they are greater than 1, which a Pearson correlation cannot be. I boxplotted them:

[image: boxplot of the matrix profile values]

seanlaw commented 1 year ago

@lucaspg96 Thank you for your question. I can understand the confusion. The mention of "Pearson correlations" refers to the "STOMPopt" algorithm, which first computes the Pearson correlation and then converts it to a z-normalized Euclidean distance. This is in contrast to the "STOMP" algorithm, which computes the z-normalized Euclidean distance directly without computing the Pearson correlation first.
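To make the Pearson-to-distance conversion concrete, here is a minimal NumPy sketch. It is not STUMPY's actual code; the function names and random sample data are illustrative assumptions. For two subsequences of length m with Pearson correlation rho, the z-normalized Euclidean distance equals sqrt(2 * m * (1 - rho)), so it ranges from 0 to 2 * sqrt(m) and can easily exceed 1.

```python
import numpy as np


def znorm_euclidean_direct(a, b):
    """Z-normalize both subsequences, then take the Euclidean distance."""
    a_z = (a - a.mean()) / a.std()  # population std (ddof=0)
    b_z = (b - b.mean()) / b.std()
    return np.linalg.norm(a_z - b_z)


def znorm_euclidean_from_pearson(a, b):
    """Compute the Pearson correlation first, then convert it to a distance."""
    m = len(a)
    rho = np.corrcoef(a, b)[0, 1]
    # Clamp at zero to guard against tiny negative values from rounding
    return np.sqrt(max(2.0 * m * (1.0 - rho), 0.0))


# Illustrative data only (hypothetical subsequences of length m = 50)
rng = np.random.default_rng(0)
m = 50
a = rng.normal(size=m)
b = rng.normal(size=m)

d_direct = znorm_euclidean_direct(a, b)
d_pearson = znorm_euclidean_from_pearson(a, b)
print(d_direct, d_pearson)  # the two values agree up to floating-point error
```

Both routes give the same number, which is why a docstring can mention Pearson correlations while the returned matrix profile holds distances.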

Fundamentally, a "matrix profile" contains some sort of distance, but which distance depends on the values of the normalize and/or p parameters (i.e., it isn't always the z-normalized Euclidean distance, though that is the default).
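As a rough illustration of how these two parameters change what the profile contains, here is a brute-force NumPy sketch. The helper naive_matrix_profile is hypothetical and is not how STUMPY computes anything. The exclusion zone of ceil(m / 4) and the choice to ignore p when normalize is True are assumptions based on my understanding of STUMPY's defaults.

```python
import numpy as np


def naive_matrix_profile(T, m, normalize=True, p=2.0):
    """Brute-force nearest-neighbor distance for every subsequence (hypothetical helper)."""
    T = np.asarray(T, dtype=float)
    n_subs = len(T) - m + 1
    subs = np.array([T[i : i + m] for i in range(n_subs)])
    if normalize:
        # Z-normalize each subsequence; assumes no subsequence is constant
        mu = subs.mean(axis=1, keepdims=True)
        sigma = subs.std(axis=1, keepdims=True)
        subs = (subs - mu) / sigma
    # Assumption: p only applies to the non-normalized (Minkowski) case
    order = 2.0 if normalize else p
    excl = int(np.ceil(m / 4))  # assumed exclusion zone to skip trivial matches
    profile = np.full(n_subs, np.inf)
    for i in range(n_subs):
        d = np.linalg.norm(subs - subs[i], ord=order, axis=1)
        d[max(0, i - excl) : min(n_subs, i + excl + 1)] = np.inf
        profile[i] = d.min()
    return profile


# Illustrative data only: a noisy sine wave
rng = np.random.default_rng(42)
T = np.sin(np.linspace(0, 20 * np.pi, 200)) + 0.1 * rng.normal(size=200)
m = 20

mp_znorm = naive_matrix_profile(T, m)                           # z-normalized Euclidean
mp_manhattan = naive_matrix_profile(T, m, normalize=False, p=1.0)  # non-normalized, p = 1
print(mp_znorm.max(), mp_manhattan.max())
```

In every case the values are non-negative distances, and in the z-normalized case they are bounded above by 2 * sqrt(m) rather than by 1.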

seanlaw commented 1 year ago

Perhaps it would be helpful to say:

Compute the matrix profile (default z-normalized Euclidean distance)

seanlaw commented 1 year ago

@lucaspg96 Are you able to provide some feedback?