asardaes / dtwclust

R Package for Time Series Clustering Along with Optimizations for DTW
https://cran.r-project.org/package=dtwclust
GNU General Public License v3.0

Lack of reproducibility in tsclust #28

Closed dcoulomb closed 6 years ago

dcoulomb commented 6 years ago

Hi, I am using the tsclust function to cluster my 150 time series. Their lengths vary from 40 to 140.

I can't understand why the function does not always return the same results. If I run it twice with the exact same parameters (distance="DTW", type="partitional" and k from 2 to 10), I don't obtain the same results. Can you help me understand the theory behind this issue? Maybe a link to a paper could help me. Is it the curse of dimensionality?

Thanks a lot!

asardaes commented 6 years ago

Partitional clustering is stochastic because it starts from a random initialization. If you don't specify a random seed, you will get a different result each time. Specify a seed (the `seed` argument of `tsclust`), and your results should not change for the same input parameters.
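
As a rough illustration of the point about seeds, here is a minimal sketch. It assumes the `CharTraj` example data that ships with dtwclust (loaded with `data(uciCT)`) as a stand-in for your own series. The subset size, `k = 4L`, the `"dtw_basic"` distance, and the seed value are arbitrary choices for the example, not recommendations.

```r
library(dtwclust)

# Example data shipped with dtwclust; a stand-in for the 150 series in the question
data(uciCT)
series <- CharTraj[1:20]

# Two runs with the same inputs and the same seed
run1 <- tsclust(series, type = "partitional", k = 4L,
                distance = "dtw_basic", seed = 42L)
run2 <- tsclust(series, type = "partitional", k = 4L,
                distance = "dtw_basic", seed = 42L)

# Compare the cluster assignments of both runs
same_partition <- identical(run1@cluster, run2@cluster)
print(same_partition)
```

With `seed` left at its default of `NULL`, the same two calls may produce different partitions, which matches what you observed.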

dcoulomb commented 6 years ago

Thanks for your input!